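Jingju (Beijing Opera)
The jingju a cappella singing pitch contour test collection contains ground truth for 39 jingju a cappella recordings. It provides ground truth for (1) note transcription and (2) pitch contour segmentation, and is used for both of these tasks. The pitch contours were extracted from the audio recordings, then corrected and segmented by a jingju researcher.
This test collection contains a cappella audio and boundary annotations for professional and amateur jingju singers. Boundaries are annotated hierarchically: line, syllable, phoneme. For the annotation format, units, and parsing code, please refer to this link.
To study the expressive functions of jingju banshi through lyrics, we created a series of datasets from the Jingju Lyrics Collection. These collections were built by mining the online database Zhongguo jingju xikao (中国京剧戏考), with the aim of analyzing the lyrics of yuanban, manban, kuaiban and yaoban in xipi and erhuang using two natural language processing methods: topic modeling and document classification.
This collection contains 34 jingju arias with hierarchical annotations made in Praat. The selected arias cover two shengqiang (xipi and erhuang) and five role types (dan, jing, laodan, laosheng and xiaosheng). The annotation files are in Praat TextGrid format, with the following tiers: aria, MusicBrainz ID, artist, school, role type, shengqiang, banshi, lyric lines, syllables and percussion patterns (luogu jing).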
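Because the annotations are stored as Praat TextGrid files, interval tiers can be read with a few lines of code. The following is a minimal sketch, not the collection's own parsing code: it assumes the long (verbose) TextGrid text format, handles interval tiers only, and uses a hypothetical tier name ("syllable") and made-up timings in the sample. The real files may use different tier names or a different text encoding.

```python
import re

# One interval in the long TextGrid format: xmin, xmax and text on consecutive lines.
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*([\d.]+)\s*'
    r'xmax\s*=\s*([\d.]+)\s*'
    r'text\s*=\s*"((?:[^"]|"")*)"'
)

def read_interval_tiers(textgrid_text):
    """Return {tier_name: [(start, end, label), ...]} for interval tiers."""
    tiers = {}
    # Each tier starts with 'item [n]:'; the part before the first tier is the file header.
    for block in re.split(r'item\s*\[\d+\]\s*:', textgrid_text)[1:]:
        name = re.search(r'name\s*=\s*"([^"]*)"', block)
        if name is None or 'IntervalTier' not in block:
            continue  # skip point tiers and malformed blocks
        tiers[name.group(1)] = [
            (float(start), float(end), label.replace('""', '"'))
            for start, end, label in INTERVAL_RE.findall(block)
        ]
    return tiers

# Hypothetical sample file with a single tier and two intervals.
sample = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "syllable"
        xmin = 0
        xmax = 2.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.25
            text = "xi"
        intervals [2]:
            xmin = 1.25
            xmax = 2.5
            text = "pi"
'''

for start, end, label in read_interval_tiers(sample)["syllable"]:
    print(start, end, label)
```

For anything beyond a quick inspection, a dedicated TextGrid library is likely a safer choice than a regular-expression sketch like this one.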
Indian Art Music
Indian Music Tonic Dataset
This dataset comprises 597 commercially available audio music recordings of Indian art music (Hindustani and Carnatic music), each manually annotated with the tonic of the lead artist. This dataset is used as the test corpus for the development of tonic identification approaches.
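A common use of a tonic annotation is to express a pitch track in cents relative to the tonic, so that recordings by different artists become comparable. The sketch below applies the standard conversion, 1200 · log2(f / tonic). The tonic and pitch values are made up for illustration and are not taken from the dataset.

```python
import math

def hz_to_cents(freq_hz, tonic_hz):
    """Return the interval from tonic_hz up to freq_hz, in cents."""
    if freq_hz <= 0 or tonic_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(freq_hz / tonic_hz)

# Hypothetical tonic and pitch values in Hz.
tonic = 220.0
pitch_track = [220.0, 330.0, 440.0]  # unison, perfect fifth, octave
cents = [round(hz_to_cents(f, tonic), 1) for f in pitch_track]
print(cents)  # [0.0, 702.0, 1200.0]
```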
Carnatic Varnam Dataset
The Carnatic Varnam Dataset is a collection of 28 solo vocal recordings, recorded for our research on intonation analysis of Carnatic ragas. The collection consists of audio recordings, time-aligned tala cycle annotations and swara notations in a machine-readable format.
Carnatic Music Rhythm Dataset
The Carnatic Music Rhythm Dataset is a sub-collection of 176 excerpts (16.6 hours) in four talas of Carnatic music, with audio, associated tala-related metadata and time-aligned markers indicating the progression through the tala cycles. It is useful as a test corpus for many automatic rhythm analysis tasks in Carnatic music. A subset of 118 two-minute excerpts (about 4 hours) with equivalent content is also available.
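As an illustration of what time-aligned cycle markers make possible, the sketch below computes the duration of each tala cycle from a list of beat markers. The marker layout is an assumption made for this example: each marker is a (time in seconds, beat number) pair, and beat number 1 marks the start of a cycle. The dataset's actual file format may differ.

```python
def cycle_durations(markers):
    """Return the duration of each complete cycle.

    markers: list of (time_in_seconds, beat_number) pairs sorted by time,
    where beat_number == 1 marks the first beat of a cycle (assumed layout).
    """
    starts = [time for time, beat in markers if beat == 1]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]

# Hypothetical markers for a three-beat cycle.
markers = [
    (0.5, 1), (1.0, 2), (1.5, 3),
    (2.0, 1), (2.75, 2), (3.5, 3),
    (4.25, 1),
]
print(cycle_durations(markers))  # [1.5, 2.25]
```

The same function applies to any annotation in which cycle starts can be identified, including taal cycles in Hindustani music.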
Hindustani Music Rhythm Dataset
The Hindustani Music Rhythm Dataset is a sub-collection of 151 excerpts (5 hours) in four taals of Hindustani music, with audio, associated taal-related metadata and time-aligned markers indicating the progression through the taal cycles. The dataset is useful as a test corpus for many automatic rhythm analysis tasks in Hindustani music.
Mridangam Stroke Dataset
The Mridangam Stroke Dataset is a collection of 7162 audio examples of individual strokes of the Mridangam in various tonics. The dataset comprises 10 different strokes played on Mridangams with 6 different tonic values. It can be used to train a model for each Mridangam stroke.
Mridangam Tani-avarthanam Dataset
The Mridangam Tani-avarthanam dataset is a transcribed collection of two tani-avarthanams played by the renowned Mridangam maestro Padmavibhushan Umayalpuram K. Sivaraman. The audio was recorded at IIT Madras, India and annotated by professional Carnatic percussionists. It consists of about 24 min of audio and 8800 strokes.
Tabla Solo Dataset
The Tabla Solo Dataset is a transcribed collection of Tabla solo audio recordings spanning compositions from six different Gharanas of Tabla, played by Pt. Arvind Mulgaonkar. The dataset consists of audio and time-aligned bol transcriptions.
Turkish Makam Music
Turkish Makam Symbolic Phrase Dataset
This study presents a large machine-readable dataset of Turkish makam music scores segmented into phrases by experts of this music. The dataset consists of 31362 phrases on a set of 480 scores of different compositions annotated by 3 experts.
Turkish Makam Melodic Phrase Dataset
In this dataset, 899 SymbTr-scores were manually annotated into melodic segments by 3 experts. There are a total of 31362 phrase annotations in this dataset.
Turkish şarkı vocal dataset
The Turkish şarkı vocal dataset is a collection of 10 recordings of compositions in the vocal form şarkı. The collection is annotated with lyrical lines. Each lyrical phrase is aligned to its corresponding segment in the audio.
Turkish makam acapella sections dataset
The dataset consists of 12 a cappella performances of 11 compositions, with a total duration of 19 minutes. Because appropriate a cappella material is lacking in this music tradition, professional singers recorded solo vocal versions of the originals (taken from the Turkish şarkı vocal dataset). Each performance was recorded in sync with the original recording, with instrumental sections left as silence. This ensures that the sections are performed in the same order as in the original.
Turkish Makam Audio-Score Alignment Dataset
This release contains 6 audio recordings of 4 peşrev compositions from the classical Ottoman-Turkish tradition. There are 51 sections in the audio recordings in total. The audio recordings contain 3896 note annotations in total. These annotations typically follow the note sequence in the SymbTr-scores. With respect to the SymbTr-scores, the annotations contain 3 inserted and 49 omitted notes.
Turkish Makam Section Dataset
This release contains 2095 sections annotated in 257 audio recordings of 58 compositions. The MIDI and SymbTr-scores of the compositions are also included in the dataset. For more information please refer to the paper.
Turkish Composition Identification Dataset
The repository contains the machine readable music scores of 147 instrumental compositions selected from the SymbTr collection and 743 audio recordings selected from the CompMusic makam corpus. In the dataset there are 360 recordings associated with 87 music scores, forming 362 relevant audio-score pairs.
Turkish Makam Tonic Dataset
The latest release contains annotated tonic frequencies of more than 2000 audio recordings. If available, the SymbTr-scores of the corresponding compositions performed in the audio recordings are also indicated. For more information please refer to the latest release hosted on GitHub.
Turkish Makam Recognition Dataset
This repository hosts the dataset designed to test makam recognition methodologies on Turkish makam music. It is composed of 50 recordings from each of the 20 most common makams in the CompMusic Project's Makam Music collection. Currently, it is the largest makam recognition dataset.
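With 50 recordings for each of 20 makams, the dataset is balanced, so a stratified split keeps every makam equally represented in the training and test sets. The sketch below shows one way to build such a split using only the standard library. It is an illustration, not the evaluation protocol used with the dataset; the label names, the 20% test fraction and the random seed are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split item indices into train and test lists, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

# Placeholder labels: 20 makams with 50 recordings each.
labels = [f"makam_{m:02d}" for m in range(20) for _ in range(50)]
train, test = stratified_split(labels)
print(len(train), len(test))  # 800 200
```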