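This page lists the test datasets used in some of the experiments carried out in CompMusic research. They complement the research corpora of the five music traditions.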

Jingju (Beijing Opera)

Jingju Percussion Instrument Dataset

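The Jingju Percussion Instrument Dataset contains 236 trimmed samples of the four percussion instruments used in jingju. It can be used to build models of the strokes of each percussion instrument. http://compmusic.upf.edu/bo-perc-dataset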

Jingju Percussion Pattern Dataset

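The Jingju Percussion Pattern Dataset contains 133 percussion pattern samples belonging to 5 pattern classes. It includes the audio and a syllable-level transcription of each pattern (not time-aligned), and is used for transcription and classification of percussion audio. The patterns were excerpted from audio recordings and annotated by a jingju musicologist. http://compmusic.upf.edu/bopp-dataset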

Jingju A Cappella Singing Pitch Contour Dataset

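The Jingju A Cappella Singing Pitch Contour Dataset covers 39 jingju a cappella recordings. It provides ground truth for (1) note transcription and (2) pitch contour segmentation, and is used for both of these tasks. The pitch contours were extracted from the audio recordings, then corrected and segmented by a jingju researcher. https://zenodo.org/record/832736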

Jingju A Cappella Singing Audio and Boundary Annotation Dataset

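This dataset contains a cappella audio recordings of professional and amateur jingju singers, together with boundary annotations. Boundaries are annotated hierarchically at three levels: line, syllable and phoneme. For the annotation format, units and parsing code, see https://github.com/MTG/jingjuPhonemeAnnotation. https://zenodo.org/record/344932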
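
The three annotation levels form a containment hierarchy: phonemes sit inside syllables, which sit inside lines. As a rough illustration only, the sketch below models such a hierarchy in memory and checks that it is consistently nested. The `Segment` class, the labels and the use of seconds are all assumptions made for this example; they do not reproduce the dataset's file format, which is documented in the repository linked above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One annotated interval (hypothetical structure, not the dataset format)."""
    label: str
    start: float  # assumed to be seconds
    end: float
    children: List["Segment"] = field(default_factory=list)


def is_well_nested(seg: Segment, tol: float = 1e-6) -> bool:
    """Return True if children lie inside their parent, in order, without overlap."""
    if seg.end < seg.start:
        return False
    prev_end = seg.start
    for child in seg.children:
        # A child must start after the previous sibling ends and finish inside the parent.
        if child.start < prev_end - tol or child.end > seg.end + tol:
            return False
        if not is_well_nested(child, tol):
            return False
        prev_end = child.end
    return True


# Placeholder example: one line, two syllables, three phonemes.
line = Segment("line_1", 0.0, 2.0, [
    Segment("syl_1", 0.0, 0.8, [
        Segment("ph_1", 0.0, 0.3),
        Segment("ph_2", 0.3, 0.8),
    ]),
    Segment("syl_2", 0.8, 2.0, [
        Segment("ph_3", 0.8, 2.0),
    ]),
])

# A syllable that runs past the end of its line is rejected.
bad_line = Segment("line_2", 0.0, 1.0, [Segment("syl_3", 0.5, 1.5)])

print(is_well_nested(line), is_well_nested(bad_line))
```

A check of this kind can be useful after parsing, before the boundaries are used for training or evaluation.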

Jingju A Cappella Singing Audio Extended Dataset

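This dataset contains 120 recordings and 1265 sung lines. It extends the existing CompMusic research corpora, for example Part 1 of the jingju a cappella singing dataset (https://doi.org/10.5281/zenodo.344932). Professional and amateur jingju singers were invited to make the recordings, which cover most of the musical elements of jingju. The dataset also includes metadata for each aria and each line, and is intended for research on the automatic assessment of jingju singing. https://doi.org/10.5281/zenodo.842229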

Jingju Music Scores Collection

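This collection contains 92 jingju music scores and was created to analyse the musical system of jingju singing. The printed scores were entered manually into a machine-readable format using MuseScore and exported as MusicXML. http://compmusic.upf.edu/node/348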

Jingju Lyrics Datasets

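To study the expressive functions of jingju banshi through lyrics, we created a series of datasets from the Jingju Lyrics Collection. They were built by mining the online repository Zhongguo jingju xikao (中国京剧戏考). The aim is to analyse the lyrics of yuanban, manban, kuaiban and yaoban in xipi and erhuang using natural language processing methods, namely topic modelling and document classification. http://compmusic.upf.edu/jingju-lyrics-datasets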

Jingju Aria Annotation Collection

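This collection contains 34 jingju arias with hierarchical annotations made in Praat. The selected arias cover two shengqiang (xipi and erhuang) and five role types (dan, jing, laodan, laosheng and xiaosheng). The annotation files are in Praat TextGrid format, with the following tiers: aria, MusicBrainz ID, artist, school, role type, shengqiang, banshi, lyrics line, syllable and percussion pattern (luogu jing). http://compmusic.upf.edu/node/349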

Indian Art Music

Indian Music Tonic Dataset

This dataset comprises 597 commercially available audio music recordings of Indian art music (Hindustani and Carnatic music), each manually annotated with the tonic of the lead artist. This dataset is used as the test corpus for the development of tonic identification approaches. http://compmusic.upf.edu/iam-tonic-dataset

Carnatic Varnam Dataset

The Carnatic Varnam Dataset is a collection of 28 solo vocal recordings, recorded for our research on intonation analysis of Carnatic ragas. The collection consists of audio recordings, time-aligned tala cycle annotations and swara notations in a machine-readable format. http://compmusic.upf.edu/carnatic-varnam-dataset

Carnatic Music Rhythm Dataset

The Carnatic Music Rhythm Dataset is a sub-collection of 176 excerpts (16.6 hours) in four talas of Carnatic music with audio, associated tala-related metadata and time-aligned markers indicating the progression through the tala cycles. It is useful as a test corpus for many automatic rhythm analysis tasks in Carnatic music. A subset of 118 two-minute excerpts (about 4 hours) is also available with equivalent content. http://compmusic.upf.edu/carnatic-rhythm-dataset
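
The durations quoted for this dataset can be sanity-checked with a little arithmetic. The short sketch below uses only the figures given in the description (the variable names are illustrative) to confirm that 118 two-minute excerpts come to roughly 4 hours, and to estimate the mean excerpt length of the full set.

```python
# Figures taken from the dataset description.
full_excerpts = 176
full_hours = 16.6
subset_excerpts = 118
subset_minutes_each = 2  # "two-minute" excerpts

# Total duration of the subset, in hours.
subset_hours = subset_excerpts * subset_minutes_each / 60

# Mean excerpt length of the full set, in minutes.
mean_full_minutes = full_hours * 60 / full_excerpts

print(f"subset: {subset_hours:.2f} h")
print(f"full set mean: {mean_full_minutes:.2f} min per excerpt")
```

The full-set excerpts are therefore noticeably longer on average than those in the two-minute subset.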

Hindustani Music Rhythm Dataset

The Hindustani Music Rhythm Dataset is a sub-collection of 151 excerpts (5 hours) in four taals of Hindustani music with audio, associated taal-related metadata and time-aligned markers indicating the progression through the taal cycles. The dataset is useful as a test corpus for many automatic rhythm analysis tasks in Hindustani music. http://compmusic.upf.edu/hindustani-rhythm-dataset

Mridangam Stroke Dataset

The Mridangam Stroke Dataset is a collection of 7162 audio examples of individual strokes of the mridangam in various tonics. The dataset comprises 10 different strokes played on mridangams with 6 different tonic values. It can be used for training models for each mridangam stroke. http://compmusic.upf.edu/mridangam-stroke-dataset

Mridangam Tani-avarthanam Dataset

The Mridangam Tani-avarthanam dataset is a transcribed collection of two tani-avarthanams played by the renowned Mridangam maestro Padmavibhushan Umayalpuram K. Sivaraman. The audio was recorded at IIT Madras, India and annotated by professional Carnatic percussionists. It consists of about 24 min of audio and 8800 strokes. http://compmusic.upf.edu/mridangam-tani-dataset

Tabla Solo Dataset

The Tabla Solo Dataset is a transcribed collection of tabla solo audio recordings spanning compositions from six different gharanas of tabla, played by Pt. Arvind Mulgaonkar. The dataset consists of audio and time-aligned bol transcriptions. http://compmusic.upf.edu/tabla-solo-dataset

Turkish Makam Music

Turkish Makam Symbolic Phrase Dataset

This study presents a large machine-readable dataset of Turkish makam music scores segmented into phrases by experts of this music. The dataset consists of 31362 phrases on a set of 480 scores of different compositions annotated by 3 experts. http://compmusic.upf.edu/node/237

Turkish Makam Melodic Phrase Dataset

In this dataset, 899 SymbTr-scores were manually annotated into melodic segments by 3 experts. There are a total of 31362 phrase annotations in this dataset. http://compmusic.upf.edu/node/236

Turkish şarkı vocal dataset

The Turkish şarkı vocal dataset is a collection of 10 recordings of compositions in the vocal form şarkı. The collection is annotated with lyrical lines, and each lyrical phrase is aligned to its corresponding segment in the audio. http://compmusic.upf.edu/turkish-sarki

Turkish makam acapella sections dataset

The dataset consists of 12 a cappella performances of 11 compositions with a total duration of 19 minutes. Because this music tradition lacks suitable a cappella material, solo vocal versions of the originals (taken from the Turkish şarkı vocal dataset) were sung by professional singers. Each performance was recorded in sync with the original recording, with instrumental sections left as silence. This ensures that the order in which the sections are performed stays the same. http://compmusic.upf.edu/turkish-makam-acapella-sections-dataset

Turkish Makam Audio-Score Alignment Dataset

This release contains 6 audio recordings of 4 peşrev compositions from the classical Ottoman-Turkish tradition. There are 51 sections in the audio recordings in total. The total number of note annotations in the audio recordings is 3896. These annotations typically follow the note sequence in the SymbTr-scores. There are 3 inserted and 49 omitted notes in the annotations with respect to the SymbTr-scores. http://compmusic.upf.edu/node/233

Turkish Makam Section Dataset

This release contains 2095 sections annotated in 257 audio recordings of 58 compositions. The MIDI files and SymbTr-scores of the compositions are also included in the dataset. For more information please refer to the paper. http://compmusic.upf.edu/node/234

Turkish Composition Identification Dataset

The repository contains the machine-readable music scores of 147 instrumental compositions selected from the SymbTr collection and 743 audio recordings selected from the CompMusic makam corpus. In the dataset there are 360 recordings associated with 87 music scores, forming 362 relevant audio-score pairs. https://github.com/MTG/otmm_composition_identification_dataset/releases

Turkish Makam Tonic Dataset

The latest release contains annotated tonic frequencies of more than 2000 audio recordings. If available, the SymbTr-scores of the corresponding compositions performed in the audio recordings are also indicated. For more information please refer to the latest release hosted on GitHub. https://github.com/MTG/otmm_tonic_dataset/releases

Turkish Makam Recognition Dataset

This repository hosts the dataset designed to test makam recognition methodologies on Turkish makam music. It is composed of 50 recordings from each of the 20 most common makams in the CompMusic Project's Makam Music collection. Currently, it is the largest makam recognition dataset. https://github.com/MTG/otmm_makam_recognition_dataset/releases
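
With 50 recordings for each of 20 makams, the dataset is balanced across classes and holds 1000 recordings in total. As a minimal sketch of how such a set might be partitioned for cross-validation, the code below assigns items to stratified folds using only the Python standard library. The function `stratified_folds`, the placeholder labels and the choice of 10 folds are assumptions made for illustration; they are not taken from the dataset or from any published evaluation protocol.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign item indices to folds so that each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Placeholder labels standing in for the real makam names: 20 classes x 50 items.
labels = [f"makam_{m:02d}" for m in range(20) for _ in range(50)]
folds = stratified_folds(labels, n_folds=10)

# Count how many items of each class landed in the first fold.
first_fold_counts = Counter(labels[i] for i in folds[0])
print(len(labels), len(folds[0]), set(first_fold_counts.values()))
```

Because the classes are balanced, every fold receives the same number of recordings per makam, which keeps per-fold accuracy figures directly comparable.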