This page is a collection of datasets created as part of CompMusic. These datasets are useful for various MIR-related tasks on the music cultures studied in CompMusic. Please visit the respective pages for more details.
Indian Art Music
Indian Music Tonic Dataset
This dataset comprises 597 commercially available audio music recordings of Indian art music (Hindustani and Carnatic music), each manually annotated with the tonic of the lead artist. This dataset is used as the test corpus for the development of tonic identification approaches. More information about this dataset can be obtained from http://compmusic.upf.edu/iam-tonic-dataset
Carnatic Varnam Dataset
The Carnatic Varnam Dataset is a collection of 28 solo vocal recordings, recorded for our research on intonation analysis of Carnatic ragas. The collection consists of audio recordings, time-aligned tala cycle annotations, and swara notations in a machine-readable format. The dataset can be obtained from http://compmusic.upf.edu/carnatic-varnam-dataset
Carnatic Music Rhythm Dataset
The Carnatic Music Rhythm Dataset is a sub-collection of 176 excerpts (16.6 hours) in four talas of Carnatic music with audio, associated tala-related metadata, and time-aligned markers indicating the progression through the tala cycles. It is useful as a test corpus for many automatic rhythm analysis tasks in Carnatic music. A subset of 118 two-minute excerpts (about 4 hours) is also available with equivalent content. The dataset can be obtained from http://compmusic.upf.edu/carnatic-rhythm-dataset
Hindustani Music Rhythm Dataset
The Hindustani Music Rhythm Dataset is a sub-collection of 151 excerpts (5 hours) in four taals of Hindustani music with audio, associated taal-related metadata, and time-aligned markers indicating the progression through the taal cycles. The dataset is useful as a test corpus for many automatic rhythm analysis tasks in Hindustani music. The dataset can be obtained from http://compmusic.upf.edu/hindustani-rhythm-dataset
Mridangam Stroke Dataset
The Mridangam Stroke dataset is a collection of 7162 audio examples of individual strokes of the Mridangam in various tonics. The dataset comprises 10 different strokes played on Mridangams with 6 different tonic values. The dataset can be used for training models for each Mridangam stroke. The dataset can be obtained from http://compmusic.upf.edu/mridangam-stroke-dataset
Mridangam Tani-avarthanam Dataset
The Mridangam Tani-avarthanam dataset is a transcribed collection of two tani-avarthanams played by the renowned Mridangam maestro Padmavibhushan Umayalpuram K. Sivaraman. The audio was recorded at IIT Madras, India, and annotated by professional Carnatic percussionists. It consists of about 24 minutes of audio and 8800 strokes. For more details, please see http://compmusic.upf.edu/mridangam-tani-dataset
Tabla Solo Dataset
The Tabla Solo Dataset is a transcribed collection of Tabla solo audio recordings spanning compositions from six different Gharanas of Tabla, played by Pt. Arvind Mulgaonkar. The dataset consists of audio and time-aligned bol transcriptions. For more details, please see http://compmusic.upf.edu/tabla-solo-dataset
Ottoman-Turkish Makam Music
Turkish Makam Symbolic Phrase Dataset
This study presents a large machine-readable dataset of Turkish makam music scores segmented into phrases by experts of this music. The dataset consists of 31362 phrases in a set of 480 scores of different compositions, annotated by 3 experts. For more details, please see http://compmusic.upf.edu/node/237
Turkish şarkı vocal dataset
The Turkish şarkı vocal dataset is a collection of 10 recordings of compositions from the vocal form şarkı. The collection is annotated with lyrical lines. Each lyrical phrase is aligned to its corresponding segment in the audio. The dataset can be obtained from http://compmusic.upf.edu/turkish-sarki
Turkish makam acapella sections dataset
The dataset consists of 12 a cappella performances of 11 compositions with a total duration of 19 minutes. Because appropriate a cappella material is lacking in this music tradition, professional singers have sung solo vocal versions of the originals (taken from the Turkish şarkı vocal dataset). Each performance was recorded in sync with the original recording, with instrumental sections left as silence. This ensures that the order in which sections are performed is kept the same.
Turkish Makam Audio-Score Alignment Dataset
This release contains 6 audio recordings of 4 peşrev compositions from the classical Ottoman-Turkish tradition. The audio recordings contain 51 sections in total. The total number of note annotations in the audio recordings is 3896. These annotations typically follow the note sequence in the SymbTr-scores. There are 3 inserted and 49 omitted notes in the annotations with respect to the SymbTr-scores.
Turkish Makam Section Dataset
This release contains 2095 sections annotated in 257 audio recordings of 58 compositions. The MIDI files and SymbTr-scores of the compositions are also included in the dataset. For more information, please refer to the paper.
Turkish Makam Tonic Dataset
This release contains annotated tonic frequencies of 257 audio recordings. The SymbTr-scores of the corresponding compositions performed in the audio recordings are also indicated. For more information please refer to the paper.
Turkish Makam Melodic Phrase Dataset
In this dataset, 899 SymbTr-scores were manually annotated into melodic segments by 3 experts. There are a total of 31362 phrase annotations in this dataset.
Beijing Opera (京剧)
Beijing Opera Percussion Instrument Dataset
The Beijing Opera Percussion Instrument Dataset is a collection of 236 examples of isolated strokes spanning the four percussion instrument classes used in Beijing Opera. It can be used to build stroke models for each percussion instrument. The dataset can be obtained from http://compmusic.upf.edu/bo-perc-dataset
Beijing Opera Percussion Pattern Dataset
The Beijing Opera Percussion Pattern (BOPP) dataset is a collection of 133 audio percussion patterns covering five pattern classes. The dataset includes the audio and syllable-level transcriptions of the patterns (not time-aligned). It is useful for percussion transcription and classification tasks. The patterns were extracted from audio recordings of arias and labeled by a musicologist. The dataset can be obtained from http://compmusic.upf.edu/bopp-dataset