Datasets

 

This page collects the datasets created as part of CompMusic. They are useful for a range of music information retrieval (MIR) tasks on the music cultures studied in CompMusic. Please visit the respective pages for more details.

Indian Art Music

Indian Music Tonic Dataset

This dataset comprises 597 commercially available audio recordings of Indian art music (Hindustani and Carnatic music), each manually annotated with the tonic of the lead artist. It is used as the test corpus for the development of tonic identification approaches. More information about this dataset can be obtained from http://compmusic.upf.edu/iam-tonic-dataset

Carnatic Varnam Dataset

The Carnatic Varnam Dataset is a collection of 28 solo vocal recordings made for our research on intonation analysis of Carnatic ragas. The collection consists of audio recordings, time-aligned tala cycle annotations, and swara notations in a machine-readable format. The dataset can be obtained from http://compmusic.upf.edu/carnatic-varnam-dataset

Carnatic Music Rhythm Dataset

The Carnatic Music Rhythm Dataset is a sub-collection of 176 excerpts (16.6 hours) in four talas of Carnatic music, with audio, associated tala-related metadata, and time-aligned markers indicating the progression through the tala cycles. It is useful as a test corpus for many automatic rhythm analysis tasks in Carnatic music. A subset of 118 two-minute excerpts (about 4 hours) is also available with equivalent content. The dataset can be obtained from http://compmusic.upf.edu/carnatic-rhythm-dataset

Hindustani Music Rhythm Dataset

The Hindustani Music Rhythm Dataset is a sub-collection of 151 excerpts (5 hours) in four taals of Hindustani music, with audio, associated taal-related metadata, and time-aligned markers indicating the progression through the taal cycles. It is useful as a test corpus for many automatic rhythm analysis tasks in Hindustani music. The dataset can be obtained from http://compmusic.upf.edu/hindustani-rhythm-dataset

Mridangam Stroke Dataset

The Mridangam Stroke Dataset is a collection of 7162 audio examples of individual Mridangam strokes in various tonics. It comprises 10 different strokes played on Mridangams with 6 different tonic values, and can be used to train models for each Mridangam stroke. The dataset can be obtained from http://compmusic.upf.edu/mridangam-stroke-dataset

Mridangam Tani-avarthanam Dataset

The Mridangam Tani-avarthanam Dataset is a transcribed collection of two tani-avarthanams played by the renowned Mridangam maestro Padmavibhushan Umayalpuram K. Sivaraman. The audio was recorded at IIT Madras, India, and annotated by professional Carnatic percussionists. It consists of about 24 minutes of audio and 8800 strokes. For more details, please see http://compmusic.upf.edu/mridangam-tani-dataset

Tabla Solo Dataset

The Tabla Solo Dataset is a transcribed collection of Tabla solo audio recordings spanning compositions from six different Gharanas of Tabla, played by Pt. Arvind Mulgaonkar. The dataset consists of audio and time-aligned bol transcriptions. For more details, please see http://compmusic.upf.edu/tabla-solo-dataset


Ottoman-Turkish Makam Music

Turkish Makam Symbolic Phrase Dataset

This is a large machine-readable dataset of Turkish makam music scores segmented into phrases by experts in this music. The dataset consists of 31362 phrases on a set of 480 scores of different compositions, annotated by 3 experts. For more details, please see http://compmusic.upf.edu/node/237

Turkish Şarkı Vocal Dataset

The Turkish şarkı vocal dataset is a collection of 10 recordings of compositions in the vocal form şarkı. The collection is annotated with lyrical lines, and each lyrical phrase is aligned to its corresponding segment in the audio. The dataset can be obtained from http://compmusic.upf.edu/turkish-sarki

Turkish Makam Acapella Sections Dataset

The dataset consists of 12 a cappella performances of 11 compositions, with a total duration of 19 minutes. Because this music tradition lacks suitable a cappella material, professional singers recorded solo vocal versions of the originals, which were taken from the Turkish şarkı vocal dataset. Each performance was recorded in sync with the original recording, with instrumental sections left as silence. This ensures that the order in which sections are performed stays the same.

http://compmusic.upf.edu/turkish-makam-acapella-sections-dataset

 

Turkish Makam Audio-Score Alignment Dataset

This release contains 6 audio recordings of 4 peşrev compositions from the classical Ottoman-Turkish tradition. The audio recordings contain 51 sections and 3896 note annotations in total. The annotations typically follow the note sequence in the SymbTr-scores; relative to those scores, they contain 3 inserted and 49 omitted notes.

http://compmusic.upf.edu/node/233

Turkish Makam Section Dataset

This release contains 2095 sections annotated in 257 audio recordings of 58 compositions. The MIDI and SymbTr-scores of the compositions are also included in the dataset. For more information, please refer to the paper.

http://compmusic.upf.edu/node/234

Turkish Makam Tonic Dataset

This release contains annotated tonic frequencies for 257 audio recordings. The SymbTr-scores of the compositions performed in the audio recordings are also indicated. For more information, please refer to the paper.

http://compmusic.upf.edu/node/235

Turkish Makam Melodic Phrase Dataset

In this dataset, 899 SymbTr-scores were manually segmented into melodic phrases by 3 experts. The dataset contains a total of 31362 phrase annotations.

http://compmusic.upf.edu/node/236


Beijing Opera (京剧)

Beijing Opera Percussion Instrument Dataset

The Beijing Opera Percussion Instrument Dataset is a collection of 236 examples of isolated strokes spanning the four percussion instrument classes used in Beijing Opera. It can be used to build stroke models for each percussion instrument. The dataset can be obtained from http://compmusic.upf.edu/bo-perc-dataset

Beijing Opera Percussion Pattern Dataset

The Beijing Opera Percussion Pattern (BOPP) dataset is a collection of 133 audio percussion patterns covering five pattern classes. The dataset includes the audio and syllable-level transcriptions of the patterns (not time-aligned). The patterns were extracted from audio recordings of arias and labeled by a musicologist. The dataset is useful for percussion transcription and classification tasks, and can be obtained from http://compmusic.upf.edu/bopp-dataset