Carnatic Kriti Dataset

The Carnatic Kriti Dataset is a collection of recordings curated from the CompMusic collection according to set criteria on the number of recordings and works per raaga. As the CompMusic collection evolves over time, the dataset is presented here as snapshots taken at different points in time. The changes between snapshots reflect new releases and recordings added to the collection, as well as improvements in data quality. Such improvements include adding missing metadata, correcting mislabelled data, and changes to the data schema.

The Dataset

Version 2.0

This snapshot is dated June 2016. It was created with the following criterion: each raaga must have at least 20 performances spanning at least 5 different compositions. Like version 1.0, it does not feature RTPs (Raagam-Taanam-Pallavi), but it does include Keertanams and Varnams. This resulted in a dataset featuring 42 raagas, 667 works and 2324 recordings.

Version 1.0

This snapshot is dated May 2015. It was created with the following criterion: each raaga must have at least 10 performances spanning at least 5 different compositions. Note that it does not include RTPs (Raagam-Taanam-Pallavi) and mainly features Kritis. The other forms in this dataset are Keertanas and Varnams. With the CompMusic collection as it stood at the time, this resulted in a dataset featuring 45 raagas, 545 works and 934 recordings.
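The selection criterion used for both versions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_raagas, its parameters, and the flat (recording, raaga, work) tuple layout are hypothetical and do not reflect the dataset's actual metadata schema or the script used to build it.

```python
from collections import defaultdict


def select_raagas(recordings, min_recordings, min_works):
    """Return the raagas that meet both thresholds, sorted by name.

    `recordings` is an iterable of (recording_id, raaga, work) tuples.
    This flat layout is an assumption made for the example, not the
    dataset's real schema.
    """
    recs_per_raaga = defaultdict(set)
    works_per_raaga = defaultdict(set)
    for rec_id, raaga, work in recordings:
        recs_per_raaga[raaga].add(rec_id)
        works_per_raaga[raaga].add(work)
    return sorted(
        raaga
        for raaga in recs_per_raaga
        if len(recs_per_raaga[raaga]) >= min_recordings
        and len(works_per_raaga[raaga]) >= min_works
    )


# Toy data (made up); thresholds are scaled down so the effect is visible.
toy = [
    ("r1", "Kalyani", "w1"),
    ("r2", "Kalyani", "w2"),
    ("r3", "Kalyani", "w2"),
    ("r4", "Todi", "w3"),
    ("r5", "Todi", "w3"),
    ("r6", "Todi", "w3"),
]

# Kalyani has 3 recordings over 2 works; Todi has 3 recordings of 1 work.
selected = select_raagas(toy, min_recordings=3, min_works=2)
print(selected)
```

Under these assumptions, version 1.0 would correspond to min_recordings=10 and min_works=5, and version 2.0 to min_recordings=20 and min_works=5.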

Notations

We looked up notation for the works (in both version 1.0 and version 2.0) in different sources (published books and online resources such as personal blogs and forums). The notations that were available were manually converted to a machine-readable format (YAML). Each file is essentially a dictionary with the section names of the work/composition as keys. Each section is represented as a list of cycles, and each cycle in turn is a list of divisions.
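The nesting described above (section, then cycles, then divisions) can be pictured with a small Python structure. Everything concrete here is an assumption for illustration: the section names, the svara strings inside each division, and the summarize helper are hypothetical, and the real files may encode divisions differently. Parsing one of the YAML files with a library such as PyYAML (yaml.safe_load) would be expected to yield nested dictionaries and lists of roughly this shape.

```python
# Hypothetical notation for one work:
# section name -> list of cycles -> list of divisions.
# Section names and division contents are illustrative, not from real files.
notation = {
    "pallavi": [
        ["S R", "G M"],  # cycle 1 with two divisions
        ["P D", "N S"],  # cycle 2 with two divisions
    ],
    "anupallavi": [
        ["S N", "D P"],  # a single cycle with two divisions
    ],
}


def summarize(notation):
    """Count the cycles and divisions in each section."""
    return {
        section: {
            "cycles": len(cycles),
            "divisions": sum(len(cycle) for cycle in cycles),
        }
        for section, cycles in notation.items()
    }


summary = summarize(notation)
for section, counts in summary.items():
    print(section, counts["cycles"], counts["divisions"])
```

Keeping cycles as an explicit list level preserves the alignment of the notation with the taala cycle, which is the kind of structure an audio-score alignment method would likely rely on.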

Possible uses of the dataset

Although the audio files for this dataset come from commercial sources and hence cannot be placed in the public domain, we have made the extracted melody available. The dataset can be used for melodic analyses: characterizing intonation, motif discovery and tonic identification. The availability of machine-readable notation files also allows the dataset to be used for audio-score alignment.

Availability of the dataset

The notation files available to date can be downloaded from here. Version 1.0 of the dataset with all its metadata can be downloaded from here. A similar file for version 2.0 can be downloaded from here. The corresponding melody files can be downloaded from here (v1) and here (v2). [Links TBD].

Contact

If you have any questions or comments about the dataset, please feel free to write to us. 
 
Gopala Krishna Koduri
Music Technology Group,
Universitat Pompeu Fabra, 
Barcelona, Spain
gopala <dot> koduri <at> upf <dot> edu