Carnatic Varnam Dataset

The Carnatic varnam dataset is a collection of 28 solo vocal recordings, made for our research on intonation analysis of Carnatic rāgas. The collection includes the audio recordings, taala cycle annotations, and notations in a machine-readable format.
Please cite the following publication if you use the dataset in your work: 
Koduri, G. K., Ishwar, V., Serrà, J., & Serra, X. (2014). Intonation analysis of rāgas in Carnatic music. Journal of New Music Research, 43(1), 73–94.


This dataset can be downloaded here.

The dataset

Audio music content

The recordings feature 7 varṇaṁs in 7 rāgas, sung by 5 young professional singers, each with more than 15 years of training. All of them are set to Adi taala. Measuring intonation variations requires clean pitch contours, so all the varṇaṁs were recorded without accompanying instruments, except the drone.
Rāga        Recordings    Duration (minutes)
Ābhōgi      5             29
Bēgaḍa      3             27
Kalyāṇi     4             27
Mōhanaṁ     4             24
Sahāna      4             28
Sāvēri      5             36
Śrī         3             26
Total       28            197

Taala annotations

The recordings are annotated with taala cycles, each annotation marking the start of a cycle. We later divided each cycle automatically into 8 equal parts. The annotations are available as Sonic Visualiser annotation layers. Each annotation has the format m.n, where m is the cycle number and n is the division within the cycle. All m.1 annotations were made manually, whereas m.2 to m.8 were labelled automatically.
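The automatic division described above amounts to splitting the interval between consecutive manual m.1 annotations into 8 equal steps. The Python sketch below reproduces that labelling from a list of cycle start times; it is an illustration, not the script used to build the dataset. The names `subdivide_cycles` and `parse_label` are hypothetical, and the sketch assumes the m.1 times have already been read from the annotation layer as seconds (for example, from a Sonic Visualiser export), in ascending order. The last annotated start has no known end, so it is kept as an m.1 label without subdivisions.

```python
from typing import List, Tuple


def subdivide_cycles(cycle_starts: List[float], divisions: int = 8) -> List[Tuple[float, str]]:
    """Split each annotated taala cycle into equal parts labelled m.n.

    Hypothetical helper: assumes cycle_starts holds the manual m.1 times
    in seconds, in ascending order.
    """
    labelled = []
    # Pair each cycle start with the next one, which marks its end.
    for m, (start, end) in enumerate(zip(cycle_starts, cycle_starts[1:]), start=1):
        step = (end - start) / divisions
        for n in range(1, divisions + 1):
            labelled.append((start + (n - 1) * step, f"{m}.{n}"))
    # The final start has no known end, so only its m.1 label is kept.
    if cycle_starts:
        labelled.append((cycle_starts[-1], f"{len(cycle_starts)}.1"))
    return labelled


def parse_label(label: str) -> Tuple[int, int]:
    """Split an 'm.n' label into (cycle number, division number)."""
    m, n = label.split(".")
    return int(m), int(n)
```

For instance, `subdivide_cycles([0.0, 8.0, 16.0])` yields 17 labels, from `(0.0, "1.1")` through `(7.0, "1.8")` and `(8.0, "2.1")` up to the unsubdivided `(16.0, "3.1")`.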


The notations for 7 varnams are procured from an archive curated by Shivkumar, in word document format. They are manually converted to a machine readable format (yaml). Each file is essentially a dictionary with section names of the composition as keys. Each section is represented as a list of cycles. Each cycle in turn has a list of divisions.
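Because each notation file is a nested dictionary and list structure, it can be walked with plain loops once parsed, for example with PyYAML's `yaml.safe_load`. The sketch below assumes the parsed result has exactly the shape described above, and further assumes (this is not stated in the dataset notes) that each division is stored as a string of svaras. The function names `iter_divisions` and `cycles_per_section` are hypothetical, as are the section names used in the usage line.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed shape: section name -> list of cycles -> list of divisions (strings).
Notation = Dict[str, List[List[str]]]


def iter_divisions(notation: Notation) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (section, cycle number, division number, content), numbered from 1.

    Sections are visited in the dictionary's iteration order.
    """
    for section, cycles in notation.items():
        for c, cycle in enumerate(cycles, start=1):
            for d, division in enumerate(cycle, start=1):
                yield section, c, d, division


def cycles_per_section(notation: Notation) -> Dict[str, int]:
    """Count the notated cycles in each section."""
    return {section: len(cycles) for section, cycles in notation.items()}
```

For example, with a hypothetical section name, `cycles_per_section({"pallavi": [["S R", "G P"]]})` returns `{"pallavi": 1}`.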

Possible uses of the dataset

A distinct advantage of this dataset is that the audio content is freely available. Together with the annotations, it can be used for melodic analyses such as characterizing intonation, motif discovery and tonic identification. The machine-readable notation files also allow the dataset to be used for audio-score alignment.
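As one example of characterizing intonation, a pitch track can be expressed in cents relative to the tonic and folded into a single octave, so that each svara shows up as a peak in a histogram. This is a generic sketch, not the method of the cited paper. It assumes a fundamental-frequency track (in Hz, with unvoiced frames given as `None` or 0) and a tonic frequency have already been obtained with other tools, since the dataset as described ships audio rather than pitch tracks. The function names and the 10-cent bin width are arbitrary choices for illustration.

```python
import math
from typing import List, Optional


def hz_to_cents(f0_hz: float, tonic_hz: float) -> float:
    """Distance of f0 from the tonic in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f0_hz / tonic_hz)


def folded_pitch_histogram(
    f0_track: List[Optional[float]], tonic_hz: float, bin_cents: int = 10
) -> List[int]:
    """Count pitch frames per bin over one octave above the tonic.

    Assumes f0_track is in Hz; unvoiced frames (None or <= 0) are skipped.
    """
    n_bins = 1200 // bin_cents
    counts = [0] * n_bins
    for f0 in f0_track:
        if f0 is None or f0 <= 0:
            continue
        # Fold every octave onto [0, 1200) cents above the tonic.
        cents = hz_to_cents(f0, tonic_hz) % 1200.0
        # The extra modulo guards against float rounding landing on 1200.0.
        counts[int(cents // bin_cents) % n_bins] += 1
    return counts
```

Folding maps the same svara sung in different octaves to the same bin; a narrower bin resolves finer intonation differences at the cost of sparser counts per bin.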


If you have any questions or comments about the dataset, please feel free to write to us. 
Gopala Krishna Koduri
Music Technology Group,
Universitat Pompeu Fabra, 
Barcelona, Spain
gopala <dot> koduri <at> upf <dot> edu