Indian Art Music Tonic Datasets

Introduction

These datasets comprise audio excerpts and manual annotations of the lead artist's tonic pitch for each excerpt. Each excerpt is accompanied by its editorial metadata. The datasets can be used to develop and evaluate computational approaches for automatic tonic identification in Indian art music, and they have been used in several articles mentioned below. A majority of these datasets come from the CompMusic corpora of Indian art music, in which each recording is associated with a MusicBrainz identifier (MBID). Given the MBID, further information can be obtained using the Dunya API. This page provides an overview of the tonic identification datasets.

Please cite the following publication if you use the material shared here in your research work.

[1]. Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation. Journal of New Music Research, 43(1), 53–71.

[Postprint PDF@MTG] [Bibtex] [Resources]

 

Datasets 

The statistics of the datasets for tonic identification are listed in the table below. These six datasets are used in [1] for a comparative evaluation. To the best of our knowledge, these are the largest datasets available for tonic identification in Indian art music. The datasets vary in audio quality, recording period (decade), the number of Carnatic and Hindustani recordings, the number of male and female singers, and the number of instrumental and vocal excerpts. For detailed information about these datasets, see Chapter 3 of this thesis.

Table

All the datasets (annotations) shown above are version controlled and can be accessed from here. If you are looking for a particular dataset and its features or audio, you can obtain them as shown below. To learn how the features are extracted, visit the companion page for the publication.

The audio files corresponding to these datasets are made available on request, for research purposes only. To obtain the files, fill in this FORM.

 

CompMusic Tonic Identification Datasets 

Datasets: CM1, CM2, CM3

Features: pitch + multipitch histogram + pitch histograms

 

IITM Tonic Identification Datasets 

Datasets: IITM1, IITM2

Features: pitch + multipitch histogram + pitch histograms

 

IISc Tonic Identification Dataset 

Dataset: IISc

Features: pitch + multipitch histogram + pitch histograms

 

Annotation Format 

The tonic annotations are available in both TSV and JSON formats.

TSV: <relative path to audio><tab><tonic(Hz)><tab><Carnatic or Hindustani><tab><artist_name><tab><gender of the singer><tab><vocal or instrumental>
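
As a rough illustration of reading the TSV annotations, the sketch below parses one line into a dictionary. It assumes the six fields appear in the order listed above and are all separated by tabs. The column labels in FIELDS, the function name read_tonic_tsv, and the sample line are made up for this example and are not taken from the datasets.

```python
import csv
import io
from typing import Dict, List

# Our own labels for the six fields, in the order given in the format line.
FIELDS = ["filepath", "tonic", "tradition", "artist", "gender", "type"]


def read_tonic_tsv(handle) -> List[Dict[str, object]]:
    """Parse tab-separated tonic annotations into a list of dicts."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if not record:
            continue  # skip blank lines
        row = dict(zip(FIELDS, record))
        row["tonic"] = float(row["tonic"])  # tonic is annotated in Hz
        rows.append(row)
    return rows


# Invented sample line, for illustration only.
sample = "audio/example_track.mp3\t146.83\tCarnatic\tExample Artist\tmale\tvocal\n"
annotations = read_tonic_tsv(io.StringIO(sample))
print(annotations[0]["tonic"], annotations[0]["tradition"])
```

For a real annotation file, an opened file handle (for example open(path, newline="", encoding="utf-8")) can be passed in place of the in-memory sample.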

JSON: {
        'artist': <name of the lead artist if available>,
        'filepath': <relative path to the audio file>,
        'gender': <gender of the lead singer if available>,
        'mbid': <musicbrainz id when available>,
        'tonic': <tonic in Hz>,
        'tradition': <Hindustani or Carnatic>,
        'type': <vocal or instrumental>
      }


Each entry in the JSON file has the form shown above. The keys of the main dictionary are the file paths to the audio files (the path of a feature file is exactly the same, except for the file extension).
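
The JSON layout described above can be read along the following lines. This is a minimal sketch under some assumptions: the file is taken to be standard JSON (double-quoted keys), the sample content is invented, and the ".pitch" feature extension and the helper name feature_path are hypothetical, since the actual feature extensions are not stated here.

```python
import json
from collections import Counter
from pathlib import PurePosixPath

# Invented annotation content for illustration; real files will differ.
sample_json = """
{
  "audio/example_track.mp3": {
    "artist": "Example Artist",
    "filepath": "audio/example_track.mp3",
    "gender": "male",
    "mbid": "00000000-0000-0000-0000-000000000000",
    "tonic": 146.83,
    "tradition": "Carnatic",
    "type": "vocal"
  }
}
"""


def feature_path(audio_path: str, extension: str) -> str:
    """Replace the audio file extension with a feature extension (hypothetical)."""
    return str(PurePosixPath(audio_path).with_suffix(extension))


annotations = json.loads(sample_json)

# Map each audio file path (the dictionary key) to its annotated tonic in Hz.
tonics = {path: float(entry["tonic"]) for path, entry in annotations.items()}

# Count excerpts per music tradition.
by_tradition = Counter(entry["tradition"] for entry in annotations.values())

print(tonics)
print(by_tradition)
print(feature_path("audio/example_track.mp3", ".pitch"))
```

Keying the lookup on the file path mirrors the structure of the main dictionary, so the same key can be used to locate both the annotation and the corresponding feature file.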

 

References 

  1. Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., and Serra, X. (2014). Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation. Journal of New Music Research, 43(1), 53–71.

  2. Gulati, S. (2012). A Tonic Identification Approach for Indian Art Music. Master's thesis, Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain.

  3. Salamon, J., Gulati, S., and Serra, X. (2012). A Multipitch Approach to Tonic Identification in Indian Classical Music. In Proc. of ISMIR, pp. 499–504, Porto, Portugal.

 

Contact 

If you have any questions or comments about the dataset, please feel free to email: [sankalp (dot) gulati (at) gmail (dot) com], or [sankalp (dot) gulati (at) upf (dot) edu]