Multi-instrumental vocal onsets Ottoman Turkish makam music dataset

The multi-instrumental vocal onsets OTMM dataset is a subset of the dataset presented in (Holzapfel et al., 2014), including only the recordings with singing voice present. The dataset comprises 12 (usually 1-minute) excerpts from recordings with solo singing voice for each of two meter classes, referred to as usuls in Turkish makam music: the 9/8 usul aksak and the 8/8 usul düyek, as well as 5 excerpts from the 10/8 usul curcuna.

Interestingly, in makam music each usul has a characteristic pattern of beat positions on which percussive strokes are hit. For example, in aksak the beats 1, 3, 4, 5, 7, and 9 carry strokes. Percussionists of Turkish makam tend to observe these patterns rather conservatively.
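As a minimal sketch, the aksak stroke pattern described above can be encoded as a binary indicator over the beat positions of one metric cycle (the helper below is purely illustrative, not part of the dataset tooling):

```python
# Beats within one 9/8 aksak cycle that carry percussive strokes,
# as described in the text above.
AKSAK_STROKES = {1, 3, 4, 5, 7, 9}

def stroke_vector(strokes, cycle_length):
    """Return a binary indicator over beat positions 1..cycle_length."""
    return [1 if b in strokes else 0 for b in range(1, cycle_length + 1)]

print(stroke_vector(AKSAK_STROKES, 9))  # [1, 0, 1, 1, 1, 0, 1, 0, 1]
```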


A brief description of the dataset is provided below. 


Please cite the following publication if you use the dataset in your work:
Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, Xavier Serra. Metrical-accent aware vocal onset detection in polyphonic audio. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017).
(Holzapfel et al., 2014): Andre Holzapfel, Florian Krebs, and Ajay Srinivasamurthy. Tracking the “odd”: Meter inference in a culturally diverse music corpus. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), pages 425–430.

The Dataset

Audio music content 

Each piece is uniquely identified by the MusicBrainz ID (MBID) of the recording. The pieces are stereo MP3 files at 160 kbps, sampled at 44.1 kHz. The audio is also available as WAV files for experiments. 


There are several annotations that accompany each excerpt in the dataset.
  • Beats - timestamps of the beats (usually only the first 60 seconds are annotated)

Additionally, for a subset (5 aksak, 5 düyek, and 3 curcuna excerpts):
  • Vocal segments - audio regions that correspond to score sections in which singing voice is present
  • Vocal onsets - locations (timestamps) of onsets of the singing voice. Annotation strategy: if a syllable starts with an unvoiced sound, the onset is annotated at the beginning of the voiced part (e.g. 'Shi' will have its onset at the 'i'). However, if a background instrument plays the same pitch simultaneously with the voice, the vocal onset is marked at the instrument onset, as if it were the vocal onset (because the predominant melody will include the instrumental pitch). The annotated beats were also used as guidance: knowing the location of a beat helped place the onsets more precisely.
  • f0 (in Hz) - the note pitch, taken from the music score (not validated for all notes)
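Since the annotation strategy uses beats as anchors for onset placement, a natural first step when working with these annotations is to relate each vocal onset to its nearest annotated beat. The sketch below assumes the annotations can be read as one timestamp (in seconds) per row; the exact file layout is an assumption, so adapt the loader to the actual format:

```python
import csv

def load_timestamps(path):
    """Load one timestamp (seconds) per row from an annotation file.
    The single-column CSV layout is an assumption about the format."""
    with open(path) as f:
        return [float(row[0]) for row in csv.reader(f) if row]

def nearest_beat(onset, beats):
    """Return the annotated beat closest in time to a vocal onset."""
    return min(beats, key=lambda b: abs(b - onset))

# Illustrative values, not taken from the dataset:
beats = [0.0, 0.9, 1.8, 2.7]
print(nearest_beat(1.1, beats))  # 0.9
```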
Further descriptions and comments can be found on sheet 1 of the accompanying spreadsheet.
The results of the paper are on sheet 2 of the same spreadsheet.

Possible uses of the dataset

Possible tasks where the dataset can be used include beat and downbeat tracking, vocal onset detection, note tracking/transcription, singing voice detection, and audio-to-score/lyrics alignment. 

Availability and related datasets

The annotations and audio are publicly shared and available at  
A dataset for the same study on material from western popular music, following the same annotation strategy, has been compiled and is available at  


If you have any questions or comments about the dataset, please feel free to write to us. 
Georgi Dzhambazov
Music Technology Group
Universitat Pompeu Fabra, 
Barcelona, Spain
georgi [dot] dzhambazov [at]