Multi-instrumental vocal onsets Ottoman Turkish makam music dataset
The multi-instrumental vocal onsets OTMM dataset is a subset of the dataset presented in (Holzapfel et al., 2014), including only the recordings in which singing voice is present. The dataset comprises 12 (usually 1-minute) excerpts from recordings with solo singing voice for each of two meter classes, referred to as usuls in Turkish makam: the 9/8-usul aksak and the 8/8-usul düyek, as well as 5 excerpts from the 10/8-usul curcuna. Notably, in makam each usul has a characteristic pattern of beat positions on which percussive strokes are hit; in aksak, for example, beats 1, 3, 4, 5, 7, and 9 carry strokes. Percussionists of Turkish makam tend to observe these patterns rather conservatively.
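Since each usul is defined by a fixed cycle length and stroke pattern, it can be encoded as a simple lookup table. Below is a minimal Python sketch; only the aksak pattern is stated above, so the other entries are left as placeholders to be filled from a makam theory source:

```python
# Usul stroke patterns as (beats per cycle, 1-indexed beats carrying strokes).
# Only the aksak pattern is given in the text above; the düyek/curcuna entries
# are placeholders (None) and are NOT specified by this dataset description.
USUL_STROKE_PATTERNS = {
    "aksak":   (9,  [1, 3, 4, 5, 7, 9]),  # 9/8, strokes as described above
    "duyek":   (8,  None),                # 8/8, stroke positions not given here
    "curcuna": (10, None),                # 10/8, stroke positions not given here
}

def stroke_beats(usul: str) -> list[int]:
    """Return the beat positions that carry percussive strokes for an usul."""
    _, strokes = USUL_STROKE_PATTERNS[usul]
    if strokes is None:
        raise ValueError(f"stroke pattern for {usul!r} not specified here")
    return strokes
```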
A brief description of the dataset is provided below.
Reference
Please cite the following publication if you use the dataset in your work:
Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, and Xavier Serra. Metrical-accent aware vocal onset detection in polyphonic audio. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017).
-----
(Holzapfel et al., 2014): Andre Holzapfel, Florian Krebs, and Ajay Srinivasamurthy. Tracking the “odd”: Meter inference in a culturally diverse music corpus. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), pages 425–430.
The Dataset
Audio music content
Each piece is uniquely identified by the MusicBrainz ID (MBID) of the recording. The pieces are stereo, 160 kbps mp3 files sampled at 44.1 kHz. The audio is also available as wav files for experiments.
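For reference, a minimal loading sketch using librosa; the audio/<MBID>.wav layout is an assumption for illustration, not necessarily the repository's actual structure:

```python
import librosa

def load_excerpt(dataset_root: str, mbid: str):
    """Load one excerpt at its native 44.1 kHz rate, keeping both channels.

    The audio/<mbid>.wav path pattern is hypothetical; adjust it to the
    actual folder layout of the repository.
    """
    path = f"{dataset_root}/audio/{mbid}.wav"
    y, sr = librosa.load(path, sr=None, mono=False)
    return y, sr
```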
Annotations
There are several annotations that accompany each excerpt in the dataset.
- Beats - timestamps of beats (usually only the first 60 seconds are annotated); a parsing sketch for the annotation files follows this list
Additionally, the following annotations are provided for a subset (5 aksak, 5 düyek, and 3 curcuna excerpts):
- Vocal segments - audio regions that correspond to sections of the score in which singing voice is present
- Vocal onsets - locations (timestamps) of onsets of the singing voice. Annotation strategy: if a syllable starts with an unvoiced sound, the onset is annotated at the beginning of the voiced part (e.g. for 'shi' the onset is placed at the start of 'i'). However, if a background instrument plays the same pitch simultaneously with the voice, the vocal onset is marked at the instrument onset, as if it were the vocal onset (because the predominant melody will include the instrumental pitch). We also used the annotated beats as guidance: knowing the location of a beat helped us place the onset locations more precisely.
- f0 - note pitch (in Hz), taken from the music score (not validated for all notes)
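Below is a hedged sketch for reading the beat and vocal onset annotations. It assumes hypothetical plain-text files with one timestamp (in seconds) as the first field of each line; the file names and format must be adapted to the actual repository:

```python
def load_timestamps(path: str) -> list[float]:
    """Read one timestamp per line (first whitespace-separated field)."""
    with open(path) as f:
        return [float(line.split()[0]) for line in f if line.strip()]

# Hypothetical file names; substitute the actual paths from the repository:
# beats = load_timestamps("annotations/<mbid>.beats")
# onsets = load_timestamps("annotations/<mbid>.onsets")
```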
Further descriptions and comments can be found on sheet 1 of the accompanying spreadsheet; results of the paper are on sheet 2 of the same spreadsheet.
Possible uses of the dataset
Possible tasks where the dataset can be used include beat and downbeat tracking, vocal onset detection, note tracking/transcription, singing voice detection, and audio-to-score/lyrics alignment.
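For the vocal onset detection task, for example, the annotated onsets can serve as ground truth for the standard onset F-measure. A small sketch using mir_eval (the timestamps below are made up for illustration):

```python
import numpy as np
import mir_eval

reference = np.array([0.52, 1.87, 3.10])  # annotated vocal onsets (seconds)
estimated = np.array([0.50, 1.90, 2.95])  # detector output (seconds)

# Standard onset evaluation with a +/-50 ms tolerance window.
f, precision, recall = mir_eval.onset.f_measure(reference, estimated, window=0.05)
print(f"F={f:.2f} P={precision:.2f} R={recall:.2f}")
```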
Availability and related datasets
The annotations and audio are publicly shared and available at https://github.com/georgid/otmm_vocal_segments_dataset.
A dataset with material from Western popular music, compiled for the same study and following the same annotation strategy, is available at https://github.com/georgid/lakh_vocal_segments_dataset
Contact
If you have any questions or comments about the dataset, please feel free to write to us.
Georgi Dzhambazov
Music Technology Group
Universitat Pompeu Fabra,
Barcelona, Spain
georgi [dot] dzhambazov [at] upf.edu