Turkish şarkı vocal dataset

The Turkish şarkı vocal dataset is a collection of recordings of compositions in the vocal form şarkı. The recordings are selected from a MusicBrainz collection of Turkish music.


The recordings are annotated with their lyrics: each lyrical phrase is aligned to its corresponding segment in the audio.


This dataset can be downloaded here.

The dataset

Audio content

Version 1

Version 1 features 10 performances of different compositions: five sung by male singers and five by female singers. The recordings were selected so that there is only a single main vocalist (or the intensity of the backing vocalist is relatively low compared to the main vocalist). The accompanying instruments are mainly string ensembles; no percussive instruments are present. The audio is in .wav format.

Please cite the following publication if you use the dataset in your work:

Dzhambazov, G., Şentürk, S., & Serra, X. (2014). Automatic lyrics-to-audio alignment in classical Turkish music. 4th International Workshop on Folk Music Analysis.


Version 2

Version 2 is a modification of Version 1 in which some recordings were added and others omitted. It features 12 performances of 11 different compositions: 8 sung by female singers and 4 by male singers.
Please cite the following publication if you use the dataset in your work:

Lyrical phrase annotations

The audio is segmented into chunks of one section each (a section being, for example, the nakarat or the meyan). Within each audio segment, the lyrical phrases are aligned to the audio. A phrase corresponds roughly to a musical bar and contains one or two words.
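Once the start and end times of a phrase or section are known (in seconds), the corresponding audio can be cut out of the recording. The following is a minimal Python sketch using only the standard-library `wave` module. It assumes the .wav files are uncompressed PCM (the only kind `wave` can read); the function name `extract_segment` is hypothetical and not part of the dataset.

```python
import wave


def extract_segment(src, dst, start_s: float, end_s: float) -> int:
    """Copy the frames between start_s and end_s (seconds) of a PCM .wav to dst.

    Hypothetical helper. src and dst may be file paths or binary file objects
    (wave.open accepts both). Returns the number of frames written.
    """
    with wave.open(src, 'rb') as reader:
        rate = reader.getframerate()
        total = reader.getnframes()
        # Convert seconds to frame indices, clamped to the recording length.
        first = min(total, max(0, round(start_s * rate)))
        last = min(total, max(0, round(end_s * rate)))
        if last <= first:
            raise ValueError('segment is empty or reversed')
        reader.setpos(first)
        frames = reader.readframes(last - first)
        params = reader.getparams()
    with wave.open(dst, 'wb') as writer:
        # Same channels, sample width and rate as the source; the wave module
        # corrects the frame count in the header when the writer is closed.
        writer.setparams(params)
        writer.writeframes(frames)
    return last - first
```

Rounding to the nearest frame keeps a segment within half a sample period of the annotated boundaries, which is far below the precision of a phrase-level alignment.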

The annotation files are in Praat's .TextGrid format.

Availability of the dataset

The audio content and annotations are openly available.


If you have any questions or comments about the dataset, please feel free to write to us. 
Georgi Dzhambazov
Music Technology Group,
Universitat Pompeu Fabra, 
Barcelona, Spain
georgi <dot> dzhambazov <at> upf <dot> edu