Companion webpage for the PhD thesis of Georgi Dzhambazov

This page is the companion web page for the PhD thesis titled

Knowledge-based Probabilistic Modeling for Tracking Lyrics in Music Audio Signals

Georgi Dzhambazov

(Last updated: 7 July 2017)


In this thesis, we devise computational models for tracking sung lyrics in multi-instrumental music recordings. We consider not only the low-level acoustic characteristics that represent the timbre of the sung phonemes, but also higher-level music knowledge that is complementary to the lyrics. We build probabilistic models, based on dynamic Bayesian networks (DBNs), that represent the relation of phoneme transitions to two facets of music knowledge: the temporal structure of a lyrics line and the structure of the metrical cycle. In one model we exploit the fact that expected syllable durations depend on their position within a lyrics line. In another model, we propose how to estimate vocal onsets by simultaneously tracking the position in the metrical cycle, and how these estimated onsets influence the transitions between consecutive phonemes. Using the proposed models, sung lyrics are automatically aligned to written lyrics on datasets from Ottoman Turkish makam and Beijing opera, whereby principles specific to these music traditions are considered. Both models improve over a baseline that is unaware of music-specific knowledge. This confirms that music-specific knowledge is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment.
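To give a flavour of what "duration-aware" alignment means, here is a minimal, self-contained sketch (not the thesis code, which lives in the repositories linked below): a dynamic program that segments audio frames into consecutive syllables, scoring each candidate segment by a toy acoustic score plus a Gaussian log-penalty on the deviation from the expected syllable duration. All function names, the toy scores, and the penalty are illustrative assumptions.

```python
def duration_aware_align(scores, expected_durs, sigma=2.0):
    """Segment T frames into len(scores) consecutive syllables.

    scores[s][t]   -- toy acoustic log-score of syllable s at frame t
    expected_durs  -- expected duration (in frames) of each syllable,
                      e.g. derived from a music score
    A Gaussian log-penalty keeps each segment close to its expected
    duration.  Returns one (start_frame, end_frame) span per syllable.
    """
    S, T = len(scores), len(scores[0])
    NEG = float("-inf")
    best = [[NEG] * T for _ in range(S)]  # best[s][t]: syllable s ends at t
    back = [[0] * T for _ in range(S)]    # where the previous syllable ended

    def dur_penalty(d, s):
        return -((d - expected_durs[s]) ** 2) / (2 * sigma ** 2)

    # Prefix sums of acoustic scores, so a segment score is O(1).
    pref = [[0.0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(T):
            pref[s][t + 1] = pref[s][t] + scores[s][t]

    for t in range(T):  # first syllable always starts at frame 0
        best[0][t] = pref[0][t + 1] + dur_penalty(t + 1, 0)
    for s in range(1, S):
        for t in range(s, T):
            for u in range(s - 1, t):  # previous syllable ends at frame u
                cand = (best[s - 1][u]
                        + pref[s][t + 1] - pref[s][u + 1]
                        + dur_penalty(t - u, s))
                if cand > best[s][t]:
                    best[s][t] = cand
                    back[s][t] = u

    # Backtrack from the last frame of the last syllable.
    spans, t = [], T - 1
    for s in range(S - 1, 0, -1):
        u = back[s][t]
        spans.append((u + 1, t))
        t = u
    spans.append((0, t))
    return spans[::-1]
```

With two syllables whose toy scores clearly favour the first and second halves of six frames, and expected durations of three frames each, the sketch recovers the spans `(0, 2)` and `(3, 5)`. The thesis models operate on phoneme-level HMM states inside a DBN rather than on whole syllables, so this is only a conceptual analogue.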

A longer, more detailed abstract is here

Link to the thesis manuscript

Link to the thesis defense presentation slides

Please click on the headings to expand.

All the datasets (introduced in Chapter 3.2) used in our work are made publicly available for research purposes. Most of them are version-controlled. The companion web pages corresponding to the publications also list the associated datasets. The datasets are listed below:

A list of all publications by the author, done as part of his work at the MTG, can be found here. Below we list those relevant to the work presented in this thesis:

The scripts and figures needed to reproduce these papers are here

The core code corresponding to the experiments performed as part of the thesis is organized in different git repositories. Links to selected scripts/code are given below.

Evaluation Scripts

Core algorithms

Other tools

The code for the individual experiments needs refactoring, which will be done soon. Until then, if anything is unclear, please feel free to contact the author.

Duration-aware lyrics-to-audio alignment

A demo of durations derived from music score for OTMM (Chapter 4.4)

1. Create an account in Dunya-web.
2. Select OTMM songs that have a vocal part (filter by the form şarkı).
3. Click the link "Access lyrics player" on the right-hand side (sometimes not available, e.g. when no score is available).

Or you can have a look at an example recording
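As a rough illustration of the idea behind deriving expected durations from a music score (the actual OTMM pipeline is in the repositories above; the function name, tempo, and frame parameters here are assumptions), note values in beats can be converted into expected frame counts given a tempo and an analysis hop size:

```python
def score_durations_to_frames(note_beats, bpm, sr=44100, hop=512):
    """Convert score note durations (in beats) to expected frame counts.

    note_beats -- duration of each note/syllable in beats
    bpm        -- assumed tempo of the performance
    sr, hop    -- sample rate and hop size of the frame-level analysis
    """
    sec_per_beat = 60.0 / bpm        # seconds per beat at this tempo
    frames_per_sec = sr / hop        # analysis frame rate
    return [round(b * sec_per_beat * frames_per_sec) for b in note_beats]
```

At 120 bpm with a 512-sample hop at 44.1 kHz, a one-beat note lasts 0.5 s, i.e. about 43 frames. In practice the performed tempo deviates from the notated one, which is one reason the thesis treats these durations probabilistically rather than as fixed values.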

Metrical-accent-aware onset detection

in progress ...

(This page and the thesis document are generated by scripts here.)

For access to data, code, requests on work in progress, or if you have any questions/comments, please contact:

Georgi Dzhambazov