Annotated jingju arias dataset

The Annotated Jingju Arias Dataset is a collection of 34 jingju arias manually segmented at various levels using the software Praat v5.3.53. The selected arias contain samples of the two main shengqiang in jingju, namely xipi and erhuang, and of the five main role types in terms of singing, namely dan, jing, laodan, laosheng and xiaosheng.


This dataset can be downloaded here.


The dataset includes a Praat TextGrid file for each aria with the following tiers (all the annotations are in Chinese):

  1. aria: name of the work (one segment for the whole aria)
  2. MBID: MusicBrainz ID of the audio recording (one segment for the whole aria)
  3. artist: name of the singing performer (one segment for the whole aria)
  4. school: related performing school (one segment for the whole aria)
  5. role-type: role type of the singing character (one segment for the whole aria)
  6. shengqiang: boundaries and label of the shengqiang performed in the aria (including accompaniment)
  7. banshi: boundaries and label of the banshi performed in the aria (including accompaniment)
  8. lyrics-lines: boundaries and annotation of each line of lyrics
  9. lyrics-syllables: boundaries and annotation of each syllable
  10. luogu: boundaries and label of each of the performed percussion patterns in the aria
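
As an illustration of how the tiers listed above might be read outside Praat, here is a minimal Python sketch. It assumes the TextGrid files are saved in Praat's long text format and contain only interval tiers; the dataset description does not confirm either point. The function name `parse_textgrid` and the sample content are hypothetical (the labels are made up and are not taken from the dataset). Real files may be encoded in UTF-8 or UTF-16, so the encoding used when opening them may need adjusting.

```python
import re

# One interval in Praat's long text format: xmin, xmax, then the quoted label.
# Inside a label, Praat escapes a double quote by doubling it.
INTERVAL_RE = re.compile(
    r'xmin = (\S+)\s+xmax = (\S+)\s+text = "((?:[^"]|"")*)"'
)
NAME_RE = re.compile(r'name = "((?:[^"]|"")*)"')


def parse_textgrid(text):
    """Return {tier name: [(start, end, label), ...]} for interval tiers.

    Assumption: long (non-short) TextGrid text format, interval tiers only.
    """
    tiers = {}
    # Each tier begins with a line such as "item [1]:"; the file header
    # (everything before the first tier) is skipped.
    for chunk in re.split(r'item \[\d+\]:', text)[1:]:
        name_match = NAME_RE.search(chunk)
        if name_match is None:
            continue
        tiers[name_match.group(1)] = [
            (float(start), float(end), label.replace('""', '"'))
            for start, end, label in INTERVAL_RE.findall(chunk)
        ]
    return tiers


# Made-up sample with two of the tier names listed above (not real dataset content).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.5
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "aria"
        xmin = 0
        xmax = 2.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 2.5
            text = "example aria"
    item [2]:
        class = "IntervalTier"
        name = "lyrics-lines"
        xmin = 0
        xmax = 2.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.2
            text = ""
        intervals [2]:
            xmin = 1.2
            xmax = 2.5
            text = "line one"
'''

tiers = parse_textgrid(SAMPLE)
for tier_name, intervals in tiers.items():
    print(tier_name, intervals)
```

For a real file, the text could be obtained with something like `open(path, encoding="utf-16").read()` (or `"utf-8"`, depending on how the file was saved) before passing it to `parse_textgrid`. A dedicated TextGrid library would be a more robust choice for production use; the regular-expression approach here is only meant to show the structure of the tiers.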

The ariasInfo.txt file contains a summary of the contents of each aria in the dataset.

A subset of this dataset comprising 20 arias has been used for the study of the relationship between linguistic tones and melody in the following papers:

Here is the list of the arias from the dataset used in these papers.

The whole dataset has been used for the automatic analysis of the structure of jingju arias and their automatic segmentation in the following master's thesis:

The audio recordings used for these annotations are available for research purposes. Please contact Rafael Caro Repetto (