An XML Dataset for Hindustani Classical Music
The dataset is a collection of XML files, each corresponding to one composition. Each file is named [raag-name]-[index of the composition within the raag]-[page number in the book Kramik Pustak Malika].xml.
Currently the dataset consists of 116 XML files belonging to the raags Bhairav, Todi, and Poorvi, with 42, 39, and 35 compositions respectively.
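The naming scheme above makes it possible to recover the raag, composition index, and page number from a file name, and to tally compositions per raag. The following is a minimal sketch, not part of the repository. The helper names parse_name and count_by_raag are hypothetical, the example file names are invented, and the pattern assumes that the index and page number are plain integers separated by hyphens.

```python
import re
from collections import Counter

# Assumed pattern: <raag-name>-<composition index>-<page number>.xml
# The raag name is matched greedily, so it may itself contain hyphens.
FILENAME_RE = re.compile(r"^(?P<raag>.+)-(?P<index>\d+)-(?P<page>\d+)\.xml$")


def parse_name(filename):
    """Return (raag, index, page) for a matching file name, else None."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return match.group("raag"), int(match.group("index")), int(match.group("page"))


def count_by_raag(filenames):
    """Count how many file names belong to each raag, skipping non-matches."""
    counts = Counter()
    for name in filenames:
        parsed = parse_name(name)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


# Invented file names, used only to illustrate the assumed format.
example_files = ["todi-1-10.xml", "todi-2-11.xml", "poorvi-1-50.xml", "notes.txt"]
example_counts = count_by_raag(example_files)
```

If the real file names follow this pattern, running count_by_raag over the full directory listing should reproduce the per-raag counts stated above.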
The music sheets of the compositions are rendered using the Ome Swarlipi fonts and style engine. A sample HTML music sheet is given in the Visualization directory, which also contains a converter that transforms an XML file into an HTML music sheet. To display a music sheet in Devanagari script, the notation.css file must be in the same directory as the HTML music-sheet file. This file is also included in the Visualization directory.
There are four queries written in XQuery inside the XQuery Files directory. To run the queries, an XML database must first be created from the XML files; we used BaseX for this. The queries can be written and run in the BaseX editor itself.
To use the dataset for building ML classifiers for the raag classification problem, we have transformed it into a CSV file containing, for each composition, the frequency distribution of its notes and its raag. The CSV file, Bhatkhande-Dataset.csv, is in the Machine Learning directory. It can also be generated using the XQuery file freq-dist-notes-to-csv.xq in the XQuery Files directory. We have also included, as an .ipynb file, the Python code to load the dataset and run ML algorithms on it.
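As a rough illustration of how such a CSV file could be split into features and labels before training a classifier, here is a minimal sketch using only the Python standard library. It is not the code from the included notebook. The function name load_dataset is hypothetical, the label column name "raag" and the note column names are assumptions (check the header of Bhatkhande-Dataset.csv), and every non-label column is assumed to hold a numeric note frequency.

```python
import csv
import io


def load_dataset(csv_file, label_column="raag"):
    """Split rows into per-composition note frequencies and raag labels.

    label_column is an assumed name; adjust it to the actual CSV header.
    """
    reader = csv.DictReader(csv_file)
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(label_column))
        # Every remaining column is assumed to be a numeric note frequency.
        features.append({note: float(value) for note, value in row.items()})
    return features, labels


# Made-up rows with invented column names, for illustration only.
sample = io.StringIO("S,r,raag\n3,1,Bhairav\n2,0,Todi\n")
X, y = load_dataset(sample)
```

For the real file, the same function could be called on an open file handle, for example one opened with open("Machine Learning/Bhatkhande-Dataset.csv", newline=""), provided the column assumptions hold.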