This study explores music features to identify songs likely to reach the top of the charts, and extends the analysis to predict which songs are more likely to be performed at music festivals. We use Spotify and Billboard data and predict hits with a genre-similarity approach based on Euclidean distance. Predicting popularity is framed as a classification problem, and several models are tested to find the one that performs best. Training was carried out on a corpus of more than 27K songs. To extend these results, music festival data was collected and filtered to a single festival, Glastonbury. Nearly 18 genres were listed each year for this festival, and a time series analysis was performed on this data. After the data was normalised, the ARIMA model achieved an RMSE of approximately 0.13. A subsequent genre prediction step with a Decision Tree Classifier achieved an F1 score of 0.9627. The files used in this study have been collected and organized in this repository.
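The two numerical ideas above, genre similarity via Euclidean distance and an RMSE computed on normalised data, can be illustrated with a small sketch. It is not the notebooks' implementation. It assumes that each genre is represented by the mean (centroid) of its songs' audio-feature vectors, that a song is matched to the genre whose centroid is closest in Euclidean distance, and that the time series is min-max scaled using the range of the observed values before the RMSE is computed. The function names (`genre_centroids`, `nearest_genre`, `min_max_scale`, `rmse`) and the toy data are hypothetical.

```python
import numpy as np


def genre_centroids(features, genres):
    """Return sorted genre labels and the mean feature vector of each genre.

    features: array of shape (n_songs, n_features), e.g. audio features.
    genres:   sequence of length n_songs with one genre label per song.
    """
    features = np.asarray(features, dtype=float)
    genres = np.asarray(genres)
    labels = sorted(set(genres.tolist()))
    centroids = np.vstack([features[genres == g].mean(axis=0) for g in labels])
    return labels, centroids


def nearest_genre(song, labels, centroids):
    """Match a song to the genre whose centroid is closest (Euclidean distance)."""
    dists = np.linalg.norm(centroids - np.asarray(song, dtype=float), axis=1)
    i = int(np.argmin(dists))
    return labels[i], float(dists[i])


def min_max_scale(x, lo, hi):
    """Scale values to [0, 1] using a fixed range (assumed: the observed series' min/max)."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Toy two-feature data (hypothetical values, not from the study's data sets).
    feats = [[0.8, 0.9], [0.7, 0.8], [0.2, 0.3], [0.1, 0.2]]
    labels, cents = genre_centroids(feats, ["pop", "pop", "folk", "folk"])
    print(nearest_genre([0.7, 0.9], labels, cents))

    # Normalise observed and forecast values with the observed range, then score.
    observed, forecast = [2, 4, 6], [2, 4, 5]
    lo, hi = min(observed), max(observed)
    print(rmse(min_max_scale(observed, lo, hi), min_max_scale(forecast, lo, hi)))
```

Because the error is computed on scaled values, an RMSE such as 0.13 is relative to a [0, 1] range rather than to raw counts. Under these assumptions, the scaling range must be fixed from the observed series so that forecasts are not rescaled independently.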
The data files used in the experiments, along with links to all the data sets used in this study, are present under this directory.
The exploratory data analysis performed on the data, as well as the pre-processing techniques applied, are presented here.
Further analysis of the data, along with the performance of all the models tested, is presented in this directory.
The cells of each Jupyter Notebook need to be run in order to reproduce the results shown in that notebook's output. The path to the data folder also needs to be specified or changed depending on where the data sets are placed. Some notebooks generate new data sets from existing ones, so certain notebooks depend on these generated data sets. However, the generated data sets have also been placed with their respective notebook files.
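One way to make the data-folder path easier to change is to define it once at the top of each notebook. The sketch below is a suggestion, not something the notebooks currently do. The environment variable `HITSONG_DATA_DIR`, the default folder name `data`, and the helpers `data_path` and `require` are all hypothetical.

```python
import os
from pathlib import Path

# Hypothetical environment variable and default folder; adjust to where the data sets live.
DATA_DIR = Path(os.environ.get("HITSONG_DATA_DIR", "data"))


def data_path(filename, data_dir=None):
    """Build the full path to a data file, using DATA_DIR unless a directory is given."""
    base = Path(data_dir) if data_dir is not None else DATA_DIR
    return base / filename


def require(path):
    """Fail early with a clear message if a data file (or generated data set) is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; set HITSONG_DATA_DIR or edit DATA_DIR, "
            "or run the notebook that generates this data set first"
        )
    return path
```

A notebook could then load a file with something like `pd.read_csv(require(data_path("some_file.csv")))`, where the file name is a placeholder. Checking for the file up front makes the dependencies on generated data sets visible as soon as a cell runs, rather than surfacing later as a less specific error.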