This repository implements a music streaming and recommendation service, similar to Spotify, built on the following dataset and technologies:
- Free Music Archive (FMA) for a diverse music dataset.
- MongoDB for scalable data storage.
- Apache Spark for efficient large-scale data processing.
- Apache Kafka for real-time music recommendation.
├── analysis_for_PCA.py # Script for finding the optimal number of PCA components for normalization.
├── feature_extraction.py # Script for extracting audio features such as MFCCs and loading them into MongoDB.
├── preprocessing.py # Script for cleaning up track metadata for the website.
├── model.py # Script for training the music recommendation model with Spark using MinHashLSH and approximate nearest neighbours.
├── app.py # Flask/Django app for the actual music streaming service.
└── producer.py # Script for streaming the dataset using Kafka.
Process and store music features by running `feature_extraction.py`, which extracts the necessary audio features and loads them into MongoDB.
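As a rough illustration of the storage step, the sketch below collapses per-frame MFCC vectors into one fixed-length document per track. It assumes the MFCC frames have already been computed by an audio library, and the function name `summarize_mfcc` and the field names `track_id`, `mfcc_mean`, and `mfcc_std` are hypothetical, not taken from `feature_extraction.py`.

```python
from statistics import fmean, pstdev


def summarize_mfcc(track_id, frames):
    """Collapse per-frame MFCC vectors into one fixed-length document.

    `frames` is a list of equal-length lists, one MFCC vector per audio frame.
    The field names below are illustrative assumptions.
    """
    if not frames:
        raise ValueError("frames must contain at least one MFCC vector")
    # Transpose so each tuple holds one coefficient across all frames.
    per_coefficient = list(zip(*frames))
    return {
        "track_id": track_id,
        "mfcc_mean": [fmean(values) for values in per_coefficient],
        "mfcc_std": [pstdev(values) for values in per_coefficient],
    }


# Two frames with two coefficients each (made-up numbers for illustration).
doc = summarize_mfcc("example-track", [[1.0, 4.0], [3.0, 8.0]])
print(doc)
```

A dictionary of this shape could then be written to MongoDB, for example with PyMongo's `collection.insert_one(doc)`.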
Develop and train the recommendation model:
- Use `model.py` to apply the machine learning algorithms via Apache Spark.
- Adjust parameters and algorithms as needed for optimal recommendations.
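To show the idea behind MinHash-based similarity search, here is a minimal plain-Python sketch. It is not the Spark API and not the code in `model.py`. It assumes each track has been reduced to a set of non-negative integer feature indices (a hypothetical representation), and all function names are illustrative.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1; item values must be below it


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Return the MinHash signature of a non-empty set of non-negative ints."""
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


params = make_hash_params(num_hashes=128)
track_a = {1, 4, 7, 9}    # hypothetical active feature indices
track_b = {1, 4, 7, 12}
sig_a = minhash_signature(track_a, params)
sig_b = minhash_signature(track_b, params)
print("estimated:", estimated_jaccard(sig_a, sig_b))
print("exact:", exact_jaccard(track_a, track_b))
```

More hash functions tighten the estimate but make signatures larger, which is one of the parameters worth tuning.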
Deploy the web application and set up real-time recommendation:
- Utilize `app.py` to launch a user-friendly music streaming interface.
- Run `producer.py` to handle live music streaming based on user activity.
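The producer side can be sketched roughly as follows. This is an assumption about the general shape, not the contents of `producer.py`: each track record is serialized to UTF-8 JSON and handed to a `send` callable. The function name `stream_tracks` and the record fields are hypothetical.

```python
import json


def stream_tracks(tracks, send):
    """Serialize each track record as UTF-8 JSON and pass it to `send`.

    `send` is any callable that accepts bytes; returns the number of
    records handed over.
    """
    sent = 0
    for track in tracks:
        payload = json.dumps(track, sort_keys=True).encode("utf-8")
        send(payload)
        sent += 1
    return sent


# Collect payloads in a list instead of a real broker, for illustration.
outbox = []
count = stream_tracks([{"track_id": 1, "title": "example"}], outbox.append)
print(count, outbox[0])
```

With the kafka-python client (an assumption about the library in use), `send` could be `lambda payload: producer.send("tracks", payload)`, where `producer` is a `KafkaProducer` and the topic name `tracks` is hypothetical.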
- MongoDB: For efficient management of large datasets.
- Apache Spark: For scalable data processing and machine learning.
- Apache Kafka: For real-time data streaming that drives dynamic music recommendations.
- Python: Primary language for backend and data processing scripts.
- Flask/Django: Frameworks for web application development.
- Data Handling and Processing: Managing large volumes of audio data efficiently.
- Real-Time Data Streaming: Implementing a robust system with Apache Kafka for live recommendations.
- User Interface Development: Creating an engaging and responsive web interface.