This project investigates the relationship between music listening habits and mental health, motivated by a personal passion for music and its positive impact on well-being. As an avid listener of various music genres, I have experienced firsthand how different types of music can influence mood and emotional states. The project aims to explore these effects more systematically, focusing particularly on individuals with mental health conditions.
By analyzing a comprehensive dataset that includes listener habits, demographics, and self-reported mental health experiences, we seek to uncover the impact of different music genres on mental well-being. The ultimate goal is to develop a predictive model that can assess the potential effects of music on mental health.
Data is organized in the data folder:
- Raw Data: Original, unprocessed data.
- Interim Data: Transformed data used as an intermediate step.
- Processed Data: Final datasets prepared for analysis and modeling.
For a detailed description of the data, refer to the Data Description file.
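As a sketch of how the scripts might locate these folders, the snippet below builds the paths with pathlib. The data/interim and data/processed paths come from this README. The data/raw name and the helper ensure_data_dirs are assumptions for illustration.

```python
from pathlib import Path

# Assumed subfolder names: "interim" and "processed" appear in this README,
# "raw" is a guess for the Raw Data folder.
DATA_SUBDIRS = ("raw", "interim", "processed")


def ensure_data_dirs(root: Path = Path(".")) -> list[Path]:
    """Create data/raw, data/interim and data/processed under root if missing."""
    dirs = [root / "data" / name for name in DATA_SUBDIRS]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return dirs
```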
Data preparation is crucial for accurate analysis and modeling. The make_dataset.py script manages data cleaning and transformation, including handling missing values and outliers. The cleaned data is stored in the data/interim directory. Details are documented in the Data Preparation Report.
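Below is a minimal pandas sketch of the kind of cleaning described above. It assumes median imputation for numeric columns, mode imputation for categorical columns, and IQR-based capping of outliers. The function clean_survey and the column names (Age, Hours per day, Fav genre) are hypothetical, and make_dataset.py may apply different rules.

```python
import pandas as pd


def clean_survey(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Drop duplicates, impute missing values and cap outliers (illustrative)."""
    out = df.drop_duplicates().copy()
    for col in numeric_cols:
        # Median imputation is robust to the outliers handled just below.
        out[col] = out[col].fillna(out[col].median())
        # Cap values beyond 1.5 * IQR from the quartiles (Tukey's fences).
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    # Categorical columns: fill gaps with the most frequent value.
    for col in out.columns.difference(numeric_cols):
        modes = out[col].mode()
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    return out


# Toy data with one missing value per column and an implausible age.
raw = pd.DataFrame({
    "Age": [18, 21, None, 25, 200],
    "Hours per day": [2.0, 3.0, 1.5, None, 4.0],
    "Fav genre": ["Rock", None, "Pop", "Rock", "Metal"],
})
clean = clean_survey(raw, ["Age", "Hours per day"])
# The cleaned frame could then be written to data/interim, e.g. with clean.to_csv(...).
```

On this toy frame, the missing age becomes the median (23), the age of 200 is capped at 31, and the missing genre is filled with Rock.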
Effective visualization aids in data interpretation. The visualize.py script creates exploratory and result-oriented visualizations. Documentation for the script is available in the Data Visualization Script Documentation. Figures are saved in the reports/figures directory.
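Here is a hedged example of producing one exploratory figure and saving it to reports/figures. It uses pandas' matplotlib plotting with the headless Agg backend. The function name, the Fav genre column, and the output filename are assumptions and do not describe what visualize.py actually does.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd


def save_genre_counts(df: pd.DataFrame, out_dir: Path, genre_col: str = "Fav genre") -> Path:
    """Save a bar chart of respondents per favorite genre as a PNG."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = df[genre_col].value_counts()  # sorted from most to least common
    fig, ax = plt.subplots(figsize=(8, 4))
    counts.plot.bar(ax=ax, rot=45)
    ax.set_xlabel("Favorite genre")
    ax.set_ylabel("Respondents")
    ax.set_title("Favorite genre counts")
    fig.tight_layout()
    path = out_dir / "fav_genre_counts.png"
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free memory when generating many figures
    return path


# Usage: save_genre_counts(survey_df, Path("reports/figures"))
```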
The EDA and Data Analysis Report provides detailed insights into the dataset, highlighting key trends, correlations, and findings that guide preprocessing decisions.
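One way to surface a trend such as favorite genre by age group is to bin ages with pd.cut and take the most common genre in each bin. The helper, the bin edges, and the toy data below are illustrative assumptions, not values from the project's dataset.

```python
import pandas as pd


def top_genre_by_age_band(df, bins, labels, age_col="Age", genre_col="Fav genre"):
    """Most common favorite genre within each age band (ties: first in sort order)."""
    bands = pd.cut(df[age_col], bins=bins, labels=labels)  # right-closed intervals
    return df.groupby(bands, observed=True)[genre_col].agg(lambda s: s.mode().iloc[0])


survey = pd.DataFrame({
    "Age": [15, 19, 24, 27, 35, 42, 58],
    "Fav genre": ["Rock", "Pop", "Rock", "Metal", "Jazz", "Classical", "Classical"],
})
# Bands (13, 27], (27, 40] and (40, 100] are labelled 14-27, 28-40 and 41+.
top = top_genre_by_age_band(survey, bins=[13, 27, 40, 100], labels=["14-27", "28-40", "41+"])
```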
Feature engineering improves model performance by transforming raw data into meaningful features. Key steps, detailed in the Feature Engineering Documentation, include encoding, scaling, creating new features, and addressing class imbalance. The preprocessing script, build_feature.py, handles these transformations, and the preprocessed data is stored in the data/processed directory.
- Model Training: Various machine learning models were evaluated. The StackingClassifier, which combines RandomForest, SVC, KNeighborsClassifier, and Logistic Regression, achieved the best results. Training and evaluation are handled by the train_model.py script, and the trained model is saved in the models directory. See the Modeling Report for details.
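Below is a minimal sketch of a stacking ensemble built with scikit-learn's StackingClassifier. The README lists the four models but not how they are wired together. This sketch therefore assumes RandomForest, SVC, and KNeighborsClassifier as base learners and Logistic Regression as the final (meta) estimator. The hyperparameters, the synthetic data from make_classification, and the stacking_classifier.joblib filename are placeholders.

```python
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=42):
    """Stack three base learners under a logistic-regression meta-learner (assumed wiring)."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        # SVC and KNN are distance-based, so each gets its own scaler.
        ("svc", make_pipeline(StandardScaler(), SVC(random_state=random_state))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # meta-learner is fit on 5-fold out-of-fold base predictions
    )


def save_model(model, models_dir=Path("models"), name="stacking_classifier.joblib"):
    """Persist the fitted model with joblib and return the file path."""
    models_dir.mkdir(parents=True, exist_ok=True)
    path = models_dir / name
    joblib.dump(model, path)
    return path


# Synthetic stand-in for the processed features and labels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model = build_stacking_model().fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```

StackingClassifier trains the meta-learner on cross-validated predictions from the base learners rather than on in-sample predictions. This keeps base learners that overfit, such as a deep random forest, from dominating the meta-learner.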
Find comprehensive project documentation, including methodology, data descriptions, and sources, in the docs directory.
- Python 🐍: For data preparation, cleaning, visualization, and modeling.
- Libraries:
  - pandas 📈: Data manipulation and analysis.
  - NumPy 🔢: Numerical operations and data handling.
  - seaborn 🌈: Statistical data visualization.
  - matplotlib 📊: Plotting and visualization.
  - scikit-learn ⚙️: Machine learning and modeling.
- Code Editor: Visual Studio Code (VS Code)
- Summary: Music listening has a generally positive impact on mental health, with notable benefits for those who listen daily. Rock is the most popular genre among younger individuals.
- Key Takeaways:
  - Demographics and Music Preferences: 🎸 Younger individuals (ages 14-27) favor Rock, Pop, and Metal, with Rock being the most popular genre.
  - Listening Habits and Mental Health: 🎧 Daily music listening (1 to 3.5 hours) generally has a positive impact on mental health, reducing anxiety and depression.
  - Musical Background and Engagement: 🎤 Many respondents are musicians or actively engage with music, but there is no significant difference in mental health impact between musicians and non-musicians.
  - Correlation Analysis:
    - Music Effects: 🎵 Music effects correlate positively with favorite genres and negatively with work activities.
    - Mental Health Conditions: 🧠 Strong correlations among mental health conditions suggest a tendency for co-occurrence.
    - Music Characteristics: 🎶 Tempo and listening duration influence mental health, though they are not the only contributing factors.
  - Overall Impact: 🌟 Music positively influences mental well-being.
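Co-occurrence between conditions can be checked with a correlation matrix. This sketch assumes the condition scores are ordinal and therefore uses Spearman rank correlation. The column names, the 0.5 threshold, and the toy scores are hypothetical.

```python
import pandas as pd


def strong_correlations(df, cols, threshold=0.5, method="spearman"):
    """List column pairs whose correlation magnitude is at least threshold."""
    corr = df[cols].corr(method=method)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only, skipping the diagonal
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


scores = pd.DataFrame({
    "Anxiety": [2, 4, 6, 8, 9],
    "Depression": [1, 3, 5, 7, 10],
    "Insomnia": [3, 1, 4, 5, 2],
})
pairs = strong_correlations(scores, ["Anxiety", "Depression", "Insomnia"])
# pairs == [("Anxiety", "Depression", 1.0)]; Insomnia's rho with each is 0.2
```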
- Summary: The StackingClassifier achieved a high test accuracy, indicating strong model performance. Precision, recall, and F1-scores were well-balanced.
- Key Takeaways:
  - StackingClassifier Performance: 🏆 The StackingClassifier, combining RandomForest, SVC, KNeighborsClassifier, and Logistic Regression, achieved a test accuracy of 0.9049.
  - Precision and Recall: 🎯 The model demonstrated excellent precision and recall.
  - Balanced F1-Scores: 📈 F1-scores were balanced across all classes, indicating robust model performance.
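Claims such as balanced precision, recall, and F1 can be checked by computing per-class metrics from test-set predictions. The helper below uses scikit-learn's accuracy_score and precision_recall_fscore_support. The f1_spread value (the maximum minus the minimum per-class F1) is an illustrative summary introduced here, not a metric reported by the project. The labels are toy values.

```python
from sklearn.metrics import accuracy_score, classification_report, precision_recall_fscore_support


def summarize_classification(y_true, y_pred):
    """Accuracy plus per-class F1 and the gap between best and worst class."""
    _, _, f1, _ = precision_recall_fscore_support(y_true, y_pred, zero_division=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "per_class_f1": [round(float(v), 3) for v in f1],  # classes in sorted label order
        "f1_spread": float(f1.max() - f1.min()),  # small spread: similar F1 across classes
    }


y_true = ["Improve", "Improve", "No effect", "Worsen", "Improve", "No effect"]
y_pred = ["Improve", "No effect", "No effect", "Worsen", "Improve", "No effect"]
summary = summarize_classification(y_true, y_pred)
print(classification_report(y_true, y_pred, zero_division=0))
```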
- Broadening Data Scope: Expanding the dataset to include more diverse demographics and additional variables such as listening context and physiological responses.
- Enhancing Models: Exploring advanced algorithms, optimizing hyperparameters, and improving feature selection for better prediction accuracy.
- Longitudinal Studies: Conducting studies to assess the long-term effects of music on mental health and to track changes over time.