This project aims to predict bike-sharing demand using AutoGluon, a powerful AutoML library. The process involves setting up Kaggle API access, downloading the dataset, preprocessing the data, and building predictive models. The project includes steps for data exploration, feature engineering, model training, hyperparameter optimization, and submission to the Kaggle competition.
The project directory is organized as follows:
Predict_Bike_Sharing_Demand_Udacity_project1/
│
├── Data/
│ ├── train.csv
│ ├── test.csv
│ └── sampleSubmission.csv
│
├── Bike_Sharing_Demand.ipynb
├── Bike_Sharing_Demand.py
├── README.md
├── report-template.md
├── requirements.txt
├── kaggle_scores.png
├── top_model_performance.png
└── Bike_Sharing_Demand.html
- 📁 Data/: Contains the datasets for training, testing, and submission.
- 📓 Bike_Sharing_Demand.ipynb: Jupyter notebook with the main code for data processing, model training, and evaluation.
- 📜 Bike_Sharing_Demand.py: Python script for executing the bike demand prediction.
- 📝 README.md: This file, detailing the project overview, setup instructions, and usage.
- 🗂️ report-template.md: Markdown template for the project report.
- 📊 kaggle_scores.png: Plot showing Kaggle competition scores.
- 📈 top_model_performance.png: Plot showing the performance of the top model.
- 🌐 Bike_Sharing_Demand.html: Exported HTML version of the Jupyter notebook.
Ensure Python 3.x is installed. The required libraries are listed in `requirements.txt`.
- Clone the repository:

      git clone https://github.com/Ganesh2409/Predict_Bike_Sharing_Demand.git
      cd Predict_Bike_Sharing_Demand

- Install the required packages:

      pip install -r requirements.txt

- 👤 Create a Kaggle Account: Register on Kaggle and obtain an API key.
- 🔑 Set Up Kaggle API Key: Save the API key in the `.kaggle` directory.
- 📥 Download: Use the Kaggle API to download and unzip the dataset.
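
The key setup step above can be sketched in Python. This is a minimal sketch, assuming the Kaggle CLI's default credentials location (`~/.kaggle/kaggle.json`); the helper name `write_kaggle_credentials` is hypothetical and not part of this repository.

```python
import json
import stat
from pathlib import Path
from typing import Optional


def write_kaggle_credentials(username: str, key: str,
                             kaggle_dir: Optional[Path] = None) -> Path:
    """Write kaggle.json into the .kaggle directory and restrict its permissions."""
    # Assumption: the Kaggle CLI reads credentials from ~/.kaggle/kaggle.json.
    kaggle_dir = kaggle_dir or Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)

    cred_path = kaggle_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))

    # Owner read/write only, so other users on the machine cannot read the key.
    cred_path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return cred_path
```

With the credentials in place, the dataset can then be fetched with the Kaggle CLI, for example `kaggle competitions download -c bike-sharing-demand` (assuming that is the competition slug), and unzipped into `Data/`.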
- 📊 Load Data: Import datasets (`train.csv`, `test.csv`, `sampleSubmission.csv`) and parse datetime columns.
- 🔍 Explore Data: Examine data statistics and initial summaries.
- 🔧 Preprocessing: Drop irrelevant columns (`casual`, `registered`) and set the `count` column as the target.
- 🤖 Model Training: Use AutoGluon’s `TabularPredictor` with a 10-minute time limit and the `best_quality` preset.
- 📈 Evaluation: Review model performance and generate predictions for the test set.
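
The loading, preprocessing, and baseline training steps above might look roughly like the following. This is a sketch rather than the notebook's exact code: the function names are hypothetical, and the `Data/` paths are taken from the directory layout shown earlier.

```python
import pandas as pd


def load_data(data_dir: str = "Data"):
    """Load the three CSV files, parsing the datetime column as timestamps."""
    train = pd.read_csv(f"{data_dir}/train.csv", parse_dates=["datetime"])
    test = pd.read_csv(f"{data_dir}/test.csv", parse_dates=["datetime"])
    submission = pd.read_csv(f"{data_dir}/sampleSubmission.csv",
                             parse_dates=["datetime"])
    return train, test, submission


def drop_unused_columns(train: pd.DataFrame) -> pd.DataFrame:
    """Drop casual and registered, leaving count as the only target column."""
    return train.drop(columns=["casual", "registered"])


def fit_baseline(train: pd.DataFrame, time_limit: int = 600):
    """Train a baseline predictor for count with a 10-minute budget."""
    # Imported lazily because AutoGluon is a heavy dependency.
    from autogluon.tabular import TabularPredictor

    return TabularPredictor(label="count").fit(
        train_data=drop_unused_columns(train),
        time_limit=time_limit,       # seconds; 600 s = 10 minutes
        presets="best_quality",
    )


# Typical usage, assuming the CSV files are already in Data/:
#   train, test, submission = load_data()
#   predictor = fit_baseline(train)
#   print(predictor.leaderboard())       # review model performance
#   predictions = predictor.predict(test)
```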
- 🛠️ Add Features: Extract and add new features such as year, month, day, and hour from the `datetime` column.
- 🔤 Categorical Encoding: Convert categorical features (`season`, `weather`) to category type.
- 🔄 Update Features: Re-train the model with the new features and evaluate performance.
- 📤 Submission: Prepare and submit the updated predictions to Kaggle.
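
The feature steps above can be sketched with pandas as follows. The helper names are hypothetical. Clipping negative predictions to zero is an assumption added here (negative counts are not meaningful for this target), and the submission is assumed to use the `datetime` and `count` columns of `sampleSubmission.csv`.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with year/month/day/hour columns and categorical dtypes."""
    out = df.copy()
    out["year"] = out["datetime"].dt.year
    out["month"] = out["datetime"].dt.month
    out["day"] = out["datetime"].dt.day
    out["hour"] = out["datetime"].dt.hour

    # Mark integer-coded columns as categorical so they are not read as numeric.
    for col in ("season", "weather"):
        out[col] = out[col].astype("category")
    return out


def build_submission(test: pd.DataFrame, predictions) -> pd.DataFrame:
    """Pair each test timestamp with its predicted count."""
    # Assumption: negative predictions are clipped to 0 before submitting.
    counts = pd.Series(predictions).clip(lower=0)
    return pd.DataFrame({
        "datetime": test["datetime"].values,
        "count": counts.values,
    })
```

The same `add_datetime_features` call should be applied to both the training and test frames so that the retrained model sees identical columns at prediction time.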
- 🔬 Tune Hyperparameters: Optimize model hyperparameters for better performance.
- 📊 Final Evaluation: Generate predictions, submit them to Kaggle, and review the results.
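
A tuning run of this kind might be set up as below. This is only a sketch: the search space, the number of trials, and the function names are hypothetical, since the hyperparameters actually used in the project are not listed here. It also assumes a recent AutoGluon release in which search-space classes live in `autogluon.common.space`.

```python
def tune_kwargs(num_trials: int = 5) -> dict:
    """Settings for fit(hyperparameter_tune_kwargs=...); values are illustrative."""
    return {"num_trials": num_trials, "scheduler": "local", "searcher": "auto"}


def fit_with_hpo(train, time_limit: int = 600):
    """Train with a small, hypothetical LightGBM search space."""
    # Imported lazily: these imports require AutoGluon to be installed.
    from autogluon.common import space
    from autogluon.tabular import TabularPredictor

    # Hypothetical search space, not the one used in the project.
    hyperparameters = {
        "GBM": {
            "num_boost_round": 100,
            "num_leaves": space.Int(lower=26, upper=66, default=36),
            "learning_rate": space.Real(lower=0.01, upper=0.2,
                                        default=0.05, log=True),
        }
    }

    return TabularPredictor(label="count").fit(
        train_data=train,
        time_limit=time_limit,
        hyperparameters=hyperparameters,
        hyperparameter_tune_kwargs=tune_kwargs(),
    )
```

Restricting the search to a single model family keeps the trial budget focused, but it also removes the other models AutoGluon would otherwise train, so a tuned run is not guaranteed to beat the default configuration.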
- 🏁 Initial Score: 1.80367
- 🚀 Score with Additional Features: 0.51204
- 🎯 Score with Hyperparameter Optimization: 0.54099
- 📈 Visualization: Includes plots of model scores and hyperparameter impacts.
- 🗒️ Hyperparameter Table: Summary of hyperparameters and corresponding scores.
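
To put the three scores above in perspective, the short calculation below compares each run with the initial one. It assumes lower scores are better (the competition is scored with an error metric); the run labels are hypothetical names chosen for this example.

```python
# Kaggle scores reported above; lower is assumed to be better.
scores = {
    "initial": 1.80367,
    "add_features": 0.51204,
    "hpo": 0.54099,
}


def relative_improvement(baseline: float, score: float) -> float:
    """Fraction by which score improves on baseline (positive = better)."""
    return (baseline - score) / baseline


best_run = min(scores, key=scores.get)

for name, score in scores.items():
    gain = relative_improvement(scores["initial"], score)
    print(f"{name:>12}: {score:.5f} ({gain:.1%} vs. initial)")

print(f"best run: {best_run}")
```

On these numbers the added features account for most of the gain, and the hyperparameter-optimized run scores slightly worse than the feature-engineered run.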
To execute the bike demand prediction script:
python Bike_Sharing_Demand.py
To open and run the Jupyter notebook:
jupyter notebook Bike_Sharing_Demand.ipynb
This project utilizes AutoGluon for predicting bike-sharing demand and showcases the end-to-end process from data acquisition to model submission. By leveraging AutoML, the project demonstrates a robust approach to predictive modelling in a Kaggle competition.