Bitcoin Price Prediction

About the Project

This project was developed under the TÜBİTAK 2209-A Research Project Support Program for Undergraduate Students. The research was conducted for academic purposes only and is not trading or investment advice.

This project uses several machine learning and deep learning models to predict Bitcoin prices. It combines Bitcoin price data with other financial data, namely the USD/TRY exchange rate and gold prices, to forecast future Bitcoin prices. The main goal is to explore and evaluate different prediction methodologies for cryptocurrency price forecasting.

Objectives

Predict Bitcoin price movements
Compare the performance of different machine learning and deep learning models
Identify the best approaches for improving prediction accuracy in financial time series data

Dataset

Datasets used in the project:

Bitcoin Historical Data: Historical price data of Bitcoin
- Source: Investing.com Bitcoin Historical Data
- Date Range: January 29, 2017 to December 29, 2024
USD/TRY: US Dollar/Turkish Lira exchange rate data
- Source: Central Bank of the Republic of Turkey (CBRT) - EVDS
- Date Range: January 01, 2017 to December 30, 2024
XAU/USD: Gold price data
- Source: Investing.com Gold Historical Data
- Date Range: January 02, 2017 to December 27, 2024

These datasets are merged with alignment and imputation steps chosen to prevent data leakage while still capturing correlations across financial markets. Missing values are filled in ways that preserve temporal order, so that no future information is used when training models or when predicting earlier dates.
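One leakage-safe way to align the three series is a backward-looking as-of join: for each Bitcoin date, take the most recent gold or USD/TRY value dated on or before it, so a weekend gap is filled from Friday and never from the following Monday. The sketch below is plain Python so it runs without dependencies; it is not the project's preprocessing code, and the function name `asof_fill` and the sample values are hypothetical. In pandas, `merge_asof` or a forward fill after reindexing would play a similar role.

```python
from bisect import bisect_right
from datetime import date

def asof_fill(target_dates, source):
    """For each target date, return the latest source value dated on or
    before it (None if no earlier observation exists).

    `source` is a list of (date, value) pairs sorted by date.
    """
    src_dates = [d for d, _ in source]
    filled = []
    for day in target_dates:
        idx = bisect_right(src_dates, day)  # number of observations <= day
        filled.append(source[idx - 1][1] if idx else None)
    return filled

# Made-up values for illustration: gold is quoted on weekdays only,
# while Bitcoin has a price every day.
gold = [
    (date(2024, 9, 12), 2500.0),  # Thursday
    (date(2024, 9, 13), 2510.0),  # Friday
    (date(2024, 9, 16), 2520.0),  # Monday
]
btc_dates = [date(2024, 9, d) for d in range(13, 17)]  # Fri, Sat, Sun, Mon

gold_aligned = asof_fill(btc_dates, gold)
print(gold_aligned)  # [2510.0, 2510.0, 2510.0, 2520.0]
```

Saturday and Sunday reuse Friday's value; Monday's value appears only on Monday, so nothing flows backward in time.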

Prediction Period and Feature Selection

For the prediction models, we focused on the following time period:

Training Period: September 01, 2023 to September 14, 2024
Testing Period: September 15, 2024 to September 30, 2024
Lookback Period: 30 days (for time series features)
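The periods above can be sketched as a chronological split plus sliding lookback windows. This is a minimal illustration that assumes one observation per calendar day; the helper names `daterange` and `make_windows` are hypothetical, and the integer series is a placeholder for real prices.

```python
from datetime import date, timedelta

LOOKBACK = 30  # days of history per sample, as stated above
TRAIN_START, TRAIN_END = date(2023, 9, 1), date(2024, 9, 14)
TEST_START, TEST_END = date(2024, 9, 15), date(2024, 9, 30)

def daterange(start, end):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def make_windows(values, lookback):
    """Pair each value with the `lookback` values that precede it."""
    return [(values[i - lookback:i], values[i]) for i in range(lookback, len(values))]

train_days = daterange(TRAIN_START, TRAIN_END)
test_days = daterange(TEST_START, TEST_END)
print(len(train_days), len(test_days))  # 380 16

# Placeholder series: the day index stands in for a real price.
series = list(range(len(train_days)))
windows = make_windows(series, LOOKBACK)
print(len(windows))  # 350
```

Each window contains only values that come before its target, and the test period starts the day after training ends, so the split never shuffles observations across time.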

While numerous features were initially generated from the raw data, the final set of features was selected after rigorous statistical testing to address multicollinearity issues and optimize model performance. The selected features include:

Date
Price (target variable)
High (daily high price)
usd_buy (USD/TRY exchange rate)
gold_Price (XAU/USD price)
RSI (Relative Strength Index)
MA_7 (7-day Moving Average)
BTC_Gold_Ratio (Bitcoin to Gold price ratio)
BTC_USD_Ratio (Bitcoin to USD price ratio)
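The derived features in this list can be sketched as follows. The README does not state the RSI period or smoothing method, so the version below is an assumption: it uses simple averages of gains and losses over a configurable period (14 by default), and each value depends only on prices up to that day. Function names and sample prices are illustrative, not taken from the project code.

```python
def moving_average(prices, window=7):
    """Trailing simple moving average; None until `window` values exist."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out

def rsi(prices, period=14):
    """RSI from simple averages of gains and losses over `period` changes."""
    out = [None] * len(prices)
    changes = [b - a for a, b in zip(prices, prices[1:])]
    for i in range(period, len(prices)):
        recent = changes[i - period:i]  # the `period` changes ending at day i
        avg_gain = sum(c for c in recent if c > 0) / period
        avg_loss = sum(-c for c in recent if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

# Made-up prices for illustration only (not project data).
btc = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0]
gold = [50.0] * len(btc)

ma_7 = moving_average(btc, window=7)
rsi_4 = rsi(btc, period=4)  # short period so the toy series yields values
btc_gold_ratio = [b / g for b, g in zip(btc, gold)]

print(ma_7)
print(rsi_4)
print(btc_gold_ratio)
```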

Modelling

The following models were used in the project:

Classical Machine Learning Models

Linear Models: Linear Regression
Tree-based Models: Decision Tree, Random Forest
Boosting Methods: AdaBoost, XGBoost, LightGBM, CatBoost
Kernel Methods: Support Vector Regression (SVR)

Deep Learning Models

Bidirectional LSTM (Long Short-Term Memory) networks

AutoML Solutions

AutoGluon library for automatic model selection and hyperparameter optimization

Results

Model performance comparison was conducted using the following metrics:

RMSE (Root Mean Squared Error)
MAE (Mean Absolute Error)
R² (Coefficient of Determination)
MAPE (Mean Absolute Percentage Error)
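For reference, the standard definitions of these metrics are given below, assuming the project follows the usual conventions. Here y_i is the actual price, ŷ_i the predicted price, ȳ the mean of the actual prices, and n the number of test observations.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \\
\mathrm{MAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|
\end{align}
```

RMSE and MAE are in the same units as the price, while R² and MAPE are unitless.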

Note on R² in Time Series: While R² is included as a metric, it should be interpreted with caution in time series forecasting. R² measures the proportion of variance explained by the model compared to a simple mean model, which may not be appropriate for non-stationary time series data like cryptocurrency prices. High R² values in financial time series can sometimes be misleading due to trends and should be considered alongside other metrics like RMSE and MAE.
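The caveat can be illustrated with a synthetic trending series (not project data). A naive forecast that simply repeats yesterday's price has no predictive skill, yet it scores an R² close to 1 because the trend dominates the variance. The function name `r_squared` is illustrative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Synthetic series that rises by 10 every day.
prices = [100 + 10 * t for t in range(50)]

# Naive persistence forecast: tomorrow's price equals today's price.
actual = prices[1:]
naive = prices[:-1]
print(round(r_squared(actual, naive), 3))  # 0.995

# A flat forecast that ignores the trend does worse than predicting the
# mean, so R² turns negative.
flat = [actual[0]] * len(actual)
print(r_squared(actual, flat) < 0)  # True
```

The second case shows how a negative R², like the one reported for SVR in the table below, can arise: the model's errors are larger than those of a constant prediction at the test-set mean.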

Model Performance Visualizations

Model	Scaled RMSE	RMSE	MAE	MAPE	R²	CV RMSE	CV MAE	CV MAPE
Linear_Regression	0.039803	583	485	0.76%	0.9287	1,006	698	1.23%
Decision_Tree	0.110921	1,625	1,329	2.11%	0.4466	5,136	3,827	7.05%
SVR	0.219678	3,218	2,844	4.43%	-1.1706	7,422	6,131	11.08%
Random_Forest	0.057608	844	699	1.10%	0.8507	4,337	3,280	6.30%
XGBoost	0.130051	1,905	1,638	2.58%	0.2393	4,127	3,091	5.92%
LightGBM	0.062776	920	688	1.09%	0.8227	4,957	3,840	7.34%
AdaBoost	0.090499	1,326	1,157	1.82%	0.6316	4,450	3,397	6.50%
CatBoost	0.089907	1,317	1,116	1.75%	0.6364	5,308	4,185	7.92%

All classical machine learning models were evaluated with their default parameters, without any hyperparameter optimization. This gives a baseline comparison of the models on this specific prediction task; performance could likely be improved with hyperparameter tuning.
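The README does not say how the folds behind the CV columns were built. For time series, one common leakage-free option is an expanding window, where every training index comes before every validation index. The sketch below shows that idea in plain Python under that assumption; the function name is hypothetical, and scikit-learn's `TimeSeriesSplit` offers similar behavior.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, val_indices) with every training index
    earlier than every validation index."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))

for train_idx, val_idx in expanding_window_splits(n_samples=12, n_splits=3):
    print(train_idx[-1], val_idx)
# 2 [3, 4, 5]
# 5 [6, 7, 8]
# 8 [9, 10, 11]
```

Each fold trains on a longer history than the previous one and validates on the block that immediately follows it.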

Based on the performance metrics in the table above, the top three performing classical machine learning models are:

Linear Regression: Shows the best performance with the lowest RMSE (583), MAE (485), and MAPE (0.76%), as well as the highest R² value (0.9287). These metrics should be read with caution: multicollinearity among the features can make linear regression coefficients unstable, and the test window covers only 16 days. The CV RMSE (1,006) is noticeably higher than the test RMSE (583), which suggests the test-period result may be optimistic rather than representative.
Random Forest: Demonstrates good performance with an RMSE of 844, MAE of 699, and a strong R² value of 0.8507. This ensemble method is more robust against multicollinearity and can capture non-linear relationships in the data.
LightGBM: Performs well with an RMSE of 920, MAE of 688, and an R² value of 0.8227. This gradient boosting framework is efficient and can handle complex relationships in financial data.
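The multicollinearity concern raised for Linear Regression can be checked with a pairwise correlation and a variance inflation factor (VIF). The sketch below uses synthetic values, not project data, and covers only the two-predictor case, where VIF reduces to 1 / (1 - r²). The README does not say which statistical tests the project actually used, so this is one possible check rather than the project's method.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x, y):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

# Synthetic values: a price series and its 3-day trailing mean.
price = [100.0, 104.0, 101.0, 108.0, 112.0, 109.0, 115.0, 120.0]
ma_3 = [sum(price[i - 2:i + 1]) / 3 for i in range(2, len(price))]
aligned_price = price[2:]  # drop days without a full 3-day window

r = pearson(aligned_price, ma_3)
vif = vif_two_predictors(aligned_price, ma_3)
print(round(r, 3), round(vif, 1))
```

A price and its own moving average are strongly correlated by construction, which is why features such as Price, High, and MA_7 are natural candidates for this kind of check.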

LSTM Model Performance

The LSTM model was trained with a bidirectional architecture to capture temporal patterns in both directions. The model achieved the following performance metrics:

Training Metrics:

Training Loss	Training MAE	Validation Loss	Validation MAE
0.2423	0.1822	0.2222	0.1649

Test Metrics:

Scaled RMSE	RMSE	MAE	R² Score	MAPE
0.0712	1,204.95	1,049.35	0.6956	1.67%

Although the LSTM model has higher RMSE and MAE than the best classical models in this test (Linear Regression, Random Forest, and LightGBM), it is designed to learn patterns across time sequences, which may be an advantage for longer prediction horizons. That capacity to model temporal dynamics is the main reason it remains a candidate for financial time series forecasting.

AutoGluon Results

The AutoGluon framework was used to automatically select and optimize models for the prediction task. For detailed results and visualizations of the AutoGluon performance, please refer to the visualization files in the src/visualization directory. The AutoML approach provides an interesting comparison to both the classical models and the deep learning approach, as it automatically handles feature engineering, model selection, and hyperparameter tuning.

Best Performing Models

For the specified prediction period (September 2023 - September 2024), the following models showed the best performance:

LSTM: Best for capturing temporal dynamics and time series patterns
Random Forest: Strong overall performance with good robustness
Linear Regression: Highest numerical accuracy but caution needed due to multicollinearity

Model performance depends heavily on the specific time period used for training and testing. Different date ranges, market conditions, or feature selections may produce different results.

Future Work

Incorporation of sentiment analysis and social media data into the model
Experimentation with Transformer-based models
Development of models for longer-term predictions
Creation of a real-time prediction system with live data streaming

Contact

Please get in touch if you have any questions about the project.

Project Structure

bitcoin-price-prediction/
├── data/                          
│   ├── Bitcoin Historical Data.csv # Bitcoin historical data
│   ├── dolar.csv                   # USD/TRY data
│   ├── XAU_USD Geçmiş Verileri.csv # Gold price data
│   └── merged_data.csv             # Merged dataset
├── src/                            
│   ├── pipeline.ipynb              # Main pipeline notebook
│   ├── preprocessing/              
│   │   └── data_preprocessor.py    # Data preprocessing classes
│   ├── model/                      
│   │   ├── models.py               # Classical ML models
│   │   ├── lstm_model.py           # LSTM deep learning model
│   │   └── automl_autogluon.py     # AutoML implementation
│   └── visualization/              
│       ├── data_eda.py             # Exploratory data analysis
│       └── model_visualizations.py # Model results visualization
└── README.md

