Project-on-ML-dataset-and-models-for-stock-performance-predictions-based-on-financial-ratios

A Python project, built in Jupyter Notebooks, that takes an existing Kaggle dataset and applies data cleansing, hard coding of missing values and general data management so that the dataset is more useful for Machine Learning models.

Source of original dataset: https://www.kaggle.com/datasets/ifuurh/nasdaq100-fundamental-data

Introduction

The problem we are trying to solve is that Kaggle has very few datasets suitable for applying ML models to individual stock share price prediction using financial statement ratios as input data.

This problem needs addressing because there is a global, multi-billion investment analysis industry built on fundamental financial ratios, and it is ripe for performance enhancement by Machine Learning.

We believe the dataset linked above was the best one available on Kaggle for this purpose.

For ML model use, this dataset had the following problems, which we addressed as described:

· A number of data attributes were not populated in every annual period. We removed any attribute that was not populated across all the annual periods.

· We filled in missing data and replaced NaNs and INFs with logical, reasonable fill values.

· We attached label data, namely 12-month-ahead share price returns for each stock and each annual period. These are provided both as discrete percentage returns and as binary labels showing whether the stock outperformed or underperformed the Nasdaq 100 index.
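The first two cleaning steps can be sketched with pandas as below. This is a minimal illustration, not the notebook's actual code: it assumes a hypothetical long-format table with one row per stock per year (columns `ticker`, `year`, then one column per ratio), and it uses the per-year median as the fill value, which is only one possible "logical and reasonable" choice.

```python
import numpy as np
import pandas as pd


def clean_ratios(df: pd.DataFrame, period_col: str = "year") -> pd.DataFrame:
    """Sketch: drop ratios missing in any period, then fill remaining gaps."""
    # Assumed layout: 'ticker', period_col, then one column per financial ratio.
    ratio_cols = [c for c in df.columns if c not in ("ticker", period_col)]
    out = df.copy()
    # Treat +inf / -inf (e.g. from division by zero) as missing values.
    out[ratio_cols] = out[ratio_cols].replace([np.inf, -np.inf], np.nan)
    # Keep only ratios that have at least one value in every annual period.
    populated = out.groupby(period_col)[ratio_cols].count().gt(0).all()
    keep = populated[populated].index.tolist()
    # Fill the remaining gaps with each ratio's median within the same period.
    out[keep] = out.groupby(period_col)[keep].transform(lambda s: s.fillna(s.median()))
    return out[["ticker", period_col] + keep]


# Hypothetical toy data: 'sparse_ratio' has no values at all for 2018.
raw = pd.DataFrame({
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "year": [2017, 2017, 2018, 2018],
    "pe_ratio": [10.0, np.inf, 12.0, np.nan],
    "sparse_ratio": [1.0, 2.0, np.nan, np.nan],
})
clean = clean_ratios(raw)
```

Here `sparse_ratio` is dropped because it is empty for 2018, while the INF and NaN in `pe_ratio` are replaced by that year's median.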

Resulting Datasets

The resulting datasets cover 102 stocks using 39 financial ratios across both 4-year and 5-year periods, with two different types of label.
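The two label types can be derived from a stock's start and end prices for each 12-month window plus the index return for the same window. The sketch below is an assumption about how this could be done, not the notebook's method: the column names `start_price` and `end_price` and the `index_returns` Series keyed by year are hypothetical, returns are expressed as fractions (0.5 means +50%), and a tie with the index counts as underperformance.

```python
import pandas as pd


def make_labels(prices: pd.DataFrame, index_returns: pd.Series) -> pd.DataFrame:
    """Add a 'perf' return column and a 1/0 'label' column (beat index or not)."""
    out = prices.copy()
    # Discrete return over the 12-month window: end / start - 1.
    out["perf"] = out["end_price"] / out["start_price"] - 1
    # Look up the index return for each row's year and compare (ties -> 0).
    benchmark = out["year"].map(index_returns)
    out["label"] = (out["perf"] > benchmark).astype(int)
    return out


# Hypothetical prices for two stocks and a hypothetical 10% index return.
prices = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "year": [2017, 2017],
    "start_price": [100.0, 50.0],
    "end_price": [150.0, 50.0],
})
labelled = make_labels(prices, pd.Series({2017: 0.10}))
```

AAA returns 50% and beats the index (label 1); BBB is flat and does not (label 0). The prices themselves could be fetched with a tool such as yfinance, which is listed under Resources.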

In summary, this repository provides a Jupyter Notebook that shows the steps undertaken to generate:

Two datasets for 2017 to 2021, with the Y labels in the last column:

· labels (1 or 0): binary outperformance against the index.

· perfs labels: the stock's actual performance for that calendar year.

Two more datasets for 2017 to 2020, with the same Y labels as above:

· labels (1 or 0): binary outperformance against the index.

· perfs labels: the stock's actual performance for that calendar year.
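Because the Y label sits in the last column, each dataset can be split into features and target by position. A minimal sketch follows; the helper name `split_xy` is hypothetical, and we assume the CSV (for example X_Ylabels_17_21.csv from this repository) loads with a plain `pd.read_csv`. If the file was saved with its index, passing `index_col=0` would stop that index from being treated as a ratio.

```python
import pandas as pd


def split_xy(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Split a dataset whose last column holds the Y label into X and y."""
    X = df.iloc[:, :-1]  # every column except the last: the financial ratios
    y = df.iloc[:, -1]   # last column: the label (1/0 or perf value)
    return X, y


# With a repository file (path assumed relative to the repo root):
# X, y = split_xy(pd.read_csv("X_Ylabels_17_21.csv"))
demo = pd.DataFrame({"ratio_a": [1.0, 2.0], "ratio_b": [3.0, 4.0], "labels": [1, 0]})
X, y = split_xy(demo)
```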

Usage & Contributing

The project is currently in development.

You can use the repository and experiment with the Jupyter Notebook to generate your own datasets with different assumptions from ours.

We will then add the ML models that we think are most effective at predicting 12-month-forward share price outcomes from the 39 financial ratios provided.
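As an illustration of how one of these models might be trained, here is a minimal scikit-learn sketch using `RandomForestClassifier` (one of the model types in this repository's notebooks, though that notebook's setup may differ). It runs on random stand-in data shaped like the 2017 to 2021 dataset (102 stocks, 5 years, 39 ratios), and the split by year is our own design choice: training only on earlier years avoids evaluating on periods the model has already seen. The helper `fit_and_score` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def fit_and_score(X: pd.DataFrame, y: pd.Series, years: pd.Series, test_year: int):
    """Train on years before `test_year`, then evaluate on `test_year` only."""
    train, test = years < test_year, years == test_year
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[train], y[train])
    preds = model.predict(X[test])
    return model, accuracy_score(y[test], preds)


# Random stand-in data (not the real dataset): 102 stocks x 5 years x 39 ratios.
rng = np.random.default_rng(0)
n_stocks, yrs = 102, np.arange(2017, 2022)
years = pd.Series(np.repeat(yrs, n_stocks))
X = pd.DataFrame(rng.normal(size=(len(years), 39)),
                 columns=[f"ratio_{i}" for i in range(39)])
y = pd.Series(rng.integers(0, 2, size=len(years)))  # random 1/0 labels
model, acc = fit_and_score(X, y, years, test_year=2021)
```

On random data the accuracy should sit near chance; with the real X and y from the datasets above, the same function gives a simple out-of-sample check.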

We would welcome your thoughts on our models. Even better, we would welcome YOUR ideas on which models are best suited to this prediction problem using these datasets.

Anyone can help solve this problem. It's an open-source project after all!

Resources

· Kaggle: https://www.kaggle.com/datasets/ifuurh/nasdaq100-fundamental-data

· Jupyter Notebooks: https://jupyter.org/

· Yfinance: https://pypi.org/project/yfinance/

Repository Files

· .ipynb_checkpoints

· LICENSE

· ML_dataset_for Nasdaq100stocks_financial ratios_with labels.ipynb

· README.md

· X_Ylabels_17_20.csv

· X_Ylabels_17_21.csv

· X_Ylabels_17_21_pca.csv

· X_Yperf_17_20.csv

· X_Yperf_17_21.csv

· data17_20.csv

· data17_21.csv

· git Model for CNN Classifier.ipynb

· git Model for Gradient Boosting.ipynb

· git Model for RandomForestClassifier.ipynb

· git Model for Ridge Regression.ipynb

· git Model for SupportVectorMachine.ipynb

· labels17_20.csv

· labels17_21.csv

· nasdaq100_metrics_ratios.csv

· perfs17_20.csv

· perfs17_21.csv

· ~$AD me for GITHUB.docx

Repository: SheepBoss/Project-on-ML-dataset-and-models-for-stock-performance-predictions-based-on-financial-ratios