A machine learning model using Random Trees Embedding with Extra Trees. See ./docs for the project presentation and the project report.
This project examines whether the winner of a soccer match between two international teams can be predicted with machine learning, using features that describe the game performance of the two teams. Logistic Regression (LR), Support Vector Machine (SVM), and Gradient Boosting (GB) served as baselines for comparison with the main model, Random Trees Embedding with an Extra Trees Classifier (RTE). RTE outperformed the baselines, achieving an accuracy of 0.6453 and an average F1-score of 0.65. GB followed with an accuracy of 0.5445 and an average F1-score of 0.54. LR and SVM performed similarly to each other, with an accuracy of 0.4925±0.0005 and an average F1-score of 0.483±0.003.
Keywords: Soccer predictions, Multi-class Classification, Random Trees Embedding, Random Forest, Feature Extraction, Data Resampling.
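
As a rough illustration of the main model named above, the sketch below chains scikit-learn's `RandomTreesEmbedding` with an `ExtraTreesClassifier` in a pipeline. It is a minimal sketch, not the notebook's code: the synthetic data, the three outcome classes, the hyperparameters, and the choice of macro-averaged F1 are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomTreesEmbedding
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the match features (assumption: three outcome
# classes, e.g. home win / draw / away win). Not the project's real data.
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=6,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: RandomTreesEmbedding maps each sample to a sparse one-hot encoding
#         of the leaves it reaches in a forest of totally random trees.
# Step 2: ExtraTreesClassifier is trained on that sparse embedding.
# Hyperparameters here are illustrative, not the tuned values from the report.
model = make_pipeline(
    RandomTreesEmbedding(n_estimators=50, max_depth=5, random_state=0),
    ExtraTreesClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")
print(f"accuracy={accuracy:.3f}  macro F1={macro_f1:.3f}")
```

Because the data here is synthetic, the printed scores say nothing about the results reported above; the sketch only shows how the two estimators fit together.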
- ./docs/Ali-Aljaffer-final.ipynb: The Jupyter notebook containing the code for the project
- datasets folder: Contains the datasets used by the FinProject.ipynb
  - results.csv: International Match Results dataset
  - fifa_ranking-2023-07-20.csv: FIFA rankings dataset
  - rank_per_yr_T_sorted.csv: A generated CSV from fifa_ranking-2023-07-20.csv with one row per country and one column per year, holding the points for that year.
    - Can be generated by running create_rank_at_year.py
- create_rank_at_year.py: A script I used to extract information from the rankings dataset to make it more useful.
- ./docs/Ali-Aljaffer-final.pptx: The presentation file
- ./docs/Ali-Aljaffer-FinalProjectReport.pdf: The full report file
- fifa_ranking-2023-07-20.csv: Dataset by cashncarry, URL: https://www.kaggle.com/datasets/cashncarry/fifaworldranking
- results.csv: Dataset by martj42, URL: https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017?select=results.csv
- rank_per_yr_T_sorted.csv: Generated, derived dataset from fifa_ranking-2023-07-20.csv. Transposed for easier understanding.
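
To make the shape of the derived rank_per_yr_T_sorted.csv concrete, here is a minimal pandas sketch of a "one row per country, one column per year" table. It is not the actual create_rank_at_year.py: the column names (`country_full`, `rank_date`, `total_points`), the sample values, and the rule of keeping the latest ranking in each year are assumptions made for illustration.

```python
import pandas as pd

# Tiny made-up sample standing in for fifa_ranking-2023-07-20.csv.
# Column names and point values are assumptions, not real FIFA data.
rankings = pd.DataFrame(
    {
        "country_full": ["Brazil", "Brazil", "Brazil", "Spain", "Spain"],
        "rank_date": [
            "2021-04-07", "2021-12-23", "2022-12-22",
            "2021-12-23", "2022-12-22",
        ],
        "total_points": [1743.0, 1826.0, 1840.0, 1704.0, 1692.0],
    }
)
rankings["rank_date"] = pd.to_datetime(rankings["rank_date"])
rankings["year"] = rankings["rank_date"].dt.year

# Assumption: when a country has several rankings in one year,
# keep the most recent one.
latest = (
    rankings.sort_values("rank_date")
    .groupby(["country_full", "year"], as_index=False)
    .last()
)

# One row per country, one column per year, sorted by country name.
rank_per_year = latest.pivot(
    index="country_full", columns="year", values="total_points"
).sort_index()

print(rank_per_year)
# rank_per_year.to_csv("rank_per_yr_T_sorted.csv")  # optional: write to disk
```

A wide table like this lets a match in a given year be joined to each team's points with a single row-and-column lookup, for example `rank_per_year.loc["Brazil", 2021]`.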