This project uses an NBA prospect's college basketball stats to predict the pick at which the player will be drafted.
- NCAA season stats for each player from Aditya Kumar's College Basketball 2009-2021 + NBA Advanced Stats dataset: https://www.kaggle.com/datasets/adityak2003/college-basketball-players-20092021?resource=download
- RPI ratings for conferences and teams from TeamRankings.com: https://www.teamrankings.com/ncaa-basketball/rpi-ranking/rpi-rating-by-conf/ and https://www.teamrankings.com/ncb/rpi/
- Scraped RPI ratings for teams and conferences.
- Merged the ratings data with the NCAA data after making the join keys consistent between datasets (i.e. reformatting conference and team names so they match in each dataset).
- Cleaned the data (e.g. dropped null values, verified data consistency, changed data types, etc.).
- Grouped the records of CollegeBasketballPlayers2009-2021.csv by player, taking an average of each attribute weighted by the player's minutes played.
- Normalized the data.
- Trained a random forest regressor on a prospect's college basketball stats to predict draft pick.
- Tuned the hyperparameters through Bayesian optimization before visualizing the model and its performance.
- Measured and plotted the model's accuracy.
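
The minutes-weighted grouping step above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual code: the field names `player`, `minutes`, and `pts` are hypothetical placeholders rather than the dataset's real column names, and the sample rows are made up. It assumes each input row is one player-season.

```python
from collections import defaultdict


def minutes_weighted_average(rows, stats, key="player", weight="minutes"):
    """Collapse season rows into one row per player, weighting each stat by minutes.

    `key`, `weight`, and the entries of `stats` are hypothetical field names.
    """
    weight_sum = defaultdict(float)
    stat_sum = defaultdict(lambda: defaultdict(float))
    for row in rows:
        w = row[weight]
        weight_sum[row[key]] += w
        for stat in stats:
            # Accumulate minutes * stat so heavier seasons count for more.
            stat_sum[row[key]][stat] += w * row[stat]

    result = {}
    for player, total in weight_sum.items():
        if total == 0:
            continue  # no minutes played, so the weighted average is undefined
        result[player] = {s: stat_sum[player][s] / total for s in stats}
    return result


# Made-up example: player A has two seasons, player B has one.
seasons = [
    {"player": "A", "minutes": 100.0, "pts": 10.0},
    {"player": "A", "minutes": 300.0, "pts": 20.0},
    {"player": "B", "minutes": 200.0, "pts": 8.0},
]
averaged = minutes_weighted_average(seasons, stats=["pts"])
print(averaged)  # {'A': {'pts': 17.5}, 'B': {'pts': 8.0}}
```

Player A's 300-minute season pulls the average toward 20 points, which is the intended effect of weighting by minutes. The same aggregation could also be written with a pandas `groupby`.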
Ideas to implement in the future to improve accuracy:
- Adding auxiliary data from the NBA draft combine (e.g. vertical leap, wingspan, etc.).
- Computing the weighted average with multiple weighting variables (e.g. minutes played, conference strength, and team strength).
- Trying a different machine learning model (e.g. neural network regressor).
- Testing the model on CollegeBasketballPlayers2022.csv and ranking the regression scores in order, then measuring how close the model's predicted draft order is to the true outcome.
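
One possible way to carry out the last idea is sketched below. It is only an illustration under stated assumptions: a lower regression score is assumed to mean an earlier pick, there are no tied scores, and the player names and numbers are made up. The two metrics (mean absolute rank error and Spearman rank correlation) are a suggested choice, not something the project has settled on.

```python
def rank_by(values):
    """Map each key to its 1-based rank, smallest value first."""
    ordered = sorted(values, key=values.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


def compare_draft_order(predicted_scores, actual_picks):
    """Compare the predicted draft order with the real one.

    Returns (mean absolute rank error, Spearman rank correlation).
    Assumes no ties and at least two players.
    """
    players = list(actual_picks)
    pred_rank = rank_by({p: predicted_scores[p] for p in players})
    true_rank = rank_by({p: actual_picks[p] for p in players})
    diffs = [pred_rank[p] - true_rank[p] for p in players]
    n = len(players)
    mean_abs_error = sum(abs(d) for d in diffs) / n
    # Spearman's formula for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    spearman = 1 - 6 * sum(d * d for d in diffs) / (n * (n * n - 1))
    return mean_abs_error, spearman


# Made-up regression outputs and true picks for four hypothetical players.
predicted = {"A": 3.2, "B": 1.7, "C": 10.4, "D": 25.0}
actual = {"A": 1, "B": 2, "C": 3, "D": 4}
mean_abs_error, spearman = compare_draft_order(predicted, actual)
print(mean_abs_error, spearman)
```

A Spearman value near 1 would indicate that the predicted order closely matches the real draft order, while the mean absolute rank error reports how many slots off the model is on average.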