NBA Player Clustering

Overview

This project applies cluster analysis to NBA player statistics to group players into functional categories based on their on-court production and style of play. The goal is to move beyond traditional position labels like "point guard" or "center" and instead describe how players actually perform, as measured by their statistics.

The analysis uses data from the 2020-2021 NBA season and considers statistics such as points, rebounds, assists, blocks, and field goal percentage. Three clustering algorithms are tested: K-means, hierarchical, and model-based clustering. The optimal number of clusters is determined to be 3 based on multiple criteria.

The resulting 3 player clusters are characterized as:

In-the-Paints: High rebounds and blocks. Score efficiently inside but have poor outside shooting.

Generals: High points, assists, steals. Playmakers and scorers. Control tempo on offense.

Versatiles: Jack-of-all-trades. Don't stand out in any one stat but contribute across multiple categories.

Visualizations and statistical comparisons are used to profile each cluster. The analysis shows traditional position labels like "point guard" no longer adequately describe a player's role and style. The clusters provide an improved categorization of playing style in the modern NBA.
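As a rough illustration of the workflow described in this overview, the sketch below standardizes a handful of statistics, computes an elbow curve, fits K-means and hierarchical clustering with 3 clusters, and profiles the result. It is not the repository's code: the data are synthetic, the column names (PTS, TRB, AST, BLK) and the Ward linkage are assumptions, and model-based clustering is left out because it needs an add-on package.

```r
# Toy stand-in for per-player season averages; these numbers are synthetic,
# not the 2020-2021 data used in the repository.
set.seed(42)
make_group <- function(n, pts, reb, ast, blk) {
  data.frame(
    PTS = rnorm(n, pts, 2),
    TRB = rnorm(n, reb, 1),
    AST = rnorm(n, ast, 1),
    BLK = rnorm(n, blk, 0.2)
  )
}
players <- rbind(
  make_group(30, pts = 10, reb = 9, ast = 1.5, blk = 1.5),  # interior profile
  make_group(30, pts = 22, reb = 4, ast = 7.0, blk = 0.3),  # lead-guard profile
  make_group(30, pts = 9,  reb = 4, ast = 2.5, blk = 0.4)   # all-round profile
)

# Standardize so that no single statistic dominates the distance metric.
scaled <- scale(players)

# Elbow criterion: total within-cluster sum of squares for k = 1..6.
wss <- sapply(1:6, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

# K-means and hierarchical (Ward linkage, an assumption) solutions with 3 clusters.
km <- kmeans(scaled, centers = 3, nstart = 25)
hc <- cutree(hclust(dist(scaled), method = "ward.D2"), k = 3)

# Cross-tabulate the two labelings to see how far the methods agree.
agreement <- table(kmeans = km$cluster, hierarchical = hc)

# Profile each cluster by its mean raw statistics.
profile <- aggregate(players, by = list(cluster = km$cluster), FUN = mean)

print(round(wss, 1))
print(agreement)
print(profile)
```

The elbow curve is only one possible criterion; the project reports using several, which this sketch does not try to reproduce.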

Code

The R code to perform the clustering analysis and generate visualizations is here. It does the following:

-- Loads and prepares the dataset

-- Removes unnecessary columns

-- Imputes missing 3-point percentage values

-- Resolves dual position labels using KNN

-- Determines optimal number of clusters

-- Compares cluster distributions across methods

-- Clusters two datasets using K-means, hierarchical, and model-based algorithms

-- Visualizes clusters through pairs plots, bar charts, etc.

-- Profiles and names the clusters based on their statistics

-- Analyzes Team USA and NBA championship teams
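
Two of the preparation steps listed above, imputing missing 3-point percentages and resolving dual position labels with KNN, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the rows are synthetic, the column names (Pos, AST, TRB, ThreePA, ThreePct) are hypothetical, dual labels are assumed to look like "PG-C", missing percentages are assumed to come from players with zero attempts and are filled with 0, and the helper resolve_position is invented for this example.

```r
# Synthetic example rows; column names are hypothetical.
players <- data.frame(
  Pos      = c("PG", "PG", "PG", "C", "C", "C", "PG-C", "C-PG"),
  AST      = c(8.1, 7.4, 6.9, 1.2, 1.8, 0.9, 7.0, 1.5),
  TRB      = c(3.0, 3.5, 4.1, 10.2, 9.5, 11.0, 3.8, 9.9),
  ThreePA  = c(6.0, 5.2, 4.8, 0.0, 0.3, 0.0, 5.5, 0.0),
  ThreePct = c(0.38, 0.35, 0.36, NA, 0.25, NA, 0.37, NA),
  stringsAsFactors = FALSE
)

# Impute (assumed rule): a player with no attempts has an undefined
# percentage, so treat it as 0.
no_attempts <- is.na(players$ThreePct) & players$ThreePA == 0
players$ThreePct[no_attempts] <- 0

# Flag dual-label players and standardize the features used for distances.
is_dual  <- grepl("-", players$Pos, fixed = TRUE)
features <- scale(players[, c("AST", "TRB", "ThreePA", "ThreePct")])

# Hypothetical helper: pick the majority position among the k nearest
# single-label players, restricted to the two positions in the dual label.
resolve_position <- function(i, k = 3) {
  candidates <- strsplit(players$Pos[i], "-", fixed = TRUE)[[1]]
  train_idx  <- which(!is_dual & players$Pos %in% candidates)
  # Euclidean distance from player i to every candidate neighbor.
  diffs <- sweep(features[train_idx, , drop = FALSE], 2, features[i, ])
  d <- sqrt(rowSums(diffs^2))
  nearest <- train_idx[order(d)][seq_len(min(k, length(train_idx)))]
  votes <- table(players$Pos[nearest])
  names(votes)[which.max(votes)]
}

players$Pos[is_dual] <- vapply(which(is_dual), resolve_position, character(1))
print(players[, c("Pos", "ThreePct")])
```

Restricting the vote to the two positions named in the dual label is a design choice of this sketch; it guarantees the resolved label is one of the two the player was already listed under.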

Results

Key results and findings:

-- Traditional position labels like PG or C don't predict playing style or stats

-- 3 clusters found to best categorize players: In-the-Paints, Generals, Versatiles

-- Each cluster has distinct statistical signatures related to scoring, playmaking, etc.

-- Playing style in the NBA has evolved from the inside-focused 80s as the 3-point shot has become more emphasized

-- Team USA and NBA championship teams tend to be dominated by Versatile players

The analysis provides an improved, data-driven way to group NBA players based on their contributions rather than traditional notions of position.

Next Steps

Potential extensions or future work:

-- Add more seasons of data for a time-series analysis

-- Incorporate additional advanced metrics like PER, Win Shares, etc.

-- Look at team-level composition and performance for different mixes of clusters

-- Build interactive visualizations and a player recommendation tool