Unsupervised ML - Myopia Clusters

Machine Learning Bootcamp Assignment

This assignment applies what I learned about unsupervised learning by fitting data to a model and using clustering algorithms to place data into groups. Finally, I created visualizations that share my findings and analysis.

Background

You are on the data science team of a medical research company that’s interested in finding better ways to predict myopia, or nearsightedness. Your team has tried—and failed—to improve their classification model when training on the whole dataset. However, they believe that there might be distinct groups of patients that would be better to analyze separately. So, your supervisor has asked you to explore this possibility by using unsupervised learning.

You have been provided with raw data, so you’ll first need to process it to fit the machine learning models. You will use several clustering algorithms to explore whether the patients can be placed into distinct groups. Then, you’ll create a visualization to share your findings with your team and other key stakeholders.

Instructions

This activity is broken down into four parts:

Part 1: Prepare the Data
Part 2: Apply Dimensionality Reduction
Part 3: Perform a Cluster Analysis with K-means
Part 4: Make a Recommendation

Part 1: Prepare the Data

Read myopia.csv into a Pandas DataFrame.

Remove the "MYOPIC" column from the dataset.

Standardize your dataset so that columns that contain larger values do not influence the outcome more than columns with smaller values.
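
The three preparation steps above can be sketched as follows. This is a minimal example, not the notebook's actual code: the helper name prepare_features, the column names feature_a and feature_b, and the tiny demo frame are all made up for illustration, and it assumes every remaining column is numeric. With the real data, the frame would come from pd.read_csv("myopia.csv").

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_features(df: pd.DataFrame, target: str = "MYOPIC") -> pd.DataFrame:
    """Drop the label column and standardize every remaining (numeric) column."""
    features = df.drop(columns=[target])
    # StandardScaler rescales each column to mean 0 and unit variance.
    scaled = StandardScaler().fit_transform(features)
    return pd.DataFrame(scaled, columns=features.columns, index=features.index)


# With the assignment file this would be: df = pd.read_csv("myopia.csv")
# A small made-up frame stands in for the real data here.
demo = pd.DataFrame(
    {
        "feature_a": [1, 2, 3, 4],
        "feature_b": [10.0, 20.0, 15.0, 30.0],
        "MYOPIC": [0, 1, 0, 0],
    }
)
X = prepare_features(demo)
print(X)
```

Removing "MYOPIC" before scaling keeps the label out of the clustering input, so any groups found later come from the features alone.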

Part 2: Apply Dimensionality Reduction

Perform dimensionality reduction with PCA. How did the number of features change?

Using PCA(n_components=0.99) creates a model that keeps as many principal components as it needs to preserve approximately 99% of the explained variance, whether that takes 80 principal components or some other number.
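
To see how a fractional n_components behaves, here is a small sketch on synthetic data (the data is an assumption made up for this example, not the myopia dataset). It builds 10 correlated columns from 3 hidden factors, then compares the feature count before and after PCA.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 rows, 10 columns driven by 3 latent factors plus slight noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))
X_scaled = StandardScaler().fit_transform(X)

# A float between 0 and 1 asks PCA to keep enough components
# to reach that share of the total variance.
pca = PCA(n_components=0.99)
X_pca = pca.fit_transform(X_scaled)

print("features before:", X_scaled.shape[1])
print("components kept:", pca.n_components_)
print("variance preserved:", pca.explained_variance_ratio_.sum())
```

The fitted attribute n_components_ reports how many components were actually kept, which answers the question of how the number of features changed.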

Further reduce the dataset dimensions with t-SNE and visually inspect the results. To do this, run t-SNE on the principal components, which are the output of the PCA transformation.
Create a scatter plot of the t-SNE output. Are there distinct clusters?
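
A minimal sketch of the t-SNE step might look like this. The array X_pca below is random placeholder data standing in for the real PCA output, and the perplexity and random_state values are illustrative choices, not the settings used in the notebook.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder for the PCA output: 100 samples, 5 principal components.
X_pca = rng.normal(size=(100, 5))

# Reduce to two dimensions so the result can be drawn as a scatter plot.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_pca)

fig, ax = plt.subplots()
ax.scatter(embedding[:, 0], embedding[:, 1], s=10)
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
ax.set_title("t-SNE of principal components")
# In a notebook the figure renders inline; in a script, call plt.show().
```

Distinct clusters would show up as visibly separated islands of points. A single diffuse cloud suggests the data does not split cleanly.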

Part 3: Perform a Cluster Analysis with K-means

Create an elbow plot to identify the best number of clusters.

Use a for loop to determine the inertia for each k from 1 through 10.
If possible, determine where the elbow of the plot is, and at which value of k it appears.
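
The loop described above can be sketched as follows. The three synthetic blobs are an assumption made up for this example so the elbow is easy to see; the n_init and random_state values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in: three well-separated 2-D blobs of 50 points each.
centers = np.array([[0, 0], [6, 6], [-6, 6]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

ks = list(range(1, 11))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    # inertia_ is the sum of squared distances from each point to its nearest centroid.
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

Plotting ks against inertias as a line chart gives the elbow plot. The elbow is the value of k after which inertia stops dropping sharply; if the curve bends smoothly with no clear corner, there may be no well-defined number of clusters.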

Part 4: Make a Recommendation

Can the patients be clustered? If so, into how many clusters?

shohaha/ML-myopia-clusters
