Below are the steps I work through when looking at the Myopia dataset. My overall goal is to try a variety of machine learning methods in the process of finding a good-quality model.
In the most general terms, I am looking for the combination of data preparation, learning algorithm, and hyperparameters that produces the most representative model.
My Process:
- Start with a Goal
- Data Search
- Exploratory Data Analysis
- Analyze the Ask
- Set Priorities and Limits
- Select Initial Process (Model)
- Re-evaluate the Goal and Value
- Preprocess Data for Algorithm
  - Remove Unnecessary Features
  - Remove Outliers
  - Replace Missing Values
  - Balance Data
  - One-hot-encode Feature Classes
  - Label-encode Target Classes
  - Transform Data (for parametric algorithms)
  - Scale Data (for distance-based, gradient-descent, or regularized algorithms)
- Run Model
- Evaluate Models
  - Classification Reports
  - ROC AUC or Precision-Recall AUC
- Adjust Models
  - Changing Threshold Effects
  - Tuning Parameters
  - Remove Features
- Select Best Model
- Re-evaluate the Ask?
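
To make the "Changing Threshold Effects" step a little more concrete, here is a minimal sketch of how moving the decision threshold trades precision against recall. It uses only plain Python. The labels and probabilities are made up for illustration (they are not from the Myopia dataset), and the helper names `confusion_counts` and `precision_recall` are my own, not part of any library.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count (TP, FP, FN, TN), predicting positive when prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_prob, threshold):
    """Return (precision, recall) at the given threshold; 0.0 if undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 1 = positive class, 0 = negative class.
# The probabilities stand in for a model's predicted scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.55, 0.90]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold catches every positive case but lets in more false positives, while raising it does the opposite. Which side of that trade-off matters more goes back to the "Analyze the Ask" and "Set Priorities and Limits" steps.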