MammogramDiagnosis

This project uses the "Mammographic Mass" public dataset from the UCI Machine Learning Repository (source: https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass).

The dataset contains 961 instances of masses detected in mammograms, each described by the following attributes:

  1. BI-RADS assessment: 1 to 5 (ordinal)
  2. Age: patient's age in years (integer)
  3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
  4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
  5. Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal)
  6. Severity: benign=0 or malignant=1 (binary)
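
A minimal sketch of loading data in this layout with pandas. It assumes the raw file is comma-separated, has no header row, and marks missing values with `?`. The column names, the `load_masses` helper, and the drop-missing-rows strategy are illustrative choices, not taken from this repository, and the three sample rows are made up.

```python
import io

import pandas as pd

# Column labels are our own (assumed); they follow the attribute order above.
COLUMNS = ["bi_rads", "age", "shape", "margin", "density", "severity"]


def load_masses(source):
    """Read comma-separated rows, treating '?' as a missing value."""
    df = pd.read_csv(source, names=COLUMNS, na_values="?")
    # Simplest possible handling (an assumption): drop incomplete rows.
    return df.dropna().astype(int)


# Tiny made-up sample in the same layout, for illustration only.
sample = io.StringIO("5,67,3,5,3,1\n4,43,1,1,?,1\n4,28,1,1,3,0\n")
masses = load_masses(sample)
print(masses.shape)
```

In practice `source` would be the path to the downloaded data file. Dropping rows is only one option; imputing missing values is another.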

Several supervised machine learning techniques are applied to this dataset to see which one yields the highest accuracy, as measured by K-fold cross-validation (K=10):

  • Decision tree
  • Random forest
  • KNN
  • Naive Bayes
  • SVM
  • Logistic Regression
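
The comparison above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in this repository: it runs on synthetic stand-in data rather than the mammographic masses data, uses default or arbitrary hyperparameters, picks Gaussian Naive Bayes and an RBF-kernel SVM, and standardizes features for the scale-sensitive models. The `compare` helper is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the real dataset): 4 features, binary label.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Hyperparameters and the scaling step are assumptions for this sketch.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}


def compare(models, X, y, k=10):
    """Return the mean K-fold accuracy for each named model."""
    # An integer cv with a classifier uses stratified K-fold splitting.
    return {
        name: cross_val_score(model, X, y, cv=k, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare(models, X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and the model in one pipeline means the scaler is refit on each training fold, so no information from the held-out fold leaks into preprocessing.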