This project applies data mining classification techniques to model taste preferences for white wine from analytical data that are readily available at the certification stage. The best-performing model in our analysis was K-Nearest Neighbors (K-NN) trained on a balanced set, which showed a good balance between sensitivity and specificity.
The dataset, titled "White Wine Quality", consists of 4989 observations of the physicochemical characteristics of the white variant of Vinho Verde wine, collected from 2004 to 2007, together with median taste preference scores from blind tasting sessions.
- Preliminary data analysis for missing values, outliers, and variable distribution.
- Classification models like K-Nearest Neighbors (K-NN), Logistic Regression, and Discriminant Analysis.
- Downsampling technique to address class imbalance.
- Performance evaluation using metrics like Accuracy, AUC, and Balanced Accuracy.
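
The downsampling and K-NN steps listed above can be sketched as follows. This is a minimal illustration on synthetic data, not the code in this repository: the column names (`alcohol`, `acidity`, `quality`), the class labels, the hypothetical `downsample()` helper, the 300-row hold-out, and `k = 5` are all assumptions made for the example. It uses `knn()` from the `class` package.

```r
# Minimal sketch: downsample the majority class, then fit K-NN.
# Data are synthetic; names, sizes and k = 5 are illustrative assumptions.
library(class)  # provides knn()

set.seed(42)

# Synthetic imbalanced data: 900 "low" vs 100 "high" quality wines
n_low  <- 900
n_high <- 100
wines <- data.frame(
  alcohol = c(rnorm(n_low, mean = 10, sd = 1),
              rnorm(n_high, mean = 12, sd = 1)),
  acidity = c(rnorm(n_low, mean = 0.30, sd = 0.05),
              rnorm(n_high, mean = 0.25, sd = 0.05)),
  quality = factor(c(rep("low", n_low), rep("high", n_high)),
                   levels = c("low", "high"))
)

# Hypothetical helper: keep, for every class, a random sample of rows
# as large as the smallest class
downsample <- function(df, class_col) {
  counts <- table(df[[class_col]])
  n_min  <- min(counts)
  idx_list <- lapply(names(counts), function(cl) {
    rows <- which(df[[class_col]] == cl)
    rows[sample.int(length(rows), n_min)]
  })
  df[sort(unlist(idx_list)), , drop = FALSE]
}

# Hold out a test set first, so that only the training data are balanced
test_idx <- sample.int(nrow(wines), size = 300)
train <- wines[-test_idx, ]
test  <- wines[test_idx, ]

train_bal <- downsample(train, "quality")
print(table(train_bal$quality))  # both classes now have the same count

# K-NN is distance based, so scale features with training statistics
features <- c("alcohol", "acidity")
center <- colMeans(train_bal[, features])
spread <- apply(train_bal[, features], 2, sd)
train_x <- scale(train_bal[, features], center = center, scale = spread)
test_x  <- scale(test[, features], center = center, scale = spread)

pred <- knn(train = train_x, test = test_x, cl = train_bal$quality, k = 5)
print(table(Predicted = pred, Actual = test$quality))
```

Balancing only the training split keeps the test set at its original class proportions, so the evaluation still reflects the imbalance the model would face in practice.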
Our models achieved varying degrees of success. The K-NN model trained on the balanced set performed best in terms of balanced accuracy, the mean of sensitivity and specificity.
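
To make the metric concrete, the sketch below computes sensitivity, specificity, balanced accuracy, and plain accuracy from a confusion matrix in base R. The labels are made up for illustration and are not results from this project. Treating `"high"` as the positive class is an assumption.

```r
# Illustrative labels only; these are NOT results from this project.
# Assumption: "high" is treated as the positive class.
actual <- factor(
  c("high", "high", "high", "high", "low", "low", "low", "low", "low", "low"),
  levels = c("low", "high")
)
predicted <- factor(
  c("high", "high", "high", "low", "low", "low", "low", "low", "high", "high"),
  levels = c("low", "high")
)

# Rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)

sensitivity <- cm["high", "high"] / sum(cm[, "high"])  # 3 of 4 positives found
specificity <- cm["low", "low"] / sum(cm[, "low"])     # 4 of 6 negatives found

balanced_accuracy <- (sensitivity + specificity) / 2
accuracy <- sum(diag(cm)) / sum(cm)

print(round(c(sensitivity = sensitivity,
              specificity = specificity,
              balanced_accuracy = balanced_accuracy,
              accuracy = accuracy), 4))
```

Because balanced accuracy averages the two per-class rates, a model that ignores the minority class cannot score well on it, which is why it is a useful companion to plain accuracy on imbalanced data.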
- Ensure R is installed on your system.
- Clone this repository.
- Run the `code.R` script to perform the analysis.
- Julius Maliwat
- Filippo Bianchini
- Giacomo Rabuzzi
- Andrea Robbiani