The goal of the project is to predict human activities (lying, walking on stairs, ...) from data collected by various sensors (heart rate in bpm, gyroscopes, magnetometers, ...). Each sample is a time series of 512 values recorded over a 5-second interval.
The project is run as a competition between students. Students submit their predictions for 3500 new examples, and an accuracy score is computed on a 10% subset of this test set (secret and fixed for everyone).
The main challenges of the project are the missing values in the learning set and the use of time series as input to machine learning algorithms.
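One common way to handle both challenges at once is to reduce each time series to a few summary statistics that ignore missing values. The sketch below is only an illustration: it assumes missing values are encoded as NaN (the actual encoding in the dataset may differ), it uses synthetic data, and `summarize_series` is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def summarize_series(series):
    """Reduce each row (one time series) to mean, std, min and max, ignoring NaNs.

    Hypothetical helper: assumes missing readings are stored as NaN.
    """
    series = np.asarray(series, dtype=float)
    return np.column_stack([
        np.nanmean(series, axis=1),
        np.nanstd(series, axis=1),
        np.nanmin(series, axis=1),
        np.nanmax(series, axis=1),
    ])


# Synthetic stand-in for one sensor: 4 samples of 512 values each.
rng = np.random.default_rng(0)
signals = rng.normal(size=(4, 512))
signals[0, 10:20] = np.nan  # simulate a gap in the first recording

features = summarize_series(signals)
print(features.shape)
```

A row that is entirely missing would still produce NaN features here, so such samples need separate handling (for example imputation from other samples).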
Our focus was on decision trees, support vector machines, multilayer perceptrons, random forests, gradient boosting methods (XGBoost) and k-nearest neighbours, as well as their combination using the stacking method. We first analysed and processed the data, tried the models without tuning, then tuned them to see how far they could go, and finally selected the best ones for the submissions.
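As an illustration of the stacking idea, the sketch below combines three of the model families listed above with scikit-learn's `StackingClassifier`. The use of scikit-learn, the synthetic data, the choice of base models, their settings and the logistic regression meta-learner are all assumptions made for the example, not the configuration used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features and activity labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base models are fitted with internal cross-validation; their out-of-fold
# predictions become the inputs of the final estimator.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy: {accuracy:.3f}")
```

Stacking only helps when the base models make different kinds of errors, which is one reason to mix tree-based, distance-based and margin-based learners.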
A complete description of the method can be found in the report.
We experimented with multiple models, tuning each one with grid search and cross-validation.
The one we chose to submit for the final grading was a random forest model with max_depth = 10 and max_features = 7.
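A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The example below is a minimal sketch on synthetic data: the grid values, the number of folds, the number of trees and the use of scikit-learn itself are assumptions, and the best parameters it prints are specific to the synthetic data, not a reproduction of the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the extracted features and activity labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# Hypothetical grid around the two hyperparameters mentioned above.
param_grid = {"max_depth": [5, 10], "max_features": [3, 7]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation (assumed)
    scoring="accuracy",  # same metric as the competition
)
search.fit(X, y)

print(search.best_params_)
print(f"mean cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, `search.cv_results_` holds the mean score of every combination in the grid, which is useful for comparing settings.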
These hyperparameters were chosen based on the cross-validation results.
- Final accuracy on the public test set: 91.25%
- Final accuracy on the private test set: 88.42%
- Final grade: 16/20
- data_analysis.py: Analysis of the sensor data.
- models_training.py: Feature extraction and selection, model training and evaluation.
- Simon Gardier
- Camille Trinh