First, it is necessary to install Conda in order to execute the code.

- Create a new environment with the required libraries:

```shell
conda env create --file environment.yml
```

- Activate the environment:

```shell
conda activate EQUINOTHERAPY_PILOT_SLEEP
```

- Execute the whole project by running the main file:

```shell
python src/main.py
```
If `environment.yml` does not work, these are the specific library versions used in the project:
- Python: 3.11.5
- Luigi: 3.5.1
- scikit-learn: 1.2.2
- XGBoost: 2.0.3
- CatBoost: 1.2.5
- TensorFlow: 2.16.1
- Keras: 3.4.0
- NumPy: 1.26.4
- pandas: 2.2.2
- tqdm: 4.66.4
- imbalanced-learn: 0.12.3
- seaborn: 0.13.2
Note: some libraries are OS-specific. For instance, on macOS, `tensorflow-macos` and `tensorflow-metal` are required to enable GPU use.
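If the Conda specification cannot be resolved, an alternative is to install the pinned versions directly with pip. This is only a sketch: it assumes all packages are available on PyPI under these names, and (per the note above) macOS users would swap `tensorflow` for `tensorflow-macos` plus `tensorflow-metal`.

```shell
pip install luigi==3.5.1 scikit-learn==1.2.2 xgboost==2.0.3 \
    catboost==1.2.5 tensorflow==2.16.1 keras==3.4.0 \
    numpy==1.26.4 pandas==2.2.2 tqdm==4.66.4 \
    imbalanced-learn==0.12.3 seaborn==0.13.2
```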
Phase | Description | Script |
---|---|---|
Consolidation | The data obtained from the watches is matched with the sleep stages obtained from polysomnography (PSG). | consolidation.py |
Cleaning | Two filters are applied to clean the data. Time filter: only data recorded between 20:00 and 12:00 is kept. HRR filter: the signal is split into windows of `w_size` with `w_overlap`; heart rate recovery is calculated in each window, and any window that exceeds the threshold is excluded from the final file. | cleaning.py |
Scaling | The data is scaled to improve model performance: first, an adjustment of the median of the individual data for each patient is performed; then a global `RobustScaler` is applied to the entire dataset. | scaling.py |
Preprocessing | Feature extraction is performed (see features.py for details). The feature matrix of each patient is then normalized using a `StandardScaler` (Z-score normalization). | preprocessing.py |
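The scaling step can be sketched as follows. This is a minimal illustration on toy data: the column names and the per-patient grouping are assumptions; only the median adjustment and `RobustScaler` come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Toy frame: heart-rate samples for two patients (illustrative columns).
df = pd.DataFrame({
    "patient": ["A"] * 4 + ["B"] * 4,
    "hr": [60.0, 62.0, 61.0, 63.0, 80.0, 82.0, 81.0, 83.0],
})

# 1) Per-patient median adjustment: centre each patient's signal on 0,
#    removing individual baseline differences.
df["hr_adj"] = df["hr"] - df.groupby("patient")["hr"].transform("median")

# 2) Global RobustScaler over the entire dataset (median/IQR based,
#    so it is tolerant to outliers).
df["hr_scaled"] = RobustScaler().fit_transform(df[["hr_adj"]]).ravel()
```

After step 1 every patient is centred on zero, so the global scaler in step 2 only has to normalize the spread, not per-patient offsets.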
Phase | Description | Script |
---|---|---|
Partitioning | Two validation methods are used: Leave-One-Participant-Out and Stratified K-Fold. | partitioning.py |
Oversampling | To achieve a balanced dataset, the SMOTE algorithm is applied to the training set of both validation methods. | oversampling.py |
Training | A list of models is trained using the training set of both validation methods. These models are defined in models.py. Parameter optimization is performed for each partition (fold or participant), and only the best model of each partition is saved. | training.py |
Analysis | Various analysis approaches are conducted in this phase. | analysis.py |
Phase | Description | Script |
---|---|---|
Partitioning | Combining the aforementioned validation methods, Leave-One-Participant-Out divides the dataset into training and testing sets; a Stratified K-Fold with n_splits (see luigi.cfg for the value) is then applied to the training set, creating the training and validation folds. | partitioning.py |
Training | Using a DataGenerator, the training and validation sets are fed to the model. The model can be one of three versions (see lstm_creation.py); the choice is specified in luigi.cfg, as is the training mode, i.e. the list of inputs passed to the model (such as 3DACC, HR, and MAGACC). | training.py |
Analysis | Previously trained models are evaluated using the testing set, and confusion matrices are obtained. | analysis.py |
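The nested partitioning described above can be sketched with scikit-learn's splitters. This is a minimal illustration: the array shapes, labels, and the `n_splits` value are placeholders for what luigi.cfg and the real data provide.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold

# Toy dataset: 30 windows, 3 features, alternating binary sleep label,
# three participants of 10 windows each (all sizes are illustrative).
X = np.random.default_rng(0).normal(size=(30, 3))
y = np.tile([0, 1], 15)
groups = np.repeat([0, 1, 2], 10)

n_splits = 2  # stand-in for the n_splits value read from luigi.cfg
folds = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    # Outer loop: one participant held out as the testing set.
    inner = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for tr, va in inner.split(X[train_idx], y[train_idx]):
        # Inner loop: training/validation folds within the remaining data.
        folds.append((train_idx[tr], train_idx[va], test_idx))

print(len(folds))  # participants x inner folds
```

The outer split keeps every window of a held-out participant out of training, while the inner Stratified K-Fold preserves the class proportions in each training/validation fold.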
Round | Patients used | N Phases | Description |
---|---|---|---|
1st | All Patients | 5 | Window Gaussian normalization, individuals only, MinMaxScaler (0,1) on feature matrix, no oversampling |
2nd | All Patients | 5 | Mean adjustment, global RobustScaler, MinMaxScaler (0,1) on feature matrix, feature selection (ANOVA), SMOTE oversampling |
3rd | All Patients | 5 | Mean adjustment, global RobustScaler, StandardScaler (Gaussian) on feature matrix, feature selection (ANOVA), SMOTE oversampling |
4th | HQ Patients | 5 | Mean adjustment, global RobustScaler, StandardScaler (Gaussian) on feature matrix, feature selection (ANOVA), SMOTE oversampling |
5th | MLQ Patients | 5 | Mean adjustment, global RobustScaler, StandardScaler (Gaussian) on feature matrix, feature selection (ANOVA), SMOTE oversampling |
6th | All Patients | 2 | Mean adjustment, global RobustScaler, StandardScaler (Gaussian) on feature matrix, feature selection (ANOVA), SMOTE oversampling |
Round | Description |
---|---|
First | 3DACC, HR and MAGACC with LSTM_1 |
Second | 3DACC, HR with LSTM_1 |