Install all the dependencies listed in the requirements.txt file:

```bash
cd setup
pip install -r requirements.txt
```

Alternatively, create a new virtual environment with the dependencies:

```bash
cd setup
conda env create --name <env_name> --file config.yml
```

where `<env_name>` is the name of the environment.
The data used in this project is a synthetic dataset generated with our Synthetic Activity Generation web application, using the metadata found in `data/<user>/metadata`. The data is stored in the `data` directory, with the following layout:
```
data
├── <user>
│   ├── metadata
│   │   ├── dictionary_rooms.json
│   │   ├── daily_activities.json
│   │   ├── assigned_activities.json
│   │   ├── assigned_activities_per_year.json
│   │   └── assigned_activities_per_year_random.json
│   ├── easy
│   │   ├── activities-simulation.csv
│   │   ├── groundtruth.png
│   │   └── out_feat_extraction_quarters.csv
│   ├── medium
│   │   ├── activities-simulation.csv
│   │   ├── groundtruth.png
│   │   └── out_feat_extraction_quarters.csv
│   └── hard
│       ├── activities-simulation.csv
│       ├── groundtruth.png
│       └── out_feat_extraction_quarters.csv
├── <user2>
│   └── .................
├── extend_to_year_activities.py
├── feature_extraction.py
├── groundtruth.csv
├── groundtruth_formatted.xlsx
└── Weekly Routines.pdf
```
For each user in the data, there are four directories: `metadata`, `easy`, `medium`, and `hard`. The last three correspond to the different difficulty levels configured during synthetic data generation, and the first contains the files required by the web application to generate the synthetic data.

The `metadata` directory contains the JSON files used by the web application to generate synthetic data. The files `assigned_activities_per_year.json` and `assigned_activities_per_year_random.json` are generated by the script `extend_to_year_activities.py`.
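As an illustration, the per-user layout described above can be traversed with a short script. This is a hedged sketch, not part of the project: the function name is hypothetical, and it only assumes the `data/<user>/{metadata,easy,medium,hard}` layout shown in the tree.

```python
from pathlib import Path


def list_user_files(data_dir: str) -> dict:
    """Collect, per user, the files inside each subdirectory
    (metadata, easy, medium, hard). Hypothetical helper; assumes
    the data/<user>/<subdir>/<files> layout described above."""
    layout = {}
    for user_dir in sorted(Path(data_dir).iterdir()):
        if not user_dir.is_dir():
            continue  # skip top-level files such as groundtruth.csv
        layout[user_dir.name] = {
            sub.name: sorted(p.name for p in sub.iterdir())
            for sub in sorted(user_dir.iterdir())
            if sub.is_dir()
        }
    return layout
```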
Each difficulty-level directory (`easy`, `medium`, `hard`) contains three files:

a. `activities-simulation.csv`: presents the data as a location map for a user, where each row is one day of data collection containing an array of 1440 locations, one for each minute of the day.

b. `groundtruth.png`: a visual representation of the location map, where each black-and-white dashed line marks the separation between weeks (specifically between each Sunday and Monday).

c. `out_feat_extraction_quarters.csv`: the feature extraction output used as input for the routine detection models, generated by the script `feature_extraction.py`.
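Given the format described in (a), each `activities-simulation.csv` can be parsed into a days-by-1440 location map. A minimal sketch (assuming the room codes are integer values, which the source does not state):

```python
import csv


def load_location_map(path):
    """Parse activities-simulation.csv: one row per day of data
    collection, 1440 locations per row (one per minute of the day).
    Integer room codes are an assumption."""
    days = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            minutes = [int(cell) for cell in row]
            assert len(minutes) == 1440, "each day must cover 1440 minutes"
            days.append(minutes)
    return days  # shape: (number_of_days, 1440)
```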
Additionally, there are three more files in the `data/` directory:

a. `groundtruth.csv`: contains the theoretical relative frequency with which each user is in a given location for each day of the week and each time interval of interest.

b. `groundtruth_formatted.xlsx`: the same as `groundtruth.csv`, but in Excel format.

c. `Weekly Routines.pdf`: describes the locations of each user for each day of the week, and was used to generate `groundtruth.csv`.
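To make the notion of "relative frequency per weekday and time interval" concrete, here is a hedged sketch that computes such a table from a location map. The interval length, the assumption that day 0 is a Monday, and the function name are all illustrative; the actual columns of `groundtruth.csv` may differ.

```python
from collections import Counter


def relative_frequencies(location_map, interval_minutes=360):
    """For each (weekday, time interval) pair, estimate the relative
    frequency of each location across all observed days.
    Assumptions (illustrative only): day 0 is a Monday, and the
    6-hour interval length is not taken from the project."""
    counts = {}
    for day_idx, day in enumerate(location_map):
        weekday = day_idx % 7
        for minute, loc in enumerate(day):
            key = (weekday, minute // interval_minutes)
            counts.setdefault(key, Counter())[loc] += 1
    return {
        key: {loc: n / sum(c.values()) for loc, n in c.items()}
        for key, c in counts.items()
    }
```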
To run the routine detection and visualization, run the following command:

```bash
python plot_routines.py
```

This script saves the results of the routine execution in the `results` directory. The parameters, the data directory, and the name of the results directory are set in the `config.yaml` file.
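The exact schema of `config.yaml` is project-specific; the following is only a hypothetical sketch of the options named above (every key name here is an assumption, not the project's actual schema):

```yaml
# Hypothetical sketch — key names are illustrative, not the real schema
data_dir: data        # directory containing the per-user synthetic data
results_dir: results  # where plot_routines.py writes its output
# DRGS routine-detection parameters would also be configured here
```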
To extract the frequency tables from the results of the routine execution, run the following command:

```bash
python evaluation.py
```

To compute ROC AUC, F1 score, precision, recall, ROC plots, and confusion matrices from the results of the routine execution, run the following command:

```bash
python metrics.py
```
The program operates as follows:

- `plot_routines.py`: runs the DRGS routine detection algorithm using the parameters specified in `config.yaml`, and plots the hierarchical routines graph and each detected cluster.
- `evaluation.py`: to validate the results and calculate classification metrics for the detected hierarchical routines based on the `config.yaml` parameters, run this script first. It generates a table of relative frequencies for each difficulty level in the data; these tables show the probability of being in a room at different times of the week, based on the detected routines.
- `metrics.py`: finally, to compare these frequency tables with the ground truth (found in `data/groundtruth.csv`), run this script. It extracts classification metrics, ROC curves, and confusion matrices.
In summary:

- Run `plot_routines.py` to detect routines.
- Run `evaluation.py` to generate the frequency tables.
- Run `metrics.py` to calculate metrics and compare them with the ground truth, after executing `evaluation.py`.