# Data Augmentation for High Dimensional Multivariate Time-Series Data Using Generative Adversarial Networks (GANs)

This work covers the generation and evaluation of synthetic human activity data generated by GANs. The overall aim is to generate realistic synthetic data that can be used to improve classification performance by extending the original dataset.
The generation pipeline takes real-world data as input and produces ten times as much synthetic data for each activity. This process is depicted in more detail in the following image:
The evaluation of the generated data is done in four ways:
- Visualize how well the distributions of each activity resemble the original ones using PCA and t-SNE
- Apply MMD as a sample-based metric to analyze the similarity of the distributions
- Use TSTR/TRTS to evaluate the ability of the synthetic data to be used as substitute for real-world data
- Mix real and synthetic data with the aim to improve classification performance
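The MMD evaluation above compares samples of real and synthetic windows directly. As an illustrative sketch (not the repository's implementation, which is adapted from the code linked under Acknowledgments), a sample-based RBF-kernel MMD estimate might look like:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd_rbf(X, Y, gamma=None):
    """Biased estimate of the squared Maximum Mean Discrepancy between
    two samples X and Y using an RBF kernel.
    gamma=None uses scikit-learn's default of 1 / n_features."""
    XX = rbf_kernel(X, X, gamma)  # kernel values within sample X
    YY = rbf_kernel(Y, Y, gamma)  # kernel values within sample Y
    XY = rbf_kernel(X, Y, gamma)  # kernel values across the samples
    return XX.mean() + YY.mean() - 2 * XY.mean()

# Toy stand-ins for flattened sensor windows
rng = np.random.default_rng(0)
real = rng.normal(size=(100, 8))
synthetic = rng.normal(size=(100, 8))        # same distribution -> small MMD
shifted = rng.normal(loc=3.0, size=(100, 8))  # shifted distribution -> larger MMD

print(mmd_rbf(real, synthetic))
print(mmd_rbf(real, shifted))
```

A small MMD indicates that the synthetic distribution is close to the real one; a clearly larger value, as for the shifted sample, signals a mismatch.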
Three different datasets are used as benchmarks, two of which were recorded in the course of this work.
- PAMAP2 contains simple activities of daily living
- SONAR/SONAR-LAB contain a variety of complex nursing activities with a high number of sensor channels
- Link to datasets will be added once published
The pipelines can be extended to include further HAR datasets, provided that they can be integrated into the `Recording` structure.
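To illustrate what such an integration has to provide, a hypothetical `Recording` type could look like the sketch below. The field names and shapes are assumptions for illustration, not the repository's actual definition:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Recording:
    # NOTE: illustrative fields only; see src/datatypes/ for the real structure.
    sensor_frame: np.ndarray  # shape (time_steps, sensor_channels)
    activities: np.ndarray    # per-time-step activity labels, shape (time_steps,)
    subject: str              # identifier of the recorded person

# A loader for a new dataset would parse its raw files into this shape:
rec = Recording(
    sensor_frame=np.zeros((300, 70)),
    activities=np.zeros(300, dtype=int),
    subject="subject_01",
)
print(rec.sensor_frame.shape)
```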
All requirements are listed in `requirements.txt`. Use the following command to install all dependencies automatically:

```
pip install -r requirements.txt
```
The `src` folder contains the following directories / files:

- `runner.py` - Entry point to program execution. See Executing program.
- `datatypes/` - Contains the basic data types `Recording` and `Window` that are used to handle different datasets consistently.
- `evaluation/` - Metrics and utility functions for evaluation.
- `execute/` - Contains the actual pipelines that are executed by `runner.py`. Each dataset has a pipeline for generating synthetic data and one for evaluation.
- `loader/` - Functions used to read datasets and fit them into the `Recording` structure, as well as preprocessing functions.
- `models/` - Contains TensorFlow models.
- `scripts/` and `visualization/` - Scripts to visualize and analyze the datasets.
- `TimeGAN/` - Contains a modified TimeGAN framework which is used to generate synthetic data. See Acknowledgments.
- `utils/` - Utility functions for reading, windowing and processing the data.
- `settings.py` - Stores dataset-specific constants.
- `labels.json` (and similar) - Contain all activities performed in SONAR/SONAR-LAB.
## Executing program

Run `runner.py` with the following options:

- `--dataset {pamap2,sonar,sonar_lab}`: Dataset to use
- `--mode {gen,eval}`: Pipeline to use (generation or evaluation)
- `--data_path DATA_PATH`: Path to the dataset directory
- `--synth_data_path SYNTH_DATA_PATH`: Path to the directory where the generated data is stored (used for evaluation only)
- `--random_data_path RANDOM_DATA_PATH`: Path to the random data file (used for evaluation only)
- `--window_size WINDOW_SIZE`: Window size
- `--stride_size STRIDE_SIZE`: Stride size
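The `--window_size` and `--stride_size` options control how recordings are cut into fixed-length windows before training and evaluation. A minimal sketch of such sliding-window segmentation (the exact semantics in `utils/` may differ, e.g. in how incomplete trailing windows are handled):

```python
import numpy as np

def windowize(data, window_size, stride_size):
    """Cut a (time_steps, channels) array into fixed-length windows.
    Windows that would run past the end of the data are dropped."""
    windows = [
        data[start:start + window_size]
        for start in range(0, len(data) - window_size + 1, stride_size)
    ]
    if not windows:
        return np.empty((0, window_size, data.shape[1]))
    return np.stack(windows)  # shape (n_windows, window_size, channels)

data = np.arange(1000 * 3).reshape(1000, 3)  # toy recording: 1000 steps, 3 channels
print(windowize(data, 300, 300).shape)  # stride == window size -> no overlap
```

Setting the stride equal to the window size (as in the example command below) yields non-overlapping windows; a smaller stride produces overlapping ones.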
Example command:

```
$ python3 runner.py --dataset sonar_lab --mode eval --data_path PATH_TO_DATASET \
    --synth_data_path PATH_TO_SYNTHETIC_DATA --random_data_path PATH_TO_RANDOM_DATA_FILE \
    --window_size 300 --stride_size 300 > output.txt
```
Note: To run only a subset of the evaluations, the flags in the corresponding evaluation pipelines have to be set manually.
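The TSTR/TRTS evaluation trains a classifier on one kind of data and tests it on the other; high TSTR accuracy means the synthetic data can substitute for real data. The following is a minimal, self-contained illustration of the TSTR idea with random stand-in data and a simple scikit-learn classifier, not the repository's actual DeepConvLSTM setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

def make_data(n=200, n_features=10, n_classes=3):
    """Toy stand-in for flattened activity windows with a
    class-dependent shift so the classes are learnable."""
    X = rng.normal(size=(n, n_features))
    y = rng.integers(0, n_classes, size=n)
    X[np.arange(n), y] += 2.0  # shift feature y[i] for sample i
    return X, y

X_synth, y_synth = make_data()  # stand-in for generated windows
X_real, y_real = make_data()    # stand-in for real windows

# TSTR: Train on Synthetic, Test on Real
clf = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)
tstr_acc = accuracy_score(y_real, clf.predict(X_real))
print(f"TSTR accuracy: {tstr_acc:.2f}")
```

TRTS simply swaps the roles of the two datasets: train on real windows and test on synthetic ones.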
Example plots: PCA visualization and t-SNE visualization of the real and synthetic distributions.
## Acknowledgments

- TimeGAN: https://github.com/jsyoon0823/TimeGAN
- RBF calculation: https://github.com/jindongwang/transferlearning/blob/master/code/distance/mmd_numpy_sklearn.py
- DeepConvLSTM implementation: https://github.com/AniMahajan20/DeepConvLSTM-NNFL/blob/master/DeepConvLSTM.ipynb
- Project's repository: https://github.com/Sensors-in-Paradise