A framework for variable-length time series classification (VTSC) with local signal interpretability.


Stochastic Sparse Sampling (SSS)

Overview

This is the official repository for the paper:

Xavier Mootoo, Alan A. DΓ­az-Montiel, Milad Lankarany, Hina Tabassum. Stochastic Sparse Sampling: A Framework for Variable-Length Medical Time Series Classification.

A preliminary version was accepted to the NeurIPS 2024 Workshop on Time Series in the Age of Large Models, with the full version under review.

Description

Stochastic Sparse Sampling (SSS) is a novel time series classification method for processing variable-length sequences. SSS outperforms many state-of-the-art machine learning and deep learning methods, benchmarked on the Epilepsy iEEG Multicenter Dataset for seizure onset zone (SOZ) localization.

*Figure: overview of the SSS framework.*

Installation

Dependencies

  • Python $\geq$ 3.10
  • Additional dependencies listed in requirements.txt

Using conda (recommended)

# Create and activate conda environment
conda create -n sss python=3.10
conda activate sss

# Install requirements
pip install -r requirements.txt
pip install -e .

Using pip

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt
pip install -e .

Project Structure

project_root/
│
├── sss/                   # Main package directory
│   ├── analysis/          # Visualization and analysis tools
│   │   └── soz/           # SOZ heatmap visualization
│   │
│   ├── config/            # Pydantic configuration classes
│   │
│   ├── exp/               # Experiment class
│   │
│   ├── jobs/              # Job configurations
│   │
│   ├── layers/            # Layers and model components
│   │
│   ├── models/            # Full machine learning models
│   │
│   ├── tuning/            # Hyperparameter optimization
│   │
│   └── utils/             # Helper functions, preprocessing, logging
│
├── data/                  # Dataset files
├── download_data.sh       # Data download script
├── main.py                # Main execution script
├── setup.py               # Package installation script
├── requirements.txt       # Python dependencies
└── README.md              # Documentation

Data

To download the required datasets, run:

chmod u+x download_data.sh
./download_data.sh

Usage

Running Experiments

Execute the main script with the desired model:

python main.py <model>

Available models:

  • sss: Stochastic Sparse Sampling
  • Finite Context Models:
    • finite-context/dlinear: DLinear
    • finite-context/patchtst: PatchTST
    • finite-context/timesnet: TimesNet
    • finite-context/moderntcn: ModernTCN
  • Infinite Context Models:
    • infinite-context/mamba: Mamba
    • infinite-context/gru: GRUs
    • infinite-context/lstm: LSTMs
    • infinite-context/rocket: ROCKET

Results are saved in the logs folder. For Distributed Data Parallel (DDP) or other configurations, modify sss/jobs/exp/<model>/args.yaml.

Method

While the majority of time series classification research has focused on modeling fixed-length sequences, variable-length time series classification (VTSC) remains critical in healthcare, where sequence length may vary among patients and events. To address this challenge, we propose $\textbf{S}\text{tochastic } \textbf{S}\text{parse } \textbf{S}\text{ampling}$ (SSS), a novel VTSC framework developed for medical time series. SSS manages variable-length sequences by sparsely sampling fixed windows to compute local predictions, which are then aggregated and calibrated to form a global prediction. We apply SSS to the task of seizure onset zone (SOZ) localization, a critical VTSC problem requiring identification of seizure-inducing brain regions from variable-length electrophysiological time series. We evaluate our method on the Epilepsy iEEG Multicenter Dataset, a heterogeneous collection of intracranial electroencephalography (iEEG) recordings obtained from four independent medical centers. SSS demonstrates superior performance compared to state-of-the-art (SOTA) baselines across most medical centers, and on all out-of-distribution (OOD) unseen medical centers. Additionally, SSS naturally provides post-hoc insights into local signal characteristics related to the SOZ by visualizing temporally averaged local predictions throughout the signal.
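As a rough illustration of the core idea only (not the repository's actual implementation), the sketch below sparsely samples fixed-length windows from a variable-length series, scores each window with a local classifier, and averages the local probabilities into a global prediction. The window length, sampling fraction, and the toy `local_classifier` are all placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_classifier(window: np.ndarray) -> float:
    """Placeholder local model: maps a fixed-length window to P(class = SOZ).
    In SSS this would be a learned network applied to each sampled window."""
    return 1.0 / (1.0 + np.exp(-window.mean()))  # toy logistic score

def sss_predict(series: np.ndarray, window_len: int = 8, sample_frac: float = 0.25) -> float:
    """Sparsely sample fixed windows and aggregate local predictions globally."""
    starts = np.arange(len(series) - window_len + 1)   # all valid window starts
    n_samples = max(1, int(sample_frac * len(starts))) # sparse subset of windows
    chosen = rng.choice(starts, size=n_samples, replace=False)
    local_probs = [local_classifier(series[s:s + window_len]) for s in chosen]
    return float(np.mean(local_probs))                 # global prediction

# The same model handles sequences of different lengths:
p_short = sss_predict(rng.normal(size=50))
p_long = sss_predict(rng.normal(size=500))
```

Because only the local classifier sees the data, and it always receives fixed-length windows, sequence length never constrains the architecture; the paper's method additionally calibrates the aggregated probabilities, which this sketch omits.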

Algorithm

The SSS training algorithm is outlined below:

*Figure: the SSS training algorithm.*

Visualization

The project includes a visualization tool (sss/analysis/soz/visualize.py) for generating SOZ (Seizure Onset Zone) heatmap analysis of trained models. This tool integrates with Neptune.ai for model management and visualization.

Prerequisites

  1. Neptune.ai account and API token
  2. Trained model saved to Neptune (via main.py)
  3. Neptune run ID for the model you want to analyze

Setup

  1. Set your Neptune API token as an environment variable:

     export NEPTUNE_API_TOKEN='your-neptune-api-token'

  2. Modify the sss/analysis/soz/plot.yaml configuration file. Example:

     run_id: "SOZ-33"              # Your Neptune run ID
     project_name: "your-project"  # Your Neptune project name
     mode: "train"                 # Dataset to visualize
     ...

For a complete list of configuration options and their descriptions, refer to the Config class in visualize.py.

Running the Visualization

cd sss/analysis/soz
python visualize.py

This will load the model from Neptune and generate an SOZ heatmap, which is then saved.
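The heatmaps are built from the temporally averaged local predictions mentioned above. As a hedged sketch of that averaging step (illustrative names and shapes, not the tool's actual API): each window's probability is spread over the time steps it covers, and overlapping contributions are averaged per time step.

```python
import numpy as np

def temporal_average(window_probs, starts, window_len, series_len):
    """Average overlapping window probabilities into a per-time-step score."""
    score = np.zeros(series_len)
    count = np.zeros(series_len)
    for p, s in zip(window_probs, starts):
        score[s:s + window_len] += p   # accumulate this window's probability
        count[s:s + window_len] += 1   # track how many windows cover each step
    # Time steps never covered by any sampled window stay at 0.
    return np.divide(score, count, out=np.zeros(series_len), where=count > 0)

# Example: three overlapping length-4 windows on a length-10 signal
heat = temporal_average([0.9, 0.1, 0.5], starts=[0, 2, 4],
                        window_len=4, series_len=10)
```

Plotting `heat` as a color strip per channel yields the kind of SOZ heatmap the visualization tool produces.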

Examples

SSS Training Algorithm


Dataset Description

The Epilepsy iEEG Multicenter Dataset consists of iEEG signals with clinical SOZ annotations from four medical centers: the Johns Hopkins Hospital (JHH), the National Institutes of Health (NIH), the University of Maryland Medical Center (UMMC), and the University of Miami Jackson Memorial Hospital (UMH). Since UMH contained only a single patient with clinical SOZ annotations, we did not consider it in our main evaluations; however, we did include UMH in the training set for both the all-centers evaluation and the out-of-distribution (OOD) experiments for SOZ localization on unseen medical centers.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citations

If you use this code in your research or work, please cite our paper:

@article{mootoo2024stochastic,
    title     = {Stochastic Sparse Sampling: A Framework for Variable-Length Medical Time Series Classification},
    author    = {Mootoo, Xavier and D\'{i}az-Montiel, Alan A. and Lankarany, Milad and Tabassum, Hina},
    journal   = {arXiv preprint arXiv:2410.06412},
    year      = {2024},
    url       = {https://arxiv.org/abs/2410.06412},
    eprint    = {2410.06412},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

Contact

For queries, please contact the corresponding author at: xmootoo at gmail dot com.

Acknowledgments

Xavier Mootoo is supported by Canada Graduate Scholarships - Master's (CGS-M) funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute, Canada, and the Ontario Graduate Scholarship (OGS) granted by the provincial government of Ontario, Canada.

We extend our gratitude to Commune AI for generously providing the computational resources needed to carry out our experiments, in particular, we thank Luca Vivona (@LVivona) and Sal Vivona (@salvivona). Many thanks as well to Anastasios Angelopoulos (@aangelopoulos) and Daniele Grattarola (@danielegrattarola) for their valuable feedback and comments on our work.
