This is the official repository for the paper *Stochastic Sparse Sampling: A Framework for Variable-Length Medical Time Series Classification*.
A preliminary version has been accepted to the NeurIPS 2024 Workshop on Time Series in the Age of Large Models, with the full version under review.
Stochastic Sparse Sampling (SSS) is a novel time series classification method for processing variable-length sequences. SSS outperforms many state-of-the-art machine learning and deep learning methods, benchmarked on the Epilepsy iEEG Multicenter Dataset for seizure onset zone (SOZ) localization.
- Overview
- Installation
- Project Structure
- Data
- Usage
- Method
- Visualization
- Dataset Description
- License
- Citations
- Contact
- Acknowledgments
- Python ≥ 3.10
- Additional dependencies listed in `requirements.txt`
```bash
# Create and activate conda environment
conda create -n sss python=3.10
conda activate sss

# Install requirements
pip install -r requirements.txt
pip install -e .
```
```bash
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt
pip install -e .
```
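After `pip install -e .`, the package should be importable. A quick sanity check (assuming the package is exposed as `sss`, matching the main package directory in the project structure below):

```bash
# Check that the editable install made the `sss` package importable
python -c "import sss; print('sss imported successfully')"
```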
```
project_root/
│
├── sss/                   # Main package directory
│   ├── analysis/          # Visualization and analysis tools
│   │   └── soz/           # SOZ heatmap visualization
│   ├── config/            # Pydantic configuration classes
│   ├── exp/               # Experiment class
│   ├── jobs/              # Job configurations
│   ├── layers/            # Layers and model components
│   ├── models/            # Machine learning full models
│   ├── tuning/            # Hyperparameter optimization
│   └── utils/             # Helper functions, preprocessing, logging
│
├── data/                  # Dataset files
├── download_data.sh       # Data download script
├── main.py                # Main execution script
├── setup.py               # Package installation script
├── requirements.txt       # Python dependencies
└── README.md              # Documentation
```
To download the required datasets, run:

```bash
chmod u+x download_data.sh
./download_data.sh
```
Execute the main script with the desired model:
```bash
python main.py <model>
```
Available models:
- `sss`: Stochastic Sparse Sampling
- Finite Context Models:
  - `finite-context/dlinear`: DLinear
  - `finite-context/patchtst`: PatchTST
  - `finite-context/timesnet`: TimesNet
  - `finite-context/moderntcn`: ModernTCN
- Infinite Context Models:
  - `infinite-context/mamba`: Mamba
  - `infinite-context/gru`: GRU
  - `infinite-context/lstm`: LSTM
  - `infinite-context/rocket`: ROCKET
Results are saved in the `logs` folder. For Distributed Data Parallel (DDP) or other configurations, modify `sss/jobs/exp/<model>/args.yaml`.
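For example, to run SSS or the PatchTST baseline with the settings in their respective `args.yaml` files:

```bash
# Stochastic Sparse Sampling
python main.py sss

# A finite-context baseline (PatchTST)
python main.py finite-context/patchtst
```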
While the majority of time series classification research has focused on modeling fixed-length sequences, variable-length time series classification (VTSC) remains critical in healthcare, where sequence length may vary among patients and events. To address this challenge, we propose Stochastic Sparse Sampling (SSS), a novel framework for variable-length medical time series classification.
The SSS training algorithm is outlined below:
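As a rough, illustrative sketch only (not the repository's implementation, which is driven by `main.py`, the experiment classes in `sss/exp/`, and the configurations under `sss/jobs/`), the snippet below assumes SSS stochastically samples fixed-size windows from each variable-length series, scores every sampled window with a shared backbone, and averages the window-level probabilities into a single series-level prediction. All names (`sss_training_step`, `backbone`, `window_size`, `sample_rate`) are hypothetical.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


def sss_training_step(backbone: nn.Module,
                      series: torch.Tensor,   # (channels, length); length varies per series
                      label: torch.Tensor,    # scalar class index
                      window_size: int = 128,
                      sample_rate: float = 0.3) -> torch.Tensor:
    """Hypothetical single training step in the spirit of window-based SSS:
    sample a sparse subset of fixed-size windows, score each with a shared
    backbone, and aggregate window probabilities into a series-level loss.
    Assumes the series is at least `window_size` samples long."""
    # Slice the variable-length series into non-overlapping fixed-size windows.
    num_windows = series.shape[-1] // window_size
    windows = [series[..., i * window_size:(i + 1) * window_size]
               for i in range(num_windows)]

    # Stochastically keep a sparse subset of the windows (at least one).
    k = max(1, int(sample_rate * len(windows)))
    sampled = random.sample(windows, k)

    # Score each sampled window with the shared backbone: (k, 1, num_classes) logits,
    # assuming the backbone maps (1, channels, window_size) -> (1, num_classes).
    logits = torch.stack([backbone(w.unsqueeze(0)) for w in sampled])

    # Aggregate by averaging window-level class probabilities.
    probs = F.softmax(logits, dim=-1).mean(dim=0)  # (1, num_classes)

    # Series-level loss on the aggregated prediction.
    return F.nll_loss(torch.log(probs + 1e-8), label.view(1))
```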
The project includes a visualization tool (`sss/analysis/soz/visualize.py`) for generating SOZ (Seizure Onset Zone) heatmap analysis of trained models. This tool integrates with Neptune.ai for model management and visualization.
- Neptune.ai account and API token
- Trained model saved to Neptune (via `main.py`)
- Neptune run ID for the model you want to analyze
- Set your Neptune API token as an environment variable:

  ```bash
  export NEPTUNE_API_TOKEN='your-neptune-api-token'
  ```
- Modify the `sss/analysis/soz/plot.yaml` configuration file. Example:

  ```yaml
  run_id: "SOZ-33"             # Your Neptune run ID
  project_name: "your-project" # Your Neptune project name
  mode: "train"                # Dataset to visualize
  ...
  ```
For a complete list of configuration options and their descriptions, refer to the `Config` class in `visualize.py`.
```bash
cd sss/analysis/soz
python visualize.py
```
This will load the model from Neptune, generate the SOZ heatmap, and save it.
The Epilepsy iEEG Multicenter Dataset consists of iEEG signals with clinical SOZ annotations from four medical centers: Johns Hopkins Hospital (JHH), the National Institutes of Health (NIH), the University of Maryland Medical Center (UMMC), and the University of Miami Jackson Memorial Hospital (UMH). Since UMH contained only a single patient with clinical SOZ annotations, we did not consider it in our main evaluations; however, we did include UMH in the training set for the multicenter evaluation, both for the all-cluster evaluation and for the out-of-distribution (OOD) experiments on SOZ localization at unseen medical centers.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your research or work, please cite our paper:
```bibtex
@article{mootoo2024stochastic,
  title         = {Stochastic Sparse Sampling: A Framework for Variable-Length Medical Time Series Classification},
  author        = {Mootoo, Xavier and D\'{i}az-Montiel, Alan A. and Lankarany, Milad and Tabassum, Hina},
  journal       = {arXiv preprint arXiv:2410.06412},
  year          = {2024},
  url           = {https://arxiv.org/abs/2410.06412},
  eprint        = {2410.06412},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG}
}
```
For queries, please contact the corresponding author at: xmootoo at gmail dot com.
Xavier Mootoo is supported by Canada Graduate Scholarships - Master's (CGS-M) funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute, Canada, and the Ontario Graduate Scholarship (OGS) granted by the provincial government of Ontario, Canada.
We extend our gratitude to Commune AI for generously providing the computational resources needed to carry out our experiments; in particular, we thank Luca Vivona (@LVivona) and Sal Vivona (@salvivona). Many thanks as well to Anastasios Angelopoulos (@aangelopoulos) and Daniele Grattarola (@danielegrattarola) for their valuable feedback and comments on our work.