Implementation of "Modeling Time-Evolving Causality over Data Streams," Naoki Chihara, Yasuko Matsubara, Ren Fujiwara, and Yasushi Sakurai. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD2025 (to appear).
We focus on causal relationships that evolve over time in data streams and refer to such relationships as "time-evolving causality." We present ModePlait, which aims to discover time-evolving causalities in multivariate co-evolving data streams while simultaneously forecasting future values in a streaming fashion. An overview of our proposed model is shown in the figure below.
The following preview of our results shows the effectiveness of ModePlait on an epidemiological data stream. Please refer to our paper for more details on these results and the proposed method.
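To make the notion of time-evolving causality concrete, the toy sketch below generates a multivariate stream whose causal structure switches between two regimes. It assumes a linear structural equation model x = Bx + e with non-Gaussian (uniform) noise, a common formalization in causal discovery. It is only an illustration, not ModePlait's actual model; the function simulate_time_evolving_sem and the matrices B1 and B2 are hypothetical.

```python
import numpy as np


def simulate_time_evolving_sem(B_regimes, seg_len, rng):
    """Toy generator (hypothetical): one segment of seg_len steps per causal regime.

    Each regime k follows the linear SEM x = B_k x + e, i.e. x = (I - B_k)^{-1} e,
    where B_k[i, j] != 0 means that x_j directly causes x_i.
    """
    d = B_regimes[0].shape[0]
    segments = []
    for B in B_regimes:
        # Non-Gaussian (uniform) noise, as assumed in LiNGAM-style models.
        e = rng.uniform(-1.0, 1.0, size=(seg_len, d))
        # Row-wise x_t = A e_t with A = (I - B)^{-1}, so X = E A^T.
        A = np.linalg.inv(np.eye(d) - B)
        segments.append(e @ A.T)
    return np.vstack(segments)


rng = np.random.default_rng(0)
# Regime 1 (assumed): x0 -> x1 -> x2.
B1 = np.array([[0.0, 0.0, 0.0],
               [0.8, 0.0, 0.0],
               [0.0, 0.5, 0.0]])
# Regime 2 (assumed): x0 -> x1 disappears and the x1/x2 edge reverses to x2 -> x1.
B2 = np.array([[0.0, 0.0, 0.0],
               [0.0, 0.0, 0.7],
               [0.0, 0.0, 0.0]])
X = simulate_time_evolving_sem([B1, B2], seg_len=100, rng=rng)
print(X.shape)  # (200, 3)
```

A method that assumes a single fixed causal graph would have to fit one B to all 200 steps, whereas the structure here changes at step 100; that change is the kind of behavior the term "time-evolving causality" refers to.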
This source code has been tested with the following dependencies:
- Python == 3.9.15
- numpy == 1.23.5
- pandas == 1.5.3
- matplotlib == 3.8.2
- scikit-learn == 1.1.3
- scipy == 1.11.4
- cdt == 0.6.0
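To compare your environment with the tested setup, you could use something like the sketch below. It is a hypothetical helper (check_environment is not part of this repository) that relies only on the standard-library importlib.metadata. Versions other than those listed may also work.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README as tested (distribution name -> version).
TESTED = {
    "numpy": "1.23.5",
    "pandas": "1.5.3",
    "matplotlib": "3.8.2",
    "scikit-learn": "1.1.3",
    "scipy": "1.11.4",
    "cdt": "0.6.0",
}


def check_environment(tested=TESTED, python="3.9.15"):
    """Hypothetical helper: return human-readable mismatches (empty list if all match)."""
    problems = []
    current_python = ".".join(map(str, sys.version_info[:3]))
    if current_python != python:
        problems.append(f"python: found {current_python}, tested with {python}")
    for dist, expected in tested.items():
        try:
            found = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (tested with {expected})")
            continue
        if found != expected:
            problems.append(f"{dist}: found {found}, tested with {expected}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all versions match the tested setup"]:
        print(line)
```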
Note: Some functions in cdt load and execute R packages, so you need to set up R according to the official cdt documentation.
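Before using the R-backed cdt functions, it may help to confirm that R is reachable. The sketch below assumes that cdt invokes R through the Rscript executable on your PATH; the helpers find_rscript and r_version are hypothetical, and the cdt documentation remains the authoritative setup guide.

```python
import shutil
import subprocess


def find_rscript(name="Rscript"):
    """Return the absolute path of the executable `name`, or None if it is not on PATH."""
    return shutil.which(name)


def r_version(executable):
    """Return the first line printed by `<executable> --version`.

    Some R versions print this to stderr rather than stdout, so both are checked.
    """
    result = subprocess.run([executable, "--version"], capture_output=True, text=True, check=False)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_rscript()
    if path is None:
        print("Rscript not found on PATH; install R before using R-based cdt functions.")
    else:
        print(f"Found {path}: {r_version(path)}")
```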
- Clone this repository.
  git clone https://github.com/C-Naoki/ModePlait.git
- Construct a virtual environment and install the required packages.
  make install
  - Note that this requires pyenv and poetry.
  - If you prefer not to use them, you can instead use requirements.txt, which was created based on pyproject.toml.
  Specifically, the above command performs the following steps:
  - If necessary, install Python 3.9.15 using pyenv, and then switch to this version.
  - Tell poetry to use Python 3.9.15.
  - Install the packages listed in pyproject.toml.
  - Place a path file (i.e., *.pth) in the site-packages/ directory to extend the module search path.
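The last step relies on Python's standard .pth mechanism: each directory listed in a *.pth file inside a site directory is appended to sys.path. The sketch below demonstrates this with a temporary directory standing in for site-packages/. The function demo_pth and the file name modeplait.pth are hypothetical; the Makefile is the reference for what make install actually writes.

```python
import os
import site
import sys
import tempfile


def demo_pth(extra_dir):
    """Return True if a *.pth file listing `extra_dir` makes it appear on sys.path.

    site.addsitedir() reads every *.pth file in the given directory and appends each
    listed, existing directory to sys.path; the interpreter does the same for
    site-packages/ at start-up. This call modifies sys.path for the current process.
    """
    with tempfile.TemporaryDirectory() as fake_site_packages:
        # Hypothetical file name; the Makefile may use a different one.
        pth_file = os.path.join(fake_site_packages, "modeplait.pth")
        with open(pth_file, "w") as f:
            f.write(os.path.abspath(extra_dir) + "\n")
        site.addsitedir(fake_site_packages)
    return os.path.abspath(extra_dir) in sys.path


if __name__ == "__main__":
    repo_root = tempfile.mkdtemp()  # stands in for the cloned repository root
    print(demo_pth(repo_root))  # True
```

Because the path file is processed at interpreter start-up, modules in the listed directory become importable from any script run with that Python, without setting PYTHONPATH.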
  Please check the Makefile for more details.
- Run quick demos of ModePlait.
  sh bin/google.sh
  If you want the command to keep running after you log out, create a nohup/ directory and use the -n option (which runs the script with nohup), as shown below:
  mkdir nohup
  sh bin/google.sh -n
  - The execution log is saved in the nohup/ directory.
- All datasets except 1. covid19 are placed in the ./data folder.
- If you execute the command sh bin/covid19.sh, the 1. covid19 dataset is automatically downloaded from the Google COVID-19 Open Data Repository and saved in the ./data folder.
We compared our algorithm with the following seven state-of-the-art baselines for causal discovery: CASPER, DARING, NoCurl, NOTEARS-MLP (NO-MLP), NOTEARS, LiNGAM, and GES. We also compared it with the following five leading competitors in time series forecasting: TimesNet, PatchTST, DeepAR, OrbitMap, and ARIMA.
We ran experiments on synthetic datasets with multiple temporal sequences covering various types of scenarios, and ModePlait outperformed all competitors in every setting.
ModePlait achieved high forecasting accuracy on every dataset, both synthetic and real-world.
These results show that adaptively discovering time-evolving causality is very helpful for forecasting in a streaming fashion.
We conducted all of the above experiments on an Intel Xeon Platinum 8268 2.9GHz quad core CPU with 512GB of memory, running Linux.
If you use this code for your research, please consider citing our paper.