This library implements Symbolic Aggregate approXimation (SAX) of time series in Java, including a multi-threaded SAX discretization. The code is released under GPL v.2.0.
The library is available through Maven Central and is built by TravisCI.
An illustration of a time series of 128 points converted into an 8-letter word:
Find more information about SAX on its authors' pages: the SAX overview by Jessica Lin, Eamonn Keogh's SAX page, or the sax-vsm wiki page.
[1] Lin, J., Keogh, E., Patel, P., and Lonardi, S., Finding Motifs in Time Series, The 2nd Workshop on Temporal Data Mining, the 8th ACM Int'l Conference on KDD (2002)
[2] Patel, P., Keogh, E., Lin, J., Lonardi, S., Mining Motifs in Massive Time Series Databases, In Proc. ICDM (2002)
If you are using this implementation for your academic work, please cite our GrammarViz 2.0 paper:
[Citation] Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S., Lerner, M., GrammarViz 2.0: a tool for grammar-based pattern discovery in time series, ECML/PKDD Conference, 2014.
If you are interested in time series motif and discord discovery, please check out our new tool called GrammarViz 2.0. Based on SAX, grammatical inference, and algorithmic (Kolmogorov) complexity, it enables the detection of variable-length recurrent and anomalous patterns in time series.
The code is written in Java and I use Maven to build it:
$ mvn package -P single
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building jmotif-sax
[INFO] task-segment: [package]
...
[INFO] Building jar: /media/Stock/git/jmotif-sax/target/jmotif-sax-0.1.1-SNAPSHOT-jar-with-dependencies.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
The jar file can be used to convert a time series (represented as a single-column text file) to SAX via a sliding window from the command line:
$ java -jar target/jmotif-sax-0.1.1-SNAPSHOT-jar-with-dependencies.jar
Usage: <main class> [options]
Options:
--alphabet_size, -a
SAX alphabet size, Default: 3
--data, -d
The input file name
--out, -o
The output file name
--strategy
SAX numerosity reduction strategy
Default: EXACT, Possible Values: [NONE, EXACT, MINDIST]
--threads, -t
number of threads to use, Default: 1
--threshold
SAX normalization threshold, Default: 0.01
--window_size, -w
SAX sliding window size, Default: 30
--word_size, -p
SAX PAA word size, Default: 4
When run, it writes the time series point indexes and the corresponding words to the output file:
$ java -jar "target/jmotif-sax-0.1.1-SNAPSHOT-jar-with-dependencies.jar" \
-d src/resources/test-data/ecg0606_1.csv -o test.txt
$ head test.txt
0, aabc
8, aacc
13, abcc
20, abcb
...
There are two classes, TSProcessor and SAXProcessor, which implement the sequential end-to-end workflow for SAX; a parallel implementation of the discretization is provided by ParallelSAXImplementation. Discretizing a time series by chunking:
// instantiate needed classes
NormalAlphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();
// read the input file
double[] ts = TSProcessor.readFileColumn(dataFName, 0, 0);
// perform the discretization
String str = sp.ts2saxByChunking(ts, paaSize, na.getCuts(alphabetSize), nThreshold);
// print the output
System.out.println(str);
Discretizing a time series via a sliding window:

// instantiate needed classes
NormalAlphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();
// read the input file
double[] ts = TSProcessor.readFileColumn(dataFName, 0, 0);
// perform the discretization
SAXRecords res = sp.ts2saxViaWindow(ts, slidingWindowSize, paaSize,
na.getCuts(alphabetSize), nrStrategy, nThreshold);
// print the output
Set<Integer> index = res.getIndexes();
for (Integer idx : index) {
System.out.println(idx + ", " + String.valueOf(res.getByIndex(idx).getPayload()));
}
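The nrStrategy argument selects the numerosity reduction strategy: NONE records a word at every sliding window position, EXACT skips a word when it is identical to the previously recorded one, and MINDIST is analogous but compares consecutive words with the SAX MINDIST measure. Below is a quick sketch of the effect, reusing the variables from the snippet above; the NumerosityReductionStrategy enum constants mirror the CLI values and are an assumption of this sketch:

// discretize twice, changing only the numerosity reduction strategy
SAXRecords allWords = sp.ts2saxViaWindow(ts, slidingWindowSize, paaSize,
    na.getCuts(alphabetSize), NumerosityReductionStrategy.NONE, nThreshold);
SAXRecords reduced = sp.ts2saxViaWindow(ts, slidingWindowSize, paaSize,
    na.getCuts(alphabetSize), NumerosityReductionStrategy.EXACT, nThreshold);

// EXACT drops consecutive duplicate words, so it indexes
// at most as many positions as NONE does
System.out.println("NONE: " + allWords.getIndexes().size() + " words, EXACT: "
    + reduced.getIndexes().size() + " words");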
Performing the same sliding-window discretization in parallel:

// instantiate needed classes
NormalAlphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();
// read the input file
double[] ts = TSProcessor.readFileColumn(dataFName, 0, 0);
// perform the discretization using 8 threads
ParallelSAXImplementation ps = new ParallelSAXImplementation();
SAXRecords res = ps.process(ts, 8, slidingWindowSize, paaSize, alphabetSize,
nrStrategy, nThreshold);
// print the output
Set<Integer> index = res.getIndexes();
for (Integer idx : index) {
System.out.println(idx + ", " + String.valueOf(res.getByIndex(idx).getPayload()));
}
The plot below shows the speedup achieved when using the parallelized SAX version on the dataset 300_signal1.txt
of length 536,976 points. Parameters used in the experiment: sliding window size 200, PAA size 11, alphabet size 7, and three different NR strategies.
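A minimal sketch of how such a timing comparison can be reproduced with the API shown above; the parameters follow the experiment, while the 0.01 normalization threshold, the EXACT strategy, and the plain wall-clock timing are assumptions of this sketch, not the original benchmark harness:

// read the series used in the experiment
double[] signal = TSProcessor.readFileColumn("300_signal1.txt", 0, 0);
NormalAlphabet alphabet = new NormalAlphabet();
SAXProcessor processor = new SAXProcessor();

// single-threaded discretization: window 200, PAA 11, alphabet 7
long start = System.currentTimeMillis();
processor.ts2saxViaWindow(signal, 200, 11, alphabet.getCuts(7),
    NumerosityReductionStrategy.EXACT, 0.01);
System.out.println("sequential: " + (System.currentTimeMillis() - start) + " ms");

// the same discretization spread over 8 threads
ParallelSAXImplementation parallel = new ParallelSAXImplementation();
start = System.currentTimeMillis();
parallel.process(signal, 8, 200, 11, 7, NumerosityReductionStrategy.EXACT, 0.01);
System.out.println("parallel: " + (System.currentTimeMillis() - start) + " ms");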
The SAXRecords class implements a method for getting the most frequent SAX words:
Alphabet na = new NormalAlphabet();
double[] series = TSProcessor.readFileColumn(DATA_FNAME, 0, 0);
SAXProcessor sp = new SAXProcessor();
SAXRecords saxData = sp.ts2saxViaWindow(series, WIN_SIZE, PAA_SIZE, na.getCuts(ALPHABET_SIZE),
    NR_STRATEGY, NORM_THRESHOLD);
ArrayList<SAXRecord> motifs = saxData.getMotifs(10);
SAXRecord topMotif = motifs.get(0);
System.out.println("top motif " + String.valueOf(topMotif.getPayload()) + " seen " +
topMotif.getIndexes().size() + " times.");
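Since motifs.get(0) holds the most frequent word, the returned list appears ordered by frequency; the remaining motifs can be listed with the same accessors:

// print every frequent word returned by getMotifs
for (SAXRecord motif : motifs) {
  System.out.println(String.valueOf(motif.getPayload()) + " seen "
      + motif.getIndexes().size() + " times");
}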
The HOTSAXImplementation class implements two methods for rare pattern (discord) discovery, a trie-backed series2Discords and a hash-backed series2DiscordsWithHash:
Alphabet na = new NormalAlphabet();
double[] series = TSProcessor.readFileColumn(DATA_FNAME, 0, 0);
// discover discords using the trie-backed implementation
DiscordRecords discordsTrie = HOTSAXImplementation.series2Discords(series, WIN_SIZE, ALPHABET_SIZE,
    DISCORDS_TO_REPORT, new LargeWindowAlgorithm(), NORM_THRESHOLD);
System.out.println("The best discord: " + discordsTrie.get(0));
// the hash-backed implementation additionally takes the PAA size
DiscordRecords discordsHash = HOTSAXImplementation.series2DiscordsWithHash(series, WIN_SIZE, PAA_SIZE,
    ALPHABET_SIZE, DISCORDS_TO_REPORT, new LargeWindowAlgorithm(), NORM_THRESHOLD);
System.out.println("The best discord: " + discordsHash.get(0));