The Search Part
- Build all compat windows under k hash functions and store them in k distinct files in the compatWindows directory.
- Build index files that let users look up the position of the inverted compatWindows list for a given i-th hash function and a given token id.
- Given a query, search for near-duplicate sequences stored in the compatWindows files, using the index files.
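
The index lookup described above can be sketched as follows. This is only an illustration and not the repository's actual code: the `IndexEntry` layout, the `HashIndex` alias, and the `findList` function are hypothetical names, and the sketch assumes each index is held in memory and sorted by token id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical index entry: where the inverted list for one token id starts
// inside a compatWindows file, and how many windows the list holds.
struct IndexEntry {
    std::uint32_t tokenId;
    std::uint64_t offset;  // position of the first window of this list
    std::uint64_t count;   // number of windows in the list
};

// One index per hash function (assumption: entries are sorted by tokenId).
using HashIndex = std::vector<IndexEntry>;

// Look up the inverted list for tokenId under the i-th hash function.
// Returns std::nullopt if i is out of range or the token has no list.
std::optional<IndexEntry> findList(const std::vector<HashIndex>& indexes,
                                   std::size_t i, std::uint32_t tokenId) {
    if (i >= indexes.size()) return std::nullopt;
    const HashIndex& idx = indexes[i];

    // Binary search is only valid on a sorted index; check it in debug builds.
    assert(std::is_sorted(idx.begin(), idx.end(),
        [](const IndexEntry& a, const IndexEntry& b) {
            return a.tokenId < b.tokenId;
        }));

    auto it = std::lower_bound(idx.begin(), idx.end(), tokenId,
        [](const IndexEntry& e, std::uint32_t id) { return e.tokenId < id; });
    if (it == idx.end() || it->tokenId != tokenId) return std::nullopt;
    return *it;
}
```

A search step would then use the returned `offset` and `count` to read only the matching list from the i-th compatWindows file instead of scanning the whole file.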
// build compat windows files (parallel sort requires g++ 9 and the TBB library, linked with -ltbb)
g++ -ggdb3 -O3 -w -Wextra -std=c++17 -pedantic -o buildCompatWindows -ltbb -fopenmp
// build index file
g++ -O3 buildIndex
// build the duplicate search program
g++ -ggdb3 -O3 -w -Wextra -std=c++17 -pedantic -o searchDuplicate -ltbb -fopenmp