A step-by-step tutorial on installation.
Note: We are working on making DeepMicrobes available on Bioconda, so this tutorial may be updated frequently.
git clone https://github.com/MicrobeLab/DeepMicrobes.git
(Optional) Please install conda first.
Note: We build and test DeepMicrobes on TensorFlow 1.9.0; compatibility with other versions is still being tested.
conda env create -f DeepMicrobes/install.yml
# activate it
source activate DeepMicrobes
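Since only TensorFlow 1.9.0 is reported as tested, it may be worth confirming which version the activated environment actually provides. The following is a minimal sketch, not part of DeepMicrobes: tf_version_status and TESTED_TF_VERSION are hypothetical names introduced here for illustration.

```shell
# Tested TensorFlow version, taken from the note above.
TESTED_TF_VERSION="1.9.0"

# Hypothetical helper: classify a version string against the tested version.
tf_version_status() {
    if [ "$1" = "$TESTED_TF_VERSION" ]; then
        echo "tested"
    else
        echo "untested"
    fi
}

# Query TensorFlow only if it can be imported in the current environment.
if python -c 'import tensorflow' >/dev/null 2>&1; then
    installed="$(python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null)"
    echo "TensorFlow ${installed}: $(tf_version_status "$installed")"
else
    echo "TensorFlow is not importable; is the DeepMicrobes environment active?"
fi
```

An "untested" result does not necessarily mean the installation is broken, only that the version differs from the one the authors report testing.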
(Optional) Install seq-shuf
(only for shuffling sequences in a training set)
seq-shuf is available at https://github.com/thackl/seq-scripts
To facilitate installation, we have included the script in bin:
export PATH=/path/to/DeepMicrobes/bin:$PATH
(Optional) Install GNU Parallel
(for acceleration of TFRecord conversion)
parallel is also included in bin. Alternatively, you can install it with the following command:
(wget -O - pi.dk/3 || curl pi.dk/3/) | bash
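To confirm that the optional tools are reachable after the steps above, a small check along these lines may help. This is a sketch only: check_tool is a hypothetical helper, not something shipped with DeepMicrobes.

```shell
# Hypothetical helper: report whether a command can be resolved on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

# The two optional tools mentioned in this tutorial.
check_tool seq-shuf
check_tool parallel
```

If either tool is reported as not found, check that /path/to/DeepMicrobes/bin was replaced with the real location of your clone before it was added to PATH.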
A vocabulary file of k-mers is required for TFRecord conversion. To download the 12-mer vocabulary:
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_12mers.txt.gz
gunzip tokens_merged_12mers.txt.gz
The vocabularies for other k-mer lengths are not used by DeepMicrobes (except in the k-mer variant models), but they can be useful when training a custom model:
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_11mers.txt.gz
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_10mers.txt.gz
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_9mers.txt.gz
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_8mers.txt.gz
gunzip tokens_merged_11mers.txt.gz
gunzip tokens_merged_10mers.txt.gz
gunzip tokens_merged_9mers.txt.gz
gunzip tokens_merged_8mers.txt.gz
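The repeated wget and gunzip commands above follow a single URL pattern, so they could be collapsed into a loop. The sketch below assumes that pattern holds for every k shown above; vocab_url and fetch_vocab are hypothetical helper names. Only the URLs are printed when the block runs; nothing is downloaded until fetch_vocab is called.

```shell
# Build the download URL for a k-mer vocabulary, following the file names above.
vocab_url() {
    echo "https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_${1}mers.txt.gz"
}

# Download and decompress one vocabulary into a target directory (default: current).
fetch_vocab() {
    k="$1"
    dest="${2:-.}"
    mkdir -p "$dest" &&
    wget -q -O "$dest/tokens_merged_${k}mers.txt.gz" "$(vocab_url "$k")" &&
    gunzip -f "$dest/tokens_merged_${k}mers.txt.gz"
}

# Print the URLs that a full download of the additional vocabularies would use.
for k in 8 9 10 11; do
    vocab_url "$k"
done
```

For example, fetch_vocab 11 /path/to/vocab would download and unpack the 11-mer vocabulary into that directory.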
The vocabulary files can be stored in any directory (hereafter referred to as /path/to/vocab/).
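Before moving on to TFRecord conversion, it can be useful to check that the decompressed vocabulary is where you expect it. The following sketch assumes the file keeps its downloaded name; vocab_status and VOCAB_DIR are hypothetical names used only for illustration.

```shell
# Placeholder directory from the text above; replace with your own location.
VOCAB_DIR=/path/to/vocab

# Hypothetical helper: report whether the k-mer vocabulary (default k=12)
# exists and is non-empty in the given directory.
vocab_status() {
    file="$1/tokens_merged_${2:-12}mers.txt"
    if [ -s "$file" ]; then
        echo "ok: $file ($(wc -l < "$file" | tr -d ' ') lines)"
    else
        echo "missing: $file"
    fi
}

vocab_status "$VOCAB_DIR"
```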
The pipelines directory contains wrapper shell scripts for TFRecord conversion, model training, and classification. Add it, together with the scripts directory and the repository root, to your PATH:
export PATH=/path/to/DeepMicrobes/pipelines:$PATH
export PATH=/path/to/DeepMicrobes/scripts:$PATH
export PATH=/path/to/DeepMicrobes:$PATH
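If these export lines are placed in a shell startup file, re-sourcing it will add the same directories again. One possible way to avoid duplicates is sketched below; path_prepend and DEEPMICROBES_HOME are hypothetical names, and /path/to/DeepMicrobes remains a placeholder for your clone.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Placeholder location of the cloned repository.
DEEPMICROBES_HOME=/path/to/DeepMicrobes

# Same order as the export lines above: pipelines, scripts, then the repository root.
for sub in pipelines scripts ""; do
    path_prepend "${DEEPMICROBES_HOME}${sub:+/$sub}"
done
export PATH
```

Because each directory is prepended in turn, the repository root ends up first on PATH, matching the result of the three export commands.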