-
The program has been updated and moved to PhaBOX 2, which is more user-friendly. Hope you will enjoy it. This folder will be no longer maintained.
-
Our web server for viruses-related tasks (including virus identification, taxonomy classification, lifestyle prediction, host prediction, and protein annotation) is available! You can visit Web Server to use the GUI. We also provided more detailed intermediate files and visualization for further analysis.
PhaTYP is a python library for bacteriophages' lifestyle prediction. PhaTYP is a BERT-based model and rely on protein-based vocabulary to convert DNA sequences into sentences for prediction.
The main function of PhaTYP is to predict the lifestyles (virulent or temperate) of phage-like contigs. The input of the program should be fasta files and the output will be a csv file showing the predictions. Since it is a Deep learning model, if you have GPU units on your PC, we recommand you to use them to save your time.
If you have any trouble installing or using PhaTYP, please let us know by emailing us (jyshang2-c@my.cityu.edu.hk).
Note: we suggest you to install all the package using conda (both miniconda and Anaconda are ok).
After cloning this respository, you can use anaconda to install the phatyp.yaml. This will install all packages you need with gpu mode (make sure you have installed cuda on your system to use the gpu version. Othervise, it will run with cpu version). The command is: conda env create -f phatyp.yaml -n phatyp
If you want to use the gpu to accelerate the program:
-
cuda
-
Pytorch-gpu
-
For cpu version pytorch:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
-
For gpu version pytorch: Search pytorch to find the correct cuda version according to your computer
Due to the limited size of the GitHub, we zip the database. Before using PhaTYP, you need to unpack them using the following commands.
- When you use PhaTYP at the first time
cd PhaTYP/
conda env create -f phatyp.yaml -n phatyp
conda activate phatyp
fileid="1tsUArctGf9Fd3xa-0sEcp6ykwxTy9uxG"
filename="model.zip"
html=`curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}"`
curl -Lb ./cookie "https://drive.google.com/uc?export=download&`echo ${html}|grep -Po '(confirm=[a-zA-Z0-9\-_]+)'`&id=${fileid}" -o ${filename}
unzip model.zip
pip install .
Note: Please check whether the pytorch_model.bin is larger than 200M before using PhaTYP
- Because the parameter is larger than 200M, we cannot upload it to GitHub directly. Please make sure you have downloaded model.zip correctly.
- if you cannot download the model.zip from the command lines above, please use the Google Drive link to download it and place it under the PhaTYP/ root folder.
- If the example can be run without any bugs, you only need to activate your 'phatyp' environment before using PhaTYP.
conda activate phatyp
python preprocessing.py [--contigs INPUT_FA] [--len MINIMUM_LEN] [--midfolder DIR]
python PhaTYP.py [--out OUTPUT_CSV] [--midfolder DIR]
Options
--contigs INPUT_FA
input fasta file
--len MINIMUM_LEN
predict only for sequence >= len bp (default 3000)
--out OUTPUT_CSV
The output csv file (prediction)
--midfolder DIR
Folder to store the intermediate files (default phatyp/)
Example
Prediction on the example file:
python preprocessing.py --contigs test_contigs.fa
python PhaTYP.py --out example_prediction.csv
The prediction will be written in example_prediction.csv. The CSV file has three columns: contigs names, prediction, and prediction score. The test_contig.fasta contain a phage genome, so the output is phage.
Detailed information can be found in 'train/' folder
We added a parameter called 'prodigal' for you to use different versions of the prodigal. You can use the path of your prodigal. Or if you have added your prodigal into your env path, you can use the path's name.
python preprocessing.py --contigs test_contigs.fa --prodigal prodigal-gv
OR
python preprocessing.py --contigs test_contigs.fa --prodigal /path/to/prodigal/prodigal-gv
PhaTYP was accpeted by Briefings in Bioinformatcs: PhaTYP: Predicting lifestyle for bacteriophages using BERT
Jiayu Shang, Xubo Tang, Yanni Sun, PhaTYP: predicting the lifestyle for bacteriophages using BERT, Briefings in Bioinformatics, 2022;, bbac487, https://doi.org/10.1093/bib/bbac487
The arXiv version can be found via: PhaTYP: Predicting lifestyle for bacteriophages using BERT
If you have any questions, please email us: jyshang2-c@my.cityu.edu.hk