CodAn (Coding sequence Annotator) is a computational tool designed to characterize the CDS and UTR regions of transcripts from any eukaryotic species.
Decompress the CodAn.tar.gz file:
tar -xf CodAn.tar.gz
Add the bin directory to your PATH:
export PATH=$PATH:path/to/CodAn/bin/
- Python3 and Biopython
apt-get install python3-biopython
- Perl, Bioperl and MCE (libmce-perl)
apt-get install bioperl libmce-perl
- NCBI-BLAST (v2.9.0 or above)
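Before running CodAn, it can help to confirm that the requirements listed above are reachable from the current environment. The following is a minimal sketch, not part of CodAn: the function name `find_missing` and the list of executables are assumptions made for illustration. It only checks that the programs are on the PATH and that Biopython is importable; it does not check the BLAST version or the Perl modules (BioPerl, MCE).

```python
import importlib.util
import shutil

# Executables this sketch looks for on PATH. The list is an assumption
# drawn from the requirements above, not something CodAn itself exposes.
REQUIRED_EXECUTABLES = ("python3", "perl", "blastx", "makeblastdb")
# Importable Python modules; "Bio" is the import name of Biopython.
REQUIRED_MODULES = ("Bio",)


def find_missing(executables=REQUIRED_EXECUTABLES, modules=REQUIRED_MODULES):
    """Return the names of executables and modules that cannot be found."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All checked dependencies were found.")
```

An empty result only means the names were found; it does not guarantee that the installed versions are compatible.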
The predictive models are available in the "models" subfolder. The folder contains all models designed for Eukaryote species (i.e., Fungi, Plants and Animals [Invertebrates and Vertebrates]). The models were designed to be used with Full-Length or Partial transcripts.
Download the model that fits your needs, as described in the "models" folder, decompress the model file (using unzip model.zip), and pass the path of the decompressed model to the -m option.
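When the model is prepared from a script rather than by hand, the decompression step can be sketched with Python's standard zipfile module. The helper name `extract_model` is hypothetical, and the layout of the archive is an assumption: the sketch returns the single top-level folder if the archive contains exactly one, and the destination directory otherwise. Check the extracted contents before passing the path to -m.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, dest_dir):
    """Extract a model archive and return a candidate path for the -m option."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Collect the top-level entries stored in the archive.
        top_level = {name.split("/")[0] for name in archive.namelist() if name}
    # Assumption: if the archive holds a single top-level folder, that
    # folder is the model; otherwise fall back to the destination itself.
    if len(top_level) == 1:
        candidate = dest_dir / top_level.pop()
        if candidate.is_dir():
            return candidate
    return dest_dir
```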
Usage: codan.py [options]

Options:
  -h, --help            show this help message and exit
  -t file, --transcripts=file
                        Mandatory - input transcripts file (FASTA format),
                        /path/to/transcripts.fa
  -m model, --model=model
                        Mandatory - path to model, /path/to/model
  -s string, --strand=string
                        Optional - strand of sequence to predict genes (plus,
                        minus or both) [default=both]
  -c int, --cpu=int     Optional - number of threads to be used [default=1]
  -o folder, --output=folder
                        Optional - path to output folder,
                        /path/to/output/folder/ if not declared, it will be
                        created at the transcripts input folder
                        [default="CodAn_output"]
  -b proteinDB, --blastdb=proteinDB
                        Optional - path to blastDB of known protein sequences,
                        /path/to/blast/DB/DB_name
  -H int, --HSP=int     Optional - used in the "-qcov_hsp_perc" option of
                        blastx [default=80]
Basic usage (predict CDS):
codan.py -t transcripts.fa -o output_folder -m model
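For pipelines that call CodAn from Python, the invocation can be assembled from the documented options. This is a sketch only: `build_codan_command` and `run_codan` are hypothetical helper names, only flags listed in the help text are used, and it is assumed that codan.py is on the PATH and signals failure with a non-zero exit code.

```python
import subprocess


def build_codan_command(transcripts, model, output=None, strand=None,
                        cpu=None, blastdb=None, hsp=None):
    """Assemble the argument list for codan.py from the documented options."""
    # -t and -m are mandatory according to the help text.
    cmd = ["codan.py", "-t", str(transcripts), "-m", str(model)]
    if output is not None:
        cmd += ["-o", str(output)]
    if strand is not None:
        if strand not in ("plus", "minus", "both"):
            raise ValueError("strand must be 'plus', 'minus' or 'both'")
        cmd += ["-s", strand]
    if cpu is not None:
        cmd += ["-c", str(int(cpu))]
    if blastdb is not None:
        cmd += ["-b", str(blastdb)]
    if hsp is not None:
        cmd += ["-H", str(int(hsp))]
    return cmd


def run_codan(**options):
    """Run codan.py; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_codan_command(**options), check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems when file paths contain spaces.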
Alternative usage (predict CDS and run a BLAST search against a specific DB to annotate the predicted genes based on similarity):
codan.py -t transcripts.fa -o output_folder -m model -b blast_DB
To run this optional step, provide a protein DB built with the makeblastdb program from NCBI-BLAST.
Pre-built protein DBs, such as swissprot, can also be downloaded (ftp://ftp.ncbi.nlm.nih.gov/blast/db/).
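If a custom protein DB is preferred over a downloaded one, building it with makeblastdb can be sketched as below. The helper names and the example file names are hypothetical. The flags used (-in, -dbtype prot, -out) are standard makeblastdb options, and the value given to -out is the DB path that would then be passed to CodAn's -b option.

```python
import subprocess


def build_makeblastdb_command(protein_fasta, db_name):
    """Assemble a makeblastdb call that builds a protein DB from a FASTA file."""
    return [
        "makeblastdb",
        "-in", str(protein_fasta),   # FASTA file of known protein sequences
        "-dbtype", "prot",           # protein database, as required by blastx
        "-out", str(db_name),        # DB path/name, later given to codan.py -b
    ]


def make_protein_db(protein_fasta, db_name):
    """Run makeblastdb; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_makeblastdb_command(protein_fasta, db_name),
                          check=True)
```

For example, `make_protein_db("proteins.fa", "blast_DB")` would create the DB files with the prefix `blast_DB` in the current directory, assuming makeblastdb is installed and on the PATH.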
Follow the instructions in the quick tutorial to learn how to use CodAn and interpret the results.
If you use or discuss CodAn, please cite the preprint:
To report bugs, ask for help, or give feedback, please contact Pedro G. Nachtigall: pedronachtigall@gmail.com