
CDS characterization in transcripts of Eukaryote species


alandurham/CodAn

This branch is 89 commits behind pedronachtigall/CodAn:master.


CodAn


CodAn (Coding sequence Annotator) is a computational tool designed to characterize the CDS and UTR regions in transcripts from any Eukaryote species.

Getting Started

Installation

Decompress the CodAn.tar.gz file:

tar -xf CodAn.tar.gz

Add the bin directory to your PATH:

export PATH=$PATH:path/to/CodAn/bin/

Requirements

Predictive models

The predictive models are available in the "models" subfolder, which contains all models designed for Eukaryote species (i.e., Fungi, Plants, and Animals [Invertebrates and Vertebrates]). The models were designed for use with either full-length or partial transcripts.

Download the model that fits your needs, as described in the "models" folder, decompress the model file (using unzip model.zip), and pass the path to the decompressed model with the -m option.
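For users who prefer to script the decompression step, the following is a minimal Python sketch using only the standard library. The function name extract_model and the archive name model.zip are placeholders for illustration, not part of CodAn. The layout inside each model archive is not described here, so the sketch simply reports whatever top-level entries the archive creates.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, dest_dir="."):
    """Extract a model archive and return the top-level paths it created.

    Hypothetical helper: the archive layout is an assumption, so we
    list whatever top-level entries the zip file contains.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Zip member names always use "/" as the separator.
        top_level = sorted({name.split("/")[0] for name in archive.namelist() if name})
    return [dest / name for name in top_level]
```

One of the returned paths can then be passed to the -m option, for example after calling extract_model("model.zip", "models").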

Usage

Usage: codan.py [options]

Options:
  -h, --help            show this help message and exit
  -t file, --transcripts=file
                        Mandatory - input transcripts file (FASTA format),
                        /path/to/transcripts.fa
  -m model, --model=model
                        Mandatory - path to model, /path/to/model
  -s string, --strand=string
                        Optional - strand of sequence to predict genes (plus,
                        minus or both) [default=both]
  -c int, --cpu=int     Optional - number of threads to be used [default=1]
  -o folder, --output=folder
                        Optional - path to output folder,
                        /path/to/output/folder/ if not declared, it will be
                        created at the transcripts input folder
                        [default="CodAn_output"]
  -b proteinDB, --blastdb=proteinDB
                        Optional - path to blastDB of known protein sequences,
                        /path/to/blast/DB/DB_name
  -H int, --HSP=int     Optional - used in the "-qcov_hsp_perc" option of
                        blastx [default=80]

Basic usage (predict CDS):

codan.py -t transcripts.fa -o output_folder -m model
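When CodAn is called from a pipeline, it can help to assemble the command line programmatically. The sketch below is one possible wrapper, assuming codan.py is on your PATH. The function names build_codan_command and run_codan are hypothetical; only the flags listed in the help text above are used.

```python
import shutil
import subprocess


def build_codan_command(transcripts, model, output=None, strand="both",
                        cpu=1, blastdb=None, hsp=None):
    """Build the argument list for codan.py from the documented options."""
    if strand not in ("plus", "minus", "both"):
        raise ValueError("strand must be 'plus', 'minus' or 'both'")
    cmd = ["codan.py", "-t", str(transcripts), "-m", str(model),
           "-s", strand, "-c", str(cpu)]
    if output is not None:
        cmd += ["-o", str(output)]
    if blastdb is not None:
        cmd += ["-b", str(blastdb)]
        # -H only matters when a BLAST database is given.
        if hsp is not None:
            cmd += ["-H", str(hsp)]
    return cmd


def run_codan(**options):
    """Run codan.py if it is found on PATH; raise an error otherwise."""
    if shutil.which("codan.py") is None:
        raise FileNotFoundError("codan.py not found on PATH")
    return subprocess.run(build_codan_command(**options), check=True)
```

For example, build_codan_command("transcripts.fa", "model", output="output_folder") yields the same arguments as the basic usage line, with the default strand and thread count stated explicitly.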

Alternative usage (predict CDS and perform a BLAST search against a specific DB to annotate the predicted genes based on similarity):

codan.py -t transcripts.fa -o output_folder -m model -b blast_DB

To run this optional step, indicate a protein DB built with the makeblastdb program from NCBI-BLAST. Pre-formatted protein DBs, such as swissprot, can also be downloaded (ftp://ftp.ncbi.nlm.nih.gov/blast/db/).
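As an illustration of building a custom protein DB, the sketch below assembles a makeblastdb call with its standard -in, -dbtype, and -out options. The file name proteins.fasta, the DB name my_proteins, and the function name build_makeblastdb_command are placeholders, not names used by CodAn.

```python
import subprocess


def build_makeblastdb_command(protein_fasta, db_name):
    """Build a makeblastdb command for a protein database.

    protein_fasta and db_name are user-supplied placeholders.
    """
    return ["makeblastdb", "-in", str(protein_fasta),
            "-dbtype", "prot", "-out", str(db_name)]


def make_protein_db(protein_fasta, db_name):
    """Run makeblastdb; requires NCBI-BLAST+ to be installed and on PATH."""
    subprocess.run(build_makeblastdb_command(protein_fasta, db_name), check=True)
    return db_name
```

The DB name given to -out (here my_proteins, including any leading path) is the value that would then be passed to the -b option of codan.py.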

Tutorial

Follow the instructions in the quick tutorial to learn how to use CodAn and interpret the results.

Reference

If you use or discuss CodAn, please cite the preprint:

Nachtigall et al. CodAn: predictive models for the characterization of mRNA transcripts in Eukaryotes

License

GNU GPLv3

Contact

To report bugs, ask for help, or give feedback, please contact Pedro G. Nachtigall: pedronachtigall@gmail.com
