Output

Mantis Output

Mantis generates three tab-separated output files (in addition to HMMER's default output): output_annotation.tsv, integrated_annotation.tsv, and consensus_annotation.tsv. Keep in mind that Mantis only outputs sequences for which at least one homolog was found.

`output_annotation.tsv`

The output_annotation.tsv will look something like this:

Query	Ref_file	Ref_hit	Ref_hit_accession	evalue	bitscore	Direction	Query_length	Query_hit_start	Query_hit_end	Ref_hit_start	Ref_hit_end	Ref_length
Query_1	ref_file_1	ref_hit_1	ref_hit_1_accession	2.1e-46	5520	Forward/Reverse	154	64	90	85	105	180

The output_annotation.tsv is not very informative when we need to connect our annotations to metadata, which is why I've included integrated_annotation.tsv.

`integrated_annotation.tsv`

The integrated_annotation.tsv will look something like this:

Query	Ref_file	Ref_hit	Ref_hit_accession	evalue	bitscore	Direction	Query_length	Query_hit_start	Query_hit_end	Ref_hit_start	Ref_hit_end	Ref_length	I	Links
Query_1	ref_file_1	ref_hit_1	ref_hit_1_accession	2.1e-46	5520	Forward	154	64	90	85	105	180	I	pfam:link_1	enzyme_EC:link_2	description:free_text
Query_1	ref_file_1	ref_hit_2	ref_hit_2_accession	1e-43	3500	Reverse	154	95	130	3	41	70	I	ko:link_3	enzyme_EC:link_4	description:free_text
Query_1	ref_file_2	ref_hit_2	ref_hit_2_accession	1e-43	3500	Reverse	154	51	95	1	56	70	I	ko:link_5	enzyme_EC:link_6	description:free_text
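As a minimal sketch of how the link columns can be unpacked (this is not Mantis's own parser, and the function name `parse_integrated_line` is hypothetical), the snippet below splits one row into its fixed hit fields and a dictionary of link types. It assumes that everything after the separator column is a tab-separated list of `type:value` pairs, as in the example above.

```python
# Header and first row copied from the example table above (tab-separated).
HEADER = "\t".join([
    "Query", "Ref_file", "Ref_hit", "Ref_hit_accession", "evalue", "bitscore",
    "Direction", "Query_length", "Query_hit_start", "Query_hit_end",
    "Ref_hit_start", "Ref_hit_end", "Ref_length", "I", "Links",
])
ROW = "\t".join([
    "Query_1", "ref_file_1", "ref_hit_1", "ref_hit_1_accession", "2.1e-46",
    "5520", "Forward", "154", "64", "90", "85", "105", "180", "I",
    "pfam:link_1", "enzyme_EC:link_2", "description:free_text",
])


def parse_integrated_line(header, line):
    """Hypothetical helper: return (hit_fields, links) for one row."""
    columns = header.rstrip("\n").split("\t")
    fields = line.rstrip("\n").split("\t")
    # The "Links" header marks where the metadata starts; the column just
    # before it is the separator and carries no information.
    links_start = columns.index("Links")
    n_fixed = links_start - 1
    hit = dict(zip(columns[:n_fixed], fields[:n_fixed]))
    links = {}
    for item in fields[links_start:]:
        # Split on the first colon only, since free text may contain colons.
        link_type, _, value = item.partition(":")
        links.setdefault(link_type, set()).add(value)
    return hit, links


hit, links = parse_integrated_line(HEADER, ROW)
print(hit["Ref_hit"], hit["evalue"])
print(sorted(links))
```

Locating the separator through the header, rather than hard-coding its character, keeps the sketch independent of how the separator column is rendered.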

See Intra-reference hit processing to understand how Mantis captures multiple matches for the same protein sequence within the same reference dataset. However, a query sequence may have different hits against our HMM sources, so it is necessary to find a consensus between these hits:

`consensus_annotation.tsv`

The consensus_annotation.tsv will look something like this:

Query	Ref_Files	Ref_Hits	Consensus_hits	Total_hits	I	Links
Query_1	ref_file_1;ref_file_2	ref_hit_1;ref_hit_2	3	3	I	pfam:link_1	enzyme_EC:link_2	description:free_text
Query_2	ref_file_3;	ref_hit_3	4	5	I	ko:link_1	description:free_text

The consensus coverage is the number of hits that reached a consensus (Consensus_hits) out of all the hits for the current query sequence (Total_hits). For example, if there are 4 hits in the consensus and a total of 5 hits, the consensus coverage is 4/5. If the consensus coverage is low, consider also looking at the results in integrated_annotation.tsv.
Note how each line now corresponds to a query sequence, and how this query sequence can match against different reference sources, thus having different HMM profile/sequence matches.
The consensus is not a mere agglomerate of all the matches found across the different reference sources; it is a group of hits that form a consensus. Please see Inter-reference hit processing for more information.
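The coverage calculation can be sketched as follows. This is only an illustration: the helper name `consensus_coverage` and the `MIN_COVERAGE` cutoff are hypothetical and not part of Mantis, and the rows are the ones from the example table above.

```python
# Rows copied from the example table above (tab-separated).
ROWS = [
    "Query_1\tref_file_1;ref_file_2\tref_hit_1;ref_hit_2\t3\t3\tI\t"
    "pfam:link_1\tenzyme_EC:link_2\tdescription:free_text",
    "Query_2\tref_file_3;\tref_hit_3\t4\t5\tI\tko:link_1\tdescription:free_text",
]

MIN_COVERAGE = 0.9  # hypothetical cutoff, chosen only for this illustration


def consensus_coverage(line):
    """Hypothetical helper: return (query, Consensus_hits / Total_hits)."""
    fields = line.rstrip("\n").split("\t")
    query = fields[0]
    consensus_hits = int(fields[3])  # Consensus_hits column
    total_hits = int(fields[4])      # Total_hits column
    return query, consensus_hits / total_hits


coverage = dict(consensus_coverage(row) for row in ROWS)
# Queries below the cutoff are candidates for a closer look in
# integrated_annotation.tsv.
low_coverage = [query for query, value in coverage.items() if value < MIN_COVERAGE]
print(coverage)
print(low_coverage)
```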

Which information is included in the outputs?

The output ultimately depends on the reference datasets used, since each includes its own metadata (e.g., Kofam includes cross-linking with kegg_ko and other IDs like enzyme_ec). So which information is included in the default reference datasets?

kofam - cazy, cog, description, enzyme_ec, go, kegg_ko, pfam, tcdb
NCBI - description, enzyme_ec, go, pfam, tigrfam
eggNOG - bigg_reaction, cazy, description, enzyme_ec, go, kegg_brite, kegg_ko, kegg_module, kegg_pathway, kegg_rclass, kegg_reaction
pfam - cog, description, enzyme_ec, go, pfam
tcdb - description, go, pfam, tcdb

And for references included in this url:

bigg_genes - bigg_reaction, biocyc_reaction, enzyme_ec, kegg_reaction, metanetx, reactome, rhea, seed
bigg_reactions - biocyc_reaction, enzyme_ec, kegg_reaction, metanetx, reactome, rhea, seed
uniprot_ec - bigg, enzyme_ec
uniprot_rhea - bigg, biocyc_reaction, enzyme_ec, go, kegg_reaction, rhea
uniprot_reactome - bigg, reactome

While the included metadata is quite extensive, it may not contain the cross-linking you require. If that is the case, you can use the web scraper found here to collect and cross-link your data.

Other outputs

GFF format

Mantis can also output both integrated_annotation.tsv and consensus_annotation.tsv in GFF format when you add -gff to the Mantis command. Please keep in mind that this is not the usual contig-centric GFF format; it is sequence-centric.

The GFF file contains the sequence ID, the coordinates of the match, the e-value, and the respective attributes (as per the usual GFF format). The attributes section is somewhat more specific to Mantis: Name and Target represent the hit in the reference file. The Note section states which reference file the match comes from, as well as the length of the reference sequence/HMM (ref_len). This section also includes any functional descriptions associated with the reference hit. Lastly, Dbxref and Ontology_term hold the identifiers of the reference hit.
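To read these attributes back, a generic GFF3 attribute parser is usually enough. The sketch below assumes the attributes column follows the standard GFF3 layout (semicolon-separated `tag=value` pairs, with commas between multiple values). The example string is made up for illustration, in particular the way Note is laid out, and is not copied from real Mantis output; `parse_gff_attributes` is a hypothetical helper.

```python
from urllib.parse import unquote

# Made-up attributes column using the tags named above; the exact value
# layout written by Mantis may differ.
ATTRIBUTES = (
    "Name=ref_hit_1;Target=ref_hit_1 85 105;"
    "Note=ref_file:ref_file_1,ref_len:180;"
    "Dbxref=pfam:link_1;Ontology_term=go:link_2"
)


def parse_gff_attributes(attribute_column):
    """Hypothetical helper: map each GFF3 tag to its list of values."""
    attributes = {}
    for pair in attribute_column.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # GFF3 separates multiple values with commas and percent-encodes
        # reserved characters inside values.
        attributes[tag] = [unquote(part) for part in value.split(",")]
    return attributes


attributes = parse_gff_attributes(ATTRIBUTES)
print(attributes["Name"], attributes["Dbxref"])
```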

KEGG module completeness

Mantis can also generate a KEGG module completeness matrix (add -km when running Mantis):

Module	Sample1	Sample2
M01	0.5	0.3
M02	0.2	0.01
...	...	...
M0N	0.6	0.8
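A matrix in this layout can be loaded with the standard library alone. The sketch below is a minimal illustration, not part of Mantis: the function names and the 0.5 threshold are hypothetical, and the input reuses the first two rows of the example above.

```python
import csv
import io

# First rows of the example matrix above (tab-separated).
MATRIX_TEXT = "Module\tSample1\tSample2\nM01\t0.5\t0.3\nM02\t0.2\t0.01\n"


def read_completeness_matrix(handle):
    """Hypothetical helper: return {module: {sample: completeness}}."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]  # header row minus the "Module" column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        module, values = row[0], row[1:]
        matrix[module] = {s: float(v) for s, v in zip(samples, values)}
    return matrix


def modules_above(matrix, sample, threshold):
    """Hypothetical helper: modules whose completeness is >= threshold."""
    return sorted(m for m, scores in matrix.items() if scores[sample] >= threshold)


matrix = read_completeness_matrix(io.StringIO(MATRIX_TEXT))
print(modules_above(matrix, "Sample1", 0.5))
```

For a real run, pass an open file handle for the matrix file instead of the `io.StringIO` wrapper.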

In verbose mode (i.e. --verbose_kegg_matrix / -vkm) additional information is provided:

Module	Sample1	KOs Sample1	Missing KOs Sample1	Sample2	KOs Sample2	Missing KOs Sample2
M01 module_description	0.5	KO1,...KON	KO1,...KON	0.5	KO1,...KON	KO1,...KON
M02 module_description	0.2	KO1,...KON	KO1,...KON	0.2	KO1,...KON	KO1,...KON
...	...	...	...	...	...	...
M0N module_description	0.6	KO1,...KON	KO1,...KON	0.6	KO1,...KON	KO1,...KON

Uploading results to KEGG mapper reconstruct

When you use either -km or -vkm, Mantis will also output the KOs per sample, which you can then upload to KEGG mapper reconstruct for visualization. The output file is named sample_kos.tsv and is formatted as follows:

Sample	KO
S1_seq1	KO1
S1_seq1	KO2
S1_seq2	KO3
...	...
SN	KON

A sample file can be found here.
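If you want to inspect sample_kos.tsv before uploading it, the two-column layout can be grouped by identifier. This is a minimal sketch that assumes exactly one identifier and one KO per line, as in the example above; `read_sample_kos` is a hypothetical helper and not one of the parser functions shipped with Mantis.

```python
import io
from collections import defaultdict

# First rows of the example above (tab-separated).
SAMPLE_KOS_TEXT = "Sample\tKO\nS1_seq1\tKO1\nS1_seq1\tKO2\nS1_seq2\tKO3\n"


def read_sample_kos(handle):
    """Hypothetical helper: return {identifier: set of KOs}."""
    kos = defaultdict(set)
    next(handle)  # skip the "Sample<TAB>KO" header line
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        identifier, ko = line.split("\t")
        kos[identifier].add(ko)
    return dict(kos)


sample_kos = read_sample_kos(io.StringIO(SAMPLE_KOS_TEXT))
print(sample_kos)
```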

Helper parser functions

I have included parser functions for integrated_annotation.tsv and consensus_annotation.tsv that return dictionaries. You can use these to format your own outputs for downstream analysis. You can get the .py here.

Intermediate files

Mantis also generates other output files and folders.

searchout: contains the domtblout files generated by HMMER and Diamond
output_hmmer: contains HMMER's console output
Mantis.out: contains Mantis's console output

Intermediate files (except Mantis.out) are removed by default; use -k to keep them.
