---
layout: page
title: Sorting the unknown
---

We initially defined four categories of unknowns (more may be added in the future), combining an ecological approach and a protein-domain-based approach in their definition. The categories are defined as follows:

KNOWN: Our knowns are all those ORFs that contain a Pfam domain. We are developing an approach to assign function to the unknown ORFs that relies on Domain Co-occurrence Networks and uses Pfam as its basic building block.
GENOMIC UNKNOWNS: The first category of unknowns comprises ORFs with unknown function that are associated with a sequenced organism or with population genomes (aka Metagenome-Assembled Genomes, MAGs).
ENVIRONMENTAL UNKNOWNS: The second category of unknowns comprises ORFs with unknown function that cannot be associated with an organism and are found only in environmental metagenomes.
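The decision rule behind the three categories described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's implementation: the `ORF` record, its field names, and the `classify` and `partition` helpers are hypothetical, and "unknown function" is approximated here as "no Pfam domain". The actual workflow operates on validated gene clusters rather than on individual ORFs.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    KNOWN = "KNOWN"
    GENOMIC_UNKNOWN = "GENOMIC UNKNOWN"
    ENVIRONMENTAL_UNKNOWN = "ENVIRONMENTAL UNKNOWN"


@dataclass(frozen=True)
class ORF:
    # Hypothetical record: field names are illustrative, not the pipeline's schema.
    orf_id: str
    pfam_domains: tuple[str, ...]  # Pfam accessions annotated on the ORF (may be empty)
    in_genome: bool                # found in a sequenced genome or a MAG
    in_metagenome: bool            # found in an environmental metagenome


def classify(orf: ORF) -> Category:
    """Assign a single category, checking the rules in order of precedence."""
    if orf.pfam_domains:
        # Any Pfam domain makes the ORF a KNOWN.
        return Category.KNOWN
    if orf.in_genome:
        # No Pfam domain, but linked to an organism or a MAG.
        return Category.GENOMIC_UNKNOWN
    if orf.in_metagenome:
        # No Pfam domain and seen only in environmental metagenomes.
        return Category.ENVIRONMENTAL_UNKNOWN
    raise ValueError(f"{orf.orf_id}: no Pfam domain and no genomic or environmental origin")


def partition(orfs: list[ORF]) -> dict[Category, list[str]]:
    """Group ORF identifiers by category."""
    groups: dict[Category, list[str]] = defaultdict(list)
    for orf in orfs:
        groups[classify(orf)].append(orf.orf_id)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        ORF("orf1", ("PF00005",), True, True),
        ORF("orf2", (), True, False),
        ORF("orf3", (), False, True),
    ]
    for category, ids in partition(sample).items():
        print(category.value, ids)
```

The checks run in a fixed order, so an ORF that carries a Pfam domain is always a KNOWN, whatever its origin. Among ORFs without a Pfam domain, any link to a genome or MAG takes precedence over environmental-only occurrence.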

A bioinformatic workflow to structure the unknown functional space

We have implemented a bioinformatic workflow that partitions genomic and metagenomic datasets into the different categories of KNOWNS and UNKNOWNS.

We start from a de novo clustering of all genomic and environmental genes and continue through a multi-step pipeline that validates and characterizes the resulting gene clusters. For a more detailed explanation of the pipeline, see how we create the protein clusters, how we validate them, and how we classify them.
