Commit

fixed typos
qiyunzhu committed Feb 8, 2021
1 parent eaed749 commit 4cabeef
Showing 3 changed files with 9 additions and 9 deletions.
10 changes: 5 additions & 5 deletions doc/collapse.md
@@ -9,15 +9,15 @@ woltka tools collapse -i input.biom -m mapping.txt -o output.biom
With this tool one can achieve the following goals:

1. Translate feature IDs into names or descriptions.
-..* Example: Translate taxonomic IDs to taxon names.
-..* Example: Translate [UniRef](https://www.uniprot.org/help/uniref) IDs to protein names, while **merging** same names.
+- Example: Translate taxonomic IDs to taxon names.
+- Example: Translate [UniRef](https://www.uniprot.org/help/uniref) IDs to protein names, while **merging** same names.

2. Group lower features into higher categories.
-..* Example: Convert genera into families.
+- Example: Convert genera to families.

3. Convert lower features into higher ones, where each lower feature may correspond to **multiple** higher features.
-..* Example: Convert KEGG [orthologs](https://www.genome.jp/kegg/ko.html) to [pathways](https://www.genome.jp/kegg/pathway.html).
-..* Example: Convert [GO](http://geneontology.org/docs/ontology-documentation/) terms to [GO Slim](http://www-legacy.geneontology.org/GO.slims.shtml) terms.
+- Example: Convert KEGG [orthologs](https://www.genome.jp/kegg/ko.html) to [pathways](https://www.genome.jp/kegg/pathway.html).
+- Example: Convert [GO](http://geneontology.org/docs/ontology-documentation/) terms to [GO Slim](http://www-legacy.geneontology.org/GO.slims.shtml) terms.

The last usage is an important complement to the main classification workflow, which currently relies on a tree structure and does not support one-to-many mapping. This can be achieved by using the profile collapsing function (although one can only move up one level per run).

2 changes: 1 addition & 1 deletion doc/kegg.md
@@ -1,6 +1,6 @@
# Working with KEGG

-**KEGG** (https://www.genome.jp/kegg/)([Kanehisa et al., 2021](https://academic.oup.com/nar/article/49/D1/D545/5943834)) is the classical database of biological functions. It provides a well-organized hierarchical identification system, such as orthologies (K), modules (M), reactions (R), compounds (C), pathways, diseases and more.
+**KEGG** (https://www.genome.jp/kegg/) ([Kanehisa et al., 2021](https://academic.oup.com/nar/article/49/D1/D545/5943834)) is the classical database of biological functions. It provides a well-organized hierarchical identification system, such as orthologies (K), modules (M), reactions (R), compounds (C), pathways, diseases and more.

Whereas the FTP access to KEGG is limited to subscribed users, the mapping of UniRef entries to KEGG orthology (KO) entries is freely available from the [UniProt](https://www.uniprot.org/downloads) data release. From this point on, we provide a Python script: [**kegg_query.py**](https://github.com/qiyunzhu/utils/blob/main/kegg_query.py) to automatically retrieve higher-level classification information of a given KO list or table from the KEGG server. This is made possible using the official [KEGG REST API](https://www.kegg.jp/kegg/rest/), which is freely available to academic users (however restrictions may apply; see official policy [here](https://www.kegg.jp/kegg/rest/)).
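
As a rough illustration of the kind of lookup described above, the following Python sketch builds a request URL for the KEGG REST `link` operation and parses its two-column, tab-delimited response into a KO-to-pathway mapping. It is not the implementation of `kegg_query.py`. The base URL, the helper names `build_link_url` and `parse_link_table`, and the sample response text are assumptions made for illustration, and no request is actually sent.

```python
from collections import defaultdict

# Assumed base URL of the KEGG REST service; check the official policy before use.
KEGG_REST = "https://rest.kegg.jp"


def build_link_url(target_db, ko_ids):
    """Build a 'link' request URL for a list of KO IDs (hypothetical helper)."""
    entries = "+".join(f"ko:{ko}" for ko in ko_ids)
    return f"{KEGG_REST}/link/{target_db}/{entries}"


def parse_link_table(text):
    """Parse tab-delimited 'source<TAB>target' lines into {source: [targets]}.

    Database prefixes such as 'ko:' and 'path:' are stripped from both columns.
    """
    mapping = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, target = line.split("\t")
        mapping[source.split(":", 1)[-1]].append(target.split(":", 1)[-1])
    return dict(mapping)


# Illustrative response text only, not actual server output.
sample = (
    "ko:K00001\tpath:map00010\n"
    "ko:K00001\tpath:ko00010\n"
    "ko:K00002\tpath:map00040\n"
)
url = build_link_url("pathway", ["K00001", "K00002"])
ko_to_pathway = parse_link_table(sample)
```

Because one KO can link to several pathways, the parsed result is a one-to-many mapping of the kind the [collapse](collapse.md) command accepts once it is written out as a mapping file.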

6 changes: 3 additions & 3 deletions doc/metacyc.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Working with MetaCyc

-**MetaCyc** (https://metacyc.org/) ([Caspi et al 2020](https://academic.oup.com/nar/article/48/D1/D445/5581728)) is a metabolic pathway database that has been widely used in genomic, metagenomic and metabolomic studies. It provides a hierarchical classification system, including genes, proteins, reactions, compounds, pathways and more.
+**MetaCyc** (https://metacyc.org/) ([Caspi et al., 2020](https://academic.oup.com/nar/article/48/D1/D445/5581728)) is a metabolic pathway database that has been widely used in genomic, metagenomic and metabolomic studies. It provides a hierarchical classification system, including genes, proteins, reactions, compounds, pathways and more.


## Contents
@@ -65,7 +65,7 @@ regulation < enzrxn
type
```

-All transitions are enabled using Woltka's [collapse](collapse.md) command with individual mapping files. For example: one generate profiles along the following cascade:
+All transitions are enabled using Woltka's [collapse](collapse.md) command with individual mapping files. For example, one can generate profiles along the following cascade:

```
protein - enzrxn - reaction - pathway - super pathway - type
@@ -86,7 +86,7 @@ woltka tools collapse -i pathway.biom -m metacyc/pathway-to-super_pathway.txt -n
woltka tools collapse -i super_pathway.biom -m metacyc/pathway_type.txt -n metacyc/all_class_name.txt -o pathway_type.biom
```

-The collapse command supports **many-to-many** mapping. For example, if one reaction is found in three pathways, each pathway will be counted **once**. In some instances (e.g., to retain **compositionality** of the profile), one may consider adding the `--normalize` flag, which will instruct the program to count each pathway 1 / 3 times ([see details](collapse.md)).
+The collapse command supports **many-to-many** mapping. For example, if one reaction is found in three pathways, each pathway will be counted **once**. In some instances (e.g., to retain compositionality of the profile), one may consider adding the `--normalize` flag, which will instruct the program to count each pathway 1 / 3 times ([see details](collapse.md)).
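
The counting rule in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic, not Woltka's implementation. The function name `collapse`, the dictionary-based profile, the feature and pathway IDs, and the choice to drop unmapped features are all assumptions made for the example.

```python
from collections import defaultdict


def collapse(profile, mapping, normalize=False):
    """Collapse {feature: count} into {target: count} via {feature: [targets]}.

    Without normalization, each target receives the full count of the feature.
    With normalization, the count is split evenly among the feature's targets.
    """
    result = defaultdict(float)
    for feature, count in profile.items():
        targets = mapping.get(feature, [])
        if not targets:
            continue  # assumption: features without a mapping are dropped
        share = count / len(targets) if normalize else count
        for target in targets:
            result[target] += share
    return dict(result)


# Hypothetical data: one reaction found in three pathways, another in one.
profile = {"RXN-1": 6, "RXN-2": 4}
mapping = {"RXN-1": ["PWY-A", "PWY-B", "PWY-C"], "RXN-2": ["PWY-A"]}

plain = collapse(profile, mapping)
normalized = collapse(profile, mapping, normalize=True)
```

In this sketch the plain result sums to 22, which is more than the 10 counts in the input because `RXN-1` is counted once per pathway. The normalized result sums to 10, which is the sense in which normalization retains compositionality.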


## Pathway coverage