Commit

renamed normalize as frac
qiyunzhu committed Jun 27, 2021
1 parent 8a1d530 commit 6856043
Showing 8 changed files with 13 additions and 14 deletions.
2 changes: 1 addition & 1 deletion doc/cli.md
@@ -126,7 +126,7 @@ Option | Description
`--input`, `-i` (required) | Path to input profile.
`--map`, `-m` (required) | Path to mapping of source features to target features.
`--output`, `-o` (required) | Path to output profile.
-`--normalize`, `-z` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
+`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
`--names`, `-n` | Path to mapping of target features to names. The names will be appended to the collapsed profile as a metadata column.


2 changes: 1 addition & 1 deletion doc/collapse.md
@@ -57,7 +57,7 @@ source4 <tab> target3

### Normalization

-By default, if one source feature is simultaneously mapped to _k_ targets, each target will be counted once. With the `--normalize` or `-z` flag added to the command, each target will be counted 1 / _k_ times.
+By default, if one source feature is simultaneously mapped to _k_ targets, each target will be counted once. With the `--frac` or `-f` flag added to the command, each target will be counted 1 / _k_ times.

Whether to enable normalization depends on the nature and aim of your analysis. For example, one gene is involved in two pathways (which isn't uncommon), should each pathway be counted once, or half time?
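
The counting rule described in this hunk can be sketched in plain Python. This is only an illustration under stated assumptions, not the code behind `collapse_table`: the function `collapse_counts`, the feature names, and the choice to drop unmapped sources are all hypothetical.

```python
from collections import defaultdict


def collapse_counts(counts, mapping, frac=False):
    """Collapse source counts into target counts (illustrative sketch).

    counts:  dict of source feature -> count in one sample
    mapping: dict of source feature -> list of target features
    frac:    if True, each of the k targets receives count / k;
             otherwise each target receives the full count
    """
    result = defaultdict(float)
    for source, count in counts.items():
        targets = mapping.get(source, [])
        if not targets:
            # Assumption: sources without any target are dropped.
            continue
        share = count / len(targets) if frac else count
        for target in targets:
            result[target] += share
    return dict(result)


# One gene (gene1) takes part in two pathways; gene2 in only one.
counts = {'gene1': 6, 'gene2': 4}
mapping = {'gene1': ['pathwayA', 'pathwayB'], 'gene2': ['pathwayB']}

default = collapse_counts(counts, mapping)
fractional = collapse_counts(counts, mapping, frac=True)
print(default)      # {'pathwayA': 6.0, 'pathwayB': 10.0}
print(fractional)   # {'pathwayA': 3.0, 'pathwayB': 7.0}
```

In the default mode `gene1` contributes its full count of 6 to both pathways, whereas with `frac=True` it contributes 3 to each.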

2 changes: 1 addition & 1 deletion doc/metacyc.md
@@ -86,7 +86,7 @@ woltka tools collapse -i pathway.biom -m metacyc/pathway-to-super_pathway.txt -n
woltka tools collapse -i super_pathway.biom -m metacyc/pathway_type.txt -n metacyc/all_class_name.txt -o pathway_type.biom
```

-The collapse command supports **many-to-many** mapping. For example, if one reaction is found in three pathways, each pathway will be counted **once**. In some instances (e.g., to retain compositionality of the profile), one may consider adding the `--normalize` flag, which will instruct the program to count each pathway 1 / 3 times ([see details](collapse.md)).
+The collapse command supports **many-to-many** mapping. For example, if one reaction is found in three pathways, each pathway will be counted **once**. In some instances (e.g., to retain compositionality of the profile), one may consider adding the `--frac` flag, which will instruct the program to count each pathway 1 / 3 times ([see details](collapse.md)).


## Pathway coverage
3 changes: 1 addition & 2 deletions doc/wol.md
@@ -220,8 +220,7 @@ So on so forth. See [here](metacyc.md) for a graph of all available collapsing d

`classify` only supports a tree structure, in which one child unit has exactly one parent unit. This is typical in taxonomic classification. If multiple parents are present, all but the first parent will be discarded. In contrast, `collapse` supports **one-to-multiple** mappings, therefore it is more suitable when this is the norm instead of exception, especially in functional classification (where one gene can be involved in multiple metabolic pathways).

-`classify` always ensures the **compositionality** of the feature table, in which the frequencies match the numbers of aligned sequences. `collapse` however does not by default. In a one-to-multiple mapping, all parents will be counted once. But one can add `--normalize` to the `collapse` command to normalize the counts by the number of parents so that the compositionality is retained.
-
+`classify` always ensures the **compositionality** of the feature table, in which the frequencies match the numbers of aligned sequences. `collapse` however does not by default. In a one-to-multiple mapping, all parents will be counted once. But one can add `--frac` to the `collapse` command to normalize the counts by the number of parents so that the compositionality is retained.
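
The compositionality claim can be checked with a little exact arithmetic. The sketch below is a hypothetical illustration, not woltka code: `collapsed_total` and the reaction and pathway names are invented, and it assumes every source feature has at least one parent. It compares the sample total before and after collapsing, with and without 1 / _k_ weighting.

```python
from fractions import Fraction


def collapsed_total(counts, mapping, frac):
    """Sum of all parent counts after collapsing (illustrative sketch)."""
    total = Fraction(0)
    for source, count in counts.items():
        parents = mapping[source]
        # With frac, each of the k parents receives count / k;
        # without it, each parent receives the full count.
        weight = Fraction(1, len(parents)) if frac else Fraction(1)
        total += sum(count * weight for _ in parents)
    return total


# rxn1 belongs to three pathways, rxn2 to one.
counts = {'rxn1': 5, 'rxn2': 7}
mapping = {'rxn1': ['pwyA', 'pwyB', 'pwyC'], 'rxn2': ['pwyA']}

original_total = sum(counts.values())
total_default = collapsed_total(counts, mapping, frac=False)
total_frac = collapsed_total(counts, mapping, frac=True)
print(original_total, total_default, total_frac)  # 12 22 12
```

Without weighting, the total grows from 12 to 22 because `rxn1` is counted three times; with weighting, the total stays at 12.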

## Stratified taxonomic / functional classification

2 changes: 1 addition & 1 deletion woltka/cli.py
@@ -266,7 +266,7 @@ def merge_cmd(ctx, **kwargs):
type=click.Path(writable=True, dir_okay=False),
help='Path to output profile.')
@click.option(
-'--normalize', '-z', is_flag=True,
+'--frac', '-f', is_flag=True,
help=('Count each target feature as 1/k (k is the number of targets '
'mapped to a source). Otherwise, count as one.'))
@click.option(
2 changes: 1 addition & 1 deletion woltka/tests/test_cli.py
@@ -236,7 +236,7 @@ def test_collapse_cmd(self):
'--map', map_fp,
'--output', output_fp,
'--names', names_fp,
-'--normalize']
+'--frac']
res = self.runner.invoke(collapse_cmd, params)
self.assertEqual(res.exit_code, 0)
self.assertEqual(res.output.splitlines()[-1],
2 changes: 1 addition & 1 deletion woltka/tests/test_tools.py
@@ -189,7 +189,7 @@ def test_collapse_wf(self):
# wrong mapping file
map_fp = join(self.datdir, 'tree.nwk')
with self.assertRaises(SystemExit) as ctx:
-collapse_wf(input_fp, map_fp, output_fp, normalize=True)
+collapse_wf(input_fp, map_fp, output_fp, frac=True)
errmsg = 'No source-target relationship is found in tree.nwk.'
self.assertEqual(str(ctx.exception), errmsg)

12 changes: 6 additions & 6 deletions woltka/tools.py
@@ -192,11 +192,11 @@ def _read_profile(fp):
click.echo('Merged profile written.')


-def collapse_wf(input_fp: str,
-                map_fp: str,
-                output_fp: str,
-                normalize: bool = False,
-                names_fp: str = None):
+def collapse_wf(input_fp: str,
+                map_fp: str,
+                output_fp: str,
+                frac: bool = False,
+                names_fp: str = None):
"""Workflow for collapsing a profile based on many-to-many mapping.
Raises
@@ -225,7 +225,7 @@ def collapse_wf(input_fp: str,

# collapse profile by mapping
click.echo('Collapsing profile...', nl=False)
-table = collapse_table(table, mapping, normalize)
+table = collapse_table(table, mapping, frac)
click.echo(' Done.')
n = table_shape(table)[0]
click.echo(f'Number of features after collapsing: {n}.')