added collapsing stratified table #126
This is a handy feature update. I think it will be very useful.
@@ -126,10 +126,10 @@ Option | Description | |||
`--input`, `-i` (required) | Path to input profile. | |||
`--map`, `-m` (required) | Path to mapping of source features to target features. | |||
`--output`, `-o` (required) | Path to output profile. | |||
`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one. | |||
`--divide`, `-d` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one. |
Check the --frac parameter. It was previously --normalize, and it is now inconsistent within the codebase.
--frac actually does sound more intuitive than --divide.
It was modified during a previous effort to replace Travis CI with GitHub Actions (I changed some trivial code in order to fire a PR...). Now it should be consistently --divide.
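For readers following the naming discussion, here is a minimal sketch of what the option's description means by counting each target as 1 / k. The function `collapse`, its arguments, and the toy data are hypothetical and are not taken from this codebase; dropping unmapped sources is also an assumption of the sketch.

```python
from collections import defaultdict


def collapse(profile, mapping, divide=False):
    """Collapse source feature counts into target features.

    profile: dict of source feature -> count
    mapping: dict of source feature -> list of target features
    divide:  if True, each target receives count / k, where k is the
             number of targets mapped to that source; otherwise each
             target receives the full count
    """
    result = defaultdict(float)
    for source, count in profile.items():
        targets = mapping.get(source, [])
        if not targets:
            continue  # assumption: unmapped sources are dropped
        share = count / len(targets) if divide else count
        for target in targets:
            result[target] += share
    return dict(result)


# Made-up data: K1 maps to two targets, K2 maps to one.
profile = {'K1': 6, 'K2': 4}
mapping = {'K1': ['M1', 'M2'], 'K2': ['M2']}

whole = collapse(profile, mapping)               # count as one
split = collapse(profile, mapping, divide=True)  # count as 1 / k
```

With the flag off, K1's count is added in full to both M1 and M2; with it on, the count is split evenly between them, so the total is preserved.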
@@ -126,10 +126,10 @@ Option | Description | |||
`--input`, `-i` (required) | Path to input profile. | |||
`--map`, `-m` (required) | Path to mapping of source features to target features. | |||
`--output`, `-o` (required) | Path to output profile. | |||
`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one. | |||
`--divide`, `-d` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one. | |||
`--field`, `-f` | Index of field to be collapsed in a stratified profile. For example, use `-f 2` to collapse "gene" in "microbe\|gene". |
Perhaps a future feature update where the field can be a name like 'gene' instead of an index.
Index is fine for now, but this is something to consider for ease of use for end users.
Good point! One reason this hasn't been implemented is that QIIME 2 has mandatory index headers (like #FeatureID). We need to think about where else we can store these field definitions.
@@ -223,9 +224,13 @@ def collapse_wf(input_fp: str, | |||
if not mapping: | |||
exit(f'No source-target relationship is found in {basename(map_fp)}.') | |||
|
|||
# convert field index | |||
if field: |
What is this for?
This changes a user-entered field number (starting from 1) into a Python list index (starting from 0).
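A minimal sketch of that conversion, for illustration only. The helper name `to_list_index`, the validation, and the example feature ID are assumptions and not the code in this PR.

```python
def to_list_index(field):
    """Convert a 1-based field number from the user to a 0-based index.

    Returns None when no field was given, so callers can skip the
    stratified logic entirely.
    """
    if field is None:
        return None
    if field < 1:
        raise ValueError('Field index must be a positive integer.')
    return field - 1


# Hypothetical stratified feature ID in the "microbe|gene" layout.
feature_id = 'Firmicutes|K00001'
parts = feature_id.split('|')

idx = to_list_index(2)   # user passes -f 2
selected = parts[idx]    # second field, i.e. the gene
```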
This PR adds support for collapsing a stratified table.
For example, the input profile phylum_ko.tsv is like:
One wants to collapse KOs into modules using a mapping file ko-to-module.tsv, while leaving phyla the same:
One can do (2 means the second field in each feature ID):
The output will be like:
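As a rough illustration of the behavior described above, the sketch below collapses only the chosen field of each stratified feature ID. It uses made-up data rather than the contents of phylum_ko.tsv or ko-to-module.tsv, and the function `collapse_field` and its parameters are hypothetical, not this project's API.

```python
from collections import defaultdict


def collapse_field(profile, mapping, field, sep='|'):
    """Collapse one field of stratified feature IDs, keeping the rest.

    profile: dict of stratified feature ID -> count
    mapping: dict of source feature -> list of target features
    field:   1-based index of the field to collapse
    """
    idx = field - 1  # user-facing number -> Python list index
    result = defaultdict(int)
    for feature, count in profile.items():
        parts = feature.split(sep)
        # Features whose chosen field has no mapping are dropped
        # (an assumption of this sketch).
        for target in mapping.get(parts[idx], []):
            new_parts = parts[:idx] + [target] + parts[idx + 1:]
            result[sep.join(new_parts)] += count
    return dict(result)


# Made-up phylum|KO counts and KO-to-module mapping.
profile = {
    'Firmicutes|K1': 5,
    'Firmicutes|K2': 3,
    'Bacteroidetes|K1': 2,
}
mapping = {'K1': ['M1'], 'K2': ['M1', 'M2']}

collapsed = collapse_field(profile, mapping, field=2)
```

The phylum part of each ID is untouched, and counts that land on the same phylum and module pair are summed.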
@droush