-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
barplot: make taxonomy optional #153
Conversation
cc @gregcaporaso and @colinvwood who've discussed this recently in the context of Kraken |
hey @ebolyen @gregcaporaso @colinvwood I have added a parameter Is this acceptable with this change? This change does not ever require hierarchical labels — it just attempts to parse those labels if the user sets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nbokulich,
We were discussing this and @lizgehret came up with an idea that we all like on our end.
What if instead of parse_ids
, there was something like level_delimiter=None
(which would be equivalent to parse_ids=False
).
The intention would be that a user could provide this delimiter if they knew the IDs had hierarchical ranks. In most cases they would pass ;
, but the reason we like this is it no longer imbues the ID with an implicit schema (the taxonomy format). Instead it's up to the user to decide to interpret their IDs as ranks and they need to provide the schema to do so.
I've sketched the idea below. Let us know what you think!
q2_taxa/_visualizer.py
Outdated
def barplot(output_dir: str, table: biom.Table, taxonomy: pd.Series, | ||
metadata: Metadata = None) -> None: | ||
def barplot(output_dir: str, table: biom.Table, taxonomy: pd.Series = None, | ||
metadata: Metadata = None, parse_ids: bool = False) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata: Metadata = None, parse_ids: bool = False) -> None: | |
metadata: Metadata = None, level_delimiter: str = None) -> None: |
q2_taxa/_visualizer.py
Outdated
_ids = table.ids('observation') | ||
taxonomy = pd.Series(_ids, index=_ids) | ||
if not parse_ids: | ||
collapse = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ids = table.ids('observation') | |
taxonomy = pd.Series(_ids, index=_ids) | |
if not parse_ids: | |
collapse = False | |
if level_delimiter is None: | |
collapse = False | |
else: | |
_ids = table.ids('observation') | |
ranks = [';'.join(r.split(level_delimiter)) for r in _ids] | |
taxonomy = pd.Series(ranks, index=_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @ebolyen sounds good, thanks for the suggestion @lizgehret !
The delimiter could be passed to the _collapse_table
function... then instead of just splitting and rejoining the ids, this also could be used to parse taxonomies with other delimiter (after all, I don't think that the semicolon delimiter is a requirement of the type, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just seeing this comment now - sorry. I commented on this below.
hey @ebolyen please see what you think of my latest commit. This adds the |
@nbokulich, I don't think the delimiter option should be applied to |
q2_taxa/_visualizer.py
Outdated
num_metadata_cols = metadata.column_count | ||
metadata = metadata.to_dataframe() | ||
jsonp_files, csv_files = [], [] | ||
collapsed_tables = _extract_to_level(taxonomy, table) | ||
if collapse: | ||
print(level_delimiter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove print statement.
q2_taxa/plugin_setup.py
Outdated
'and a level_delimiter is passed, it will attempt ' | ||
'to parse hierarchical taxonomic information from ' | ||
'the feature ID labels. If no level_delimiter is ' | ||
'provided and a taxonomy is passed, simicolon (;) ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simicolon
-> semicolon
@gregcaporaso @ebolyen thanks for the feedback. I have reverted that last commit and replaced the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me, thanks for enduring our multiple iterations of comments on this one @nbokulich!
Your deviation from @ebolyen's suggestion seems good to me, but I'll let him comment in case there was an edge case he was thinking about when making that suggestion.
I will note that this won't behave correctly in the pathological case of taxonomy with ;
as part of the taxonomy labels (not as delimiters), but that's something that wouldn't have been handled well before anyway so I don't think we should worry about that.
I have tested with several data sets locally:
- No taxonomy provided,
|
as delimiters in feature ids,level_delimiter='|'
- No taxonomy provided,
|
as delimiters in feature ids - Taxonomy provided,
;
as level delimiters in taxonomy (our typical usage) - Taxonomy provided,
;
as level delimiters in taxonomy,level_delimiter='|'
(should give the same output as 3)
This all worked as expected.
Just chatted with @ebolyen and he didn't have any particular motivation for split/join versus replace, so we're good to go. Thanks again @nbokulich! |
fixes #116 and #108
This PR makes
FeatureData[Taxonomy]
an optional input tobarplot
. In this case, feature labels are parsed from the feature IDs (either single-level, e.g., ASV IDs, or semicolon-delimited for multi-level)In practice, this means that various other use cases are possible, e.g.,:
I tested both of these use cases (and added appropriate unit tests). To fully validate, I ran the moving pictures tutorial and made barplots on the collapsed table. The output visualizations are identical.