barplot: make taxonomy optional #153

nbokulich · 2023-04-18T11:13:23Z

This PR makes FeatureData[Taxonomy] an optional input to barplot. When no taxonomy is provided, feature labels are parsed from the feature IDs (either single-level, e.g., ASV IDs, or semicolon-delimited for multi-level labels).

In practice, this makes several other use cases possible, e.g.:

- Input a collapsed table. Taxonomy is parsed from the feature IDs.
- Input an ASV/OTU table to look at ASV frequencies in a barplot.

I tested both of these use cases (and added appropriate unit tests). To fully validate, I ran the Moving Pictures tutorial and made barplots on the collapsed table. The output visualizations are identical.
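A minimal sketch of the fallback described above, assuming labels are taken verbatim from the feature IDs (as in the `pd.Series(_ids, index=_ids)` line of the diff later in this thread) and split on `;` when collapsing to a level. The helpers `labels_from_ids` and `label_at_level` are hypothetical, the IDs are made up, and a plain `dict` stands in for the `pd.Series`; this is not the q2-taxa implementation.

```python
def labels_from_ids(feature_ids):
    """Hypothetical: use each feature ID as its own label (ID -> label)."""
    return {fid: fid for fid in feature_ids}


def label_at_level(label, level):
    """Hypothetical: keep only the first `level` semicolon-delimited ranks.

    Single-level IDs (e.g., ASV IDs) contain no ';', so they come back
    unchanged at every level.
    """
    return ';'.join(label.split(';')[:level])


# Made-up IDs: two from a collapsed table, one single-level ASV ID.
collapsed_ids = ['k__Bacteria;p__Firmicutes;c__Bacilli',
                 'k__Bacteria;p__Bacteroidetes;c__Bacteroidia']
asv_ids = ['asv_0001']

labels = labels_from_ids(collapsed_ids + asv_ids)
for fid, label in labels.items():
    # Collapse every label to level 2 (kingdom;phylum).
    print(fid, '->', label_at_level(label, 2))
```

Because a single-level ID has only one rank, it simply stays the same at every level, which is what lets an ASV/OTU table be plotted without a taxonomy.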

ebolyen · 2023-04-21T19:29:32Z

cc @gregcaporaso and @colinvwood who've discussed this recently in the context of Kraken

nbokulich · 2023-05-05T09:04:35Z

hey @ebolyen @gregcaporaso @colinvwood

I have added a parameter parse_ids, which is False by default. @ebolyen I think this is what you requested when we discussed yesterday?

Is this acceptable now? With this change, hierarchical labels are never required; the IDs are only parsed into levels if the user sets parse_ids=True.

ebolyen

Hey @nbokulich,

We were discussing this and @lizgehret came up with an idea that we all like on our end.

What if instead of parse_ids, there was something like level_delimiter=None (which would be equivalent to parse_ids=False).

The intention would be that a user could provide this delimiter if they knew the IDs had hierarchical ranks. In most cases they would pass ;, but the reason we like this is it no longer imbues the ID with an implicit schema (the taxonomy format). Instead it's up to the user to decide to interpret their IDs as ranks and they need to provide the schema to do so.

I've sketched the idea below. Let us know what you think!

ebolyen · 2023-05-10T16:41:34Z

q2_taxa/_visualizer.py

-def barplot(output_dir: str, table: biom.Table, taxonomy: pd.Series,
-            metadata: Metadata = None) -> None:
+def barplot(output_dir: str, table: biom.Table, taxonomy: pd.Series = None,
+            metadata: Metadata = None, parse_ids: bool = False) -> None:


Suggested change

-            metadata: Metadata = None, parse_ids: bool = False) -> None:
+            metadata: Metadata = None, level_delimiter: str = None) -> None:

ebolyen · 2023-05-10T16:48:54Z

q2_taxa/_visualizer.py

+        _ids = table.ids('observation')
+        taxonomy = pd.Series(_ids, index=_ids)
+        if not parse_ids:
+            collapse = False


Suggested change

-        _ids = table.ids('observation')
-        taxonomy = pd.Series(_ids, index=_ids)
-        if not parse_ids:
-            collapse = False
+        if level_delimiter is None:
+            collapse = False
+        else:
+            _ids = table.ids('observation')
+            ranks = [';'.join(r.split(level_delimiter)) for r in _ids]
+            taxonomy = pd.Series(ranks, index=_ids)

hey @ebolyen, sounds good, thanks for the suggestion @lizgehret!

The delimiter could be passed to the _collapse_table function. Then, instead of just splitting and rejoining the IDs, it could also be used to parse taxonomies with other delimiters (after all, I don't think the semicolon delimiter is a requirement of the type, right?)

Just seeing this comment now - sorry. I commented on this below.

nbokulich · 2023-05-11T09:42:17Z

hey @ebolyen please see what you think of my latest commit. This adds the level_delimiter parameter, but also allows users to use this param to parse taxonomies that have non-semicolon level delimiters. This seems like a not-so-rare edge case, as some taxonomies we have seen are, e.g., pipe-delimited or comma-delimited. See my note about how the default (None) is handled.

gregcaporaso · 2023-05-16T18:01:56Z

@nbokulich, I don't think the delimiter option should be applied to FeatureData[Taxonomy] artifacts. This effectively expands the definition of the underlying format, which has always been assumed to be semicolon-delimited taxonomic levels (though I don't know if that is explicitly defined anywhere), to allow arbitrary delimiters. If we want to support this, creating a new format would be a better option, in which case the delimiter could even be defined as part of the format so the user doesn't need to know what it is. In any case, this feels outside the scope of this PR, and it's something we should discuss (whether and how to support it) separately.

gregcaporaso · 2023-05-16T18:02:59Z

q2_taxa/_visualizer.py

    num_metadata_cols = metadata.column_count
    metadata = metadata.to_dataframe()
    jsonp_files, csv_files = [], []
-    collapsed_tables = _extract_to_level(taxonomy, table)
+    if collapse:
+        print(level_delimiter)


Remove print statement.

gregcaporaso · 2023-05-16T18:03:20Z

q2_taxa/plugin_setup.py

+                           'and a level_delimiter is passed, it will attempt '
+                           'to parse hierarchical taxonomic information from '
+                           'the feature ID labels. If no level_delimiter is '
+                           'provided and a taxonomy is passed, simicolon (;) '


simicolon -> semicolon


nbokulich · 2023-05-16T18:50:32Z

@gregcaporaso @ebolyen thanks for the feedback. I have reverted that last commit and re-implemented the level_delimiter param more or less as @ebolyen suggested above. I opted for the simpler str.replace(level_delimiter, ';') instead of the ';'.join(str.split(level_delimiter)) that @ebolyen suggested. Sound okay?
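The two normalizations give the same result for any non-empty delimiter: `str.split` breaks the string at every occurrence of the delimiter, and `';'.join` puts a `;` back at exactly those positions. A quick check, with illustrative function names and made-up IDs:

```python
def via_split_join(feature_id, level_delimiter):
    # Normalization from the review suggestion: split, then rejoin on ';'.
    return ';'.join(feature_id.split(level_delimiter))


def via_replace(feature_id, level_delimiter):
    # Normalization used in the final commit: replace the delimiter with ';'.
    return feature_id.replace(level_delimiter, ';')


cases = [
    ('k__Bacteria|p__Firmicutes|c__Bacilli', '|'),
    ('k__Bacteria||c__Bacilli', '|'),        # empty rank between delimiters
    ('k__Bacteria | p__Firmicutes', ' | '),  # multi-character delimiter
    ('asv_0001', '|'),                       # delimiter not present at all
]
for feature_id, delim in cases:
    assert via_split_join(feature_id, delim) == via_replace(feature_id, delim)
```

The one input where they diverge is an empty delimiter: `'ab'.split('')` raises `ValueError`, whereas `'ab'.replace('', ';')` returns `';a;b;'`.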

gregcaporaso

This all looks good to me, thanks for enduring our multiple iterations of comments on this one @nbokulich!

Your deviation from @ebolyen's suggestion seems good to me, but I'll let him comment in case there was an edge case he was thinking about when making that suggestion.

I will note that this won't behave correctly in the pathological case of taxonomy with ; as part of the taxonomy labels (not as delimiters), but that's something that wouldn't have been handled well before anyway so I don't think we should worry about that.
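To make the pathological case concrete: if a label itself contains `;`, any `;`-based parsing counts an extra rank. The ID below is made up, and `parse_ranks` is a hypothetical helper using the `str.replace` normalization described above.

```python
def parse_ranks(feature_id, level_delimiter):
    # Normalize the user-supplied delimiter to ';', then split into ranks.
    return feature_id.replace(level_delimiter, ';').split(';')


# Three intended levels, but the last label happens to contain ';'.
feature_id = 'k__Bacteria|p__Firmicutes|s__strain;variant'
ranks = parse_ranks(feature_id, '|')
print(len(ranks))  # 4: 's__strain' and 'variant' end up as separate ranks
```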

I have tested with several data sets locally:

1. No taxonomy provided, | as delimiters in feature IDs, level_delimiter='|'
2. No taxonomy provided, | as delimiters in feature IDs
3. Taxonomy provided, ; as level delimiters in taxonomy (our typical usage)
4. Taxonomy provided, ; as level delimiters in taxonomy, level_delimiter='|' (should give the same output as 3)

This all worked as expected.
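The expected behavior across these scenarios can be summarized as a small label-resolution function. This is a sketch under assumptions, not the merged code: `resolve_labels` is hypothetical, plain dicts stand in for `pd.Series`, and it assumes IDs are used verbatim as single-level labels when neither a taxonomy nor a `level_delimiter` is given (as with `parse_ids=False` earlier in the thread).

```python
def resolve_labels(feature_ids, taxonomy=None, level_delimiter=None):
    """Hypothetical sketch: map each feature ID to a ';'-delimited label."""
    if taxonomy is not None:
        # A provided taxonomy is used as-is; level_delimiter is ignored, so
        # the third and fourth scenarios produce identical labels.
        return {fid: taxonomy[fid] for fid in feature_ids}
    if level_delimiter is None:
        # No taxonomy and no delimiter: IDs are single-level labels, even
        # if they happen to contain '|' (second scenario).
        return {fid: fid for fid in feature_ids}
    # No taxonomy, delimiter given: parse ranks from the IDs (first scenario).
    return {fid: fid.replace(level_delimiter, ';') for fid in feature_ids}


ids = ['k__Bacteria|p__Firmicutes', 'k__Bacteria|p__Bacteroidetes']
print(resolve_labels(ids, level_delimiter='|'))
print(resolve_labels(ids))
```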

gregcaporaso · 2023-05-17T15:57:19Z

Just chatted with @ebolyen and he didn't have any particular motivation for split/join versus replace, so we're good to go. Thanks again @nbokulich!

barplot: make taxonomy optional

275b95a

nbokulich mentioned this pull request Apr 18, 2023

add support for FeatureTable and FeatureData generation from classify-kraken results bokulich-lab/q2-annotate#29

Closed

ebolyen assigned gregcaporaso and colinvwood Apr 21, 2023

disable feature ID parsing by default

e1c9ced

lizgehret requested a review from ebolyen May 8, 2023 17:15

lizgehret assigned ebolyen May 8, 2023

ebolyen reviewed May 10, 2023


add level_delimiter param to barplot

dcc9978

gregcaporaso reviewed May 16, 2023


gregcaporaso mentioned this pull request May 16, 2023

da-barplot makes assumptions about feature id schema qiime2/q2-composition#121

Closed

nbokulich added 2 commits May 16, 2023 20:32

Revert "add level_delimiter param to barplot"

bbb60b4

This reverts commit dcc9978. revert level_delimiter commit dcc9978

FIX new level delimiter param

03a6933

gregcaporaso approved these changes May 17, 2023


gregcaporaso merged commit f2264ec into qiime2:master May 17, 2023

ebolyen mentioned this pull request Jul 11, 2024

identifiers are being parsed as feature labels in various visualizations qiime2/q2-types#337

Open
