More sophisticated feature metadata annotation #132

fedarko · 2019-05-20T19:38:02Z

This would preserve the original feature IDs. Instead of annotating them by appending each feature metadata field after a | separator (which makes a really long ID), this would store feature metadata somewhere accessible by RRVDisplay and Vega (probably as a property of each feature in the rank plot data). The metadata would then show up naturally split into separate fields in the rank plot tooltips, which would look nice.
e.g.

Feature ID: TAACTACATAGATACAG...
Classification: Numerator
Current Ranking: 5
Taxonomy: k__Bacteria; ...
Confidence: 0.85
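
A minimal sketch of this idea in Python, assuming feature rankings and feature metadata are available as plain dicts keyed by feature ID. The function name `build_rank_plot_records`, the short feature IDs, and the column values are all hypothetical and not taken from Qurro's code; the point is only that each record keeps its original ID and carries metadata as separate properties.

```python
def build_rank_plot_records(ranks, feature_metadata):
    """Return one record per feature, with metadata kept as separate fields.

    ranks: dict mapping feature ID -> dict of ranking columns
    feature_metadata: dict mapping feature ID -> dict of metadata columns
    (Both input shapes are assumptions made for this sketch.)
    """
    records = []
    for feature_id, rank_cols in ranks.items():
        # The feature ID is stored unchanged, with nothing appended to it.
        record = {"Feature ID": feature_id}
        record.update(rank_cols)
        # Features without a metadata row get no metadata fields here.
        record.update(feature_metadata.get(feature_id, {}))
        records.append(record)
    return records


# Hypothetical sample data
ranks = {"F1": {"Rank 0": 1.5}, "F2": {"Rank 0": -0.3}}
metadata = {"F1": {"Taxonomy": "k__Bacteria", "Confidence": 0.85}}
records = build_rank_plot_records(ranks, metadata)
```

Each record could then be serialized into the rank plot data, so a tooltip definition only needs to reference the field names.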

Also, the more important benefit of this: we could also store all the feature metadata column names (plus Feature ID?) in another easily-accessible place. Then, in the JS, we could use RRVDisplay.populateSelectDOM() to add feature metadata column names to a list of search options: this would essentially let us just perform exact matching, but limited to whatever feature metadata field we care about (so no worries about having a feature ID that coincidentally has the word "Bacteria" in it, or whatever). This would address #125.
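
To illustrate why restricting the search to one field avoids coincidental matches, here is a rough Python sketch (the real search lives in the JS and is not shown in this thread). `search_by_field` and the sample records are hypothetical, and "matching" is assumed here to mean a substring test on the selected field.

```python
def search_by_field(records, field, query):
    """Return the records whose value for `field` contains `query`.

    Only the selected field is inspected, so text in other fields
    (including the feature ID) cannot cause a match.
    """
    matches = []
    for record in records:
        value = record.get(field)
        if value is None:
            # No value for this field: nothing to search.
            continue
        if query in str(value):
            matches.append(record)
    return matches


# Hypothetical sample data: the first feature ID happens to contain "Bacteria".
records = [
    {"Feature ID": "Bacteria_like_id", "Taxonomy": "k__Archaea"},
    {"Feature ID": "F2", "Taxonomy": "k__Bacteria; p__Firmicutes"},
]
hits = search_by_field(records, "Taxonomy", "Bacteria")
```

Searching the "Taxonomy" field for "Bacteria" returns only the second record, even though the first record's ID contains the same word.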

The primary downsides are 1) extra storage costs due to having to store feature metadata field names for each feature, and 2) I think this might remove the support for searching by different taxa at once in the same query that's currently available. We could get around 2) by improving the search functionality to make that more explicit (or heck, make "OR" queries doable for every feature metadata field), and 1) isn't that big of a deal (we could use column mapping if it's a huge enough problem).
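
One possible reading of "column mapping" for downside 1), sketched in Python: store each long column name once and use a short key in every per-feature record. The helper `compress_columns` and the `c0`, `c1`, ... key scheme are assumptions made for illustration only.

```python
def compress_columns(records):
    """Replace column names with short keys; return (mapping, compact_records).

    mapping: dict from original column name -> short key, stored once.
    compact_records: the same rows, keyed by the short keys.
    """
    mapping = {}
    compact = []
    for record in records:
        row = {}
        for name, value in record.items():
            if name not in mapping:
                # Assign keys in order of first appearance: c0, c1, ...
                mapping[name] = "c" + str(len(mapping))
            row[mapping[name]] = value
        compact.append(row)
    return mapping, compact


# Hypothetical sample data
records = [
    {"Feature ID": "F1", "Taxonomy": "k__Bacteria"},
    {"Feature ID": "F2", "Taxonomy": "k__Archaea"},
]
mapping, compact = compress_columns(records)
```

The saving only matters when column names are long relative to the values, which matches the note above that this is probably not a big deal.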


This constitutes most of the python progress on #132 (and by extension also #125). Next up for #132:
-Add tooltip defs in the rank plot for all feature metadata cols. I guess we could do this in the python side of things, in the rank_chart definition.
-create a list of non-ranking feature data columns (all feature data fields, minus ranking columns, minus Feature ID and Classification) to populate two <select>s with all avail. feature metadata columns.
-Make the JS searching functionality look by selected feature metadata field (one select for num search, one select for den search).
-Ignore features that have "null" for a given col (i.e. no row in the feature md file). This might cause some confusion if any metadata actually has "null" as a given string, but I *think* this should be ok. (Should add test cases; tagging #2 and #62.)
-Support multiple searches (e.g. with multiple taxa).
-MAYBE support searching by taxonomic rank as current?
  -IDK.
-Also maybe more specialized searches by field type (e.g. if the field is numeric, limit to ranges).
  -Probably goes beyond the scope of #132 tho.
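
A rough Python sketch of two of the steps listed above: deriving the list of searchable feature metadata columns, and skipping features with no value for a column. The helper names and sample columns are hypothetical. The second half assumes missing values are serialized as JSON null, which Python's `json` module loads as `None`, so a literal "null" string stays distinguishable.

```python
import json


def get_feature_metadata_columns(all_columns, ranking_columns):
    """All feature data fields, minus ranking columns, Feature ID, and Classification."""
    excluded = set(ranking_columns) | {"Feature ID", "Classification"}
    return [col for col in all_columns if col not in excluded]


def features_with_value(records, field):
    """Drop features that have no value (None) for the given field."""
    return [r for r in records if r.get(field) is not None]


# Hypothetical column names
all_columns = ["Feature ID", "Classification", "Rank 0", "Taxonomy", "Confidence"]
metadata_columns = get_feature_metadata_columns(all_columns, ["Rank 0"])

# A JSON null and the string "null" load as different Python values.
records = json.loads(
    '[{"Feature ID": "F1", "Taxonomy": null},'
    ' {"Feature ID": "F2", "Taxonomy": "null"}]'
)
searchable = features_with_value(records, "Taxonomy")
```

Under that assumption, only the feature with a true null is skipped; the one whose taxonomy is the string "null" remains searchable.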

fedarko · 2019-05-22T01:39:10Z

One TODO: use column name and/or type to do cool stuff in searching. e.g. if it's "Taxonomy" or "Taxon" then split by semicolons?
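
A small Python sketch of that TODO, assuming taxonomy strings are semicolon-separated. The helper names are hypothetical, and matching the column name case-insensitively is an added assumption.

```python
def is_taxonomy_field(column_name):
    """Treat columns named "Taxonomy" or "Taxon" (any case) as taxonomy fields."""
    return column_name.strip().lower() in {"taxonomy", "taxon"}


def split_taxonomy(taxonomy):
    """Split a taxonomy string on semicolons, dropping empty pieces."""
    return [rank.strip() for rank in taxonomy.split(";") if rank.strip()]


def has_rank(taxonomy, rank):
    """True if `rank` equals one whole semicolon-separated entry."""
    return rank in split_taxonomy(taxonomy)


ranks = split_taxonomy("k__Bacteria; p__Firmicutes;")
```

Comparing whole entries means a query such as "p__Firmicutes" matches, while a partial string such as "Firmicutes" does not; whether that strictness is desirable is a design choice left open here.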

Also stores feature metadata cols in the rank plot JSON -- will be super easy to retrieve and use these in the viz interface #132

All that's needed to reestablish most of the prior search functionality (and close out #125) will be making filterFeatures() accept the *entire* dataset (or perhaps just a subset based on the search type) and look through that.

Notes:
1) Ignores numeric feature metadata values. In the future, we should support detecting these types? or at least convert to string before doing text searching.
2) No "rank search" equivalent yet, but users can still include semicolons in the search. Might reimplement this if featureMetadataField is "Taxon" or "Taxonomy" or something, but not urgent IMO.
3) would be nice to add multiple queries for searching -- e.g. Confidence > 0.95 AND Taxon contains text "p__Firmicutes".
4) I'd like to include at least the taxonomy in the feature lists in the textareas -- should be doable to "redo" the annotation process in JS
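
The multi-query example from the notes (Confidence > 0.95 AND Taxon contains "p__Firmicutes") could be sketched as composable predicates. This is only an illustration in Python, not the JS implementation; `text_contains`, `greater_than`, and `matches_all` are hypothetical names. Values are converted to strings before text searching, and non-numeric values are treated as non-matching for range checks.

```python
def text_contains(field, text):
    """Predicate: the field's value, converted to a string, contains `text`."""
    def predicate(record):
        value = record.get(field)
        return value is not None and text in str(value)
    return predicate


def greater_than(field, threshold):
    """Predicate: the field's value is numeric and greater than `threshold`."""
    def predicate(record):
        value = record.get(field)
        try:
            return value is not None and float(value) > threshold
        except (TypeError, ValueError):
            # Non-numeric values never satisfy a range check.
            return False
    return predicate


def matches_all(record, predicates):
    """AND semantics: every predicate must hold for the record."""
    return all(predicate(record) for predicate in predicates)


# Hypothetical sample data
records = [
    {"Feature ID": "F1", "Taxon": "k__Bacteria; p__Firmicutes", "Confidence": 0.99},
    {"Feature ID": "F2", "Taxon": "k__Bacteria; p__Firmicutes", "Confidence": 0.80},
    {"Feature ID": "F3", "Taxon": "k__Archaea", "Confidence": "n/a"},
]
query = [greater_than("Confidence", 0.95), text_contains("Taxon", "p__Firmicutes")]
hits = [r for r in records if matches_all(r, query)]
```

Swapping `all` for `any` in `matches_all` would give the "OR" behavior mentioned earlier in the issue.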

Also made the search functionality now output the entire data object (i.e. the entire "row" for a given feature) instead of just the feature ID. This will let us eventually customize how selected features appear in the textareas.

Shouldn't be *too* difficult to modify filterFeatures() to detect the search type and then apply that, same as before. As discussed in #132 and the various commit messages that reference it, I'd like to eventually support "joint" queries where you can filter on multiple criteria (e.g. "contains this text ... and contains these taxonomic ranks ... and has a confidence greater than ..."), but that sounds like a ton of work. For now, just getting back to the previous functionality in a bug-free state (i.e. with issue #125 knocked out) will be good enough.

...So future issues to make after #132 and #125 are:
-support searching by ranges on numerical feature metadata fields
-support joint queries across multiple feature metadata fields

fedarko added the enhancement New feature or request label May 20, 2019

fedarko self-assigned this May 20, 2019

This was referenced May 21, 2019

Make note of places where certain column/etc. names can't be used #55

Closed

double-check data/dataframe types in code #62

Open

fedarko added a commit that referenced this issue May 22, 2019

ENH: Use feat. metadata fields as r.plot tooltips

3975430

Also stores feature metadata cols in the rank plot JSON -- will be super easy to retrieve and use these in the viz interface #132

fedarko added the important Things that are critical for getting Qurro in a working/useful state label May 23, 2019

This was referenced May 26, 2019

"Joint" searching queries across multiple feature ranking/metadata fields #140

Open

Separated feature metadata #142

Merged

fedarko closed this as completed in #142 May 26, 2019
