More sophisticated feature metadata annotation #132

Closed
fedarko opened this issue May 20, 2019 · 1 comment · Fixed by #142

@fedarko
Collaborator

fedarko commented May 20, 2019

This would preserve the original feature IDs. Instead of annotating them by appending each feature metadata field, separated by a |, to make a really long ID, this would store feature metadata somewhere accessible by RRVDisplay and Vega (probably as a property of each feature in the rank plot data). This data would then show up naturally split up in the rank plot tooltips, which would look nice.
e.g.

Feature ID: TAACTACATAGATACAG...
Classification: Numerator
Current Ranking: 5
Taxonomy: k__Bacteria; ...
Confidence: 0.85
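
A minimal Python sketch of the idea above, assuming feature ranks and feature metadata are both available as plain dicts keyed by feature ID. The function name `build_rank_records`, the `"Rank 0"` column, and the sample feature IDs are hypothetical and not taken from Qurro's code. Features with no row in the metadata get `None` for every metadata column.

```python
def build_rank_records(ranks, feature_metadata, metadata_cols):
    """Return one record per feature, leaving the original feature ID untouched.

    Each metadata column becomes its own property of the record, so a
    tooltip can show the fields separately instead of one long ID.
    """
    records = []
    for feature_id, rank_values in ranks.items():
        record = {"Feature ID": feature_id}
        record.update(rank_values)
        md_row = feature_metadata.get(feature_id, {})
        for col in metadata_cols:
            # Missing metadata is stored as None rather than dropped.
            record[col] = md_row.get(col)
        records.append(record)
    return records


# Hypothetical sample data, for illustration only.
ranks = {"feature_A": {"Rank 0": 1.5}, "feature_B": {"Rank 0": -0.3}}
feature_metadata = {
    "feature_A": {"Taxonomy": "k__Bacteria; p__Firmicutes", "Confidence": 0.85}
}
records = build_rank_records(ranks, feature_metadata, ["Taxonomy", "Confidence"])
```

Keeping each field as a separate property is what allows both the split-up tooltips and the per-field searching discussed in this issue.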

The more important benefit is that we could also store all the feature metadata column names (plus Feature ID?) in another easily accessible place. Then, in the JS, we could use RRVDisplay.populateSelectDOM() to add feature metadata column names to a list of search options. This would essentially let us perform exact matching limited to whatever feature metadata field we care about (so no worries about a feature ID that coincidentally has the word "Bacteria" in it, or whatever). This would address #125.
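
To illustrate why limiting the search to one field avoids the coincidental-match problem, here is a rough Python sketch (the real search would live in the JS). It assumes each feature is a dict of field name to value and that matching means plain substring matching. The name `filter_by_field` and the sample rows are hypothetical.

```python
def filter_by_field(rows, field, query):
    """Return the rows whose value for `field` contains `query` as text.

    Only the chosen field is inspected, so text that happens to appear
    in another field (such as the feature ID) cannot cause a match.
    """
    matches = []
    for row in rows:
        value = row.get(field)
        if isinstance(value, str) and query in value:
            matches.append(row)
    return matches


# Hypothetical rows: the second feature ID coincidentally contains "Bacteria".
rows = [
    {"Feature ID": "feature_A", "Taxonomy": "k__Bacteria; p__Firmicutes"},
    {"Feature ID": "Bacteria_lookalike", "Taxonomy": "k__Archaea"},
]
taxonomy_hits = filter_by_field(rows, "Taxonomy", "Bacteria")
id_hits = filter_by_field(rows, "Feature ID", "Bacteria")
```

Here `taxonomy_hits` contains only `feature_A`, while `id_hits` contains only `Bacteria_lookalike`, so the two kinds of match no longer interfere with each other.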

The primary downsides are 1) extra storage costs due to having to store feature metadata field names for each feature, and 2) I think this might remove the support for searching by different taxa at once in the same query that's currently available. We could get around 2) by improving the search functionality to make that more explicit (or heck, make "OR" queries doable for every feature metadata field), and 1) isn't that big of a deal (we could use column mapping if it's a huge enough problem).

@fedarko fedarko added the enhancement New feature or request label May 20, 2019
@fedarko fedarko self-assigned this May 20, 2019
fedarko added a commit that referenced this issue May 22, 2019
This constitutes most of the python progress on #132 (and by extension
also #125).

Next up for #132:

-Add tooltip defs in the rank plot for all feature metadata cols.
 I guess we could do this in the python side of things, in the
 rank_chart definition.

-create a list of non-ranking feature data columns (all feature data
 fields, minus ranking columns, minus Feature ID and Classification)
 to populate two <select>s with all avail. feature metadata columns.

-Make the JS searching functionality look by selected feature metadata
 field (one select for num search, one select for den search).
        -Ignore features that have "null" for a given col (i.e. no row
         in the feature md file). This might cause some confusion if any
         metadata actually has "null" as a given string, but I *think*
         this should be ok. (Should add test cases; tagging #2 and #62.)
        -Support multiple searches (e.g. with multiple taxa).
                -MAYBE support searching by taxonomic rank as current?
                -IDK.
                -Also maybe more specialized searches by field type
                 (e.g. if the field is numeric, limit to ranges).
                -Probably goes beyond the scope of #132 tho.
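
The column list and the null handling described in this commit message could look roughly like the following Python sketch. The function names and the `"Rank 0"` column are hypothetical, and the sketch assumes string-valued metadata. Missing values are represented as `None` here, which sidesteps the ambiguity with metadata that literally contains the string "null".

```python
def get_metadata_columns(all_fields, ranking_cols):
    """All feature data fields, minus ranking columns, Feature ID and Classification."""
    excluded = set(ranking_cols) | {"Feature ID", "Classification"}
    return [field for field in all_fields if field not in excluded]


def search_any(rows, field, queries):
    """Return rows whose `field` contains at least one of `queries`.

    Rows with no value for `field` (None) are skipped entirely.
    """
    matches = []
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        if any(query in value for query in queries):
            matches.append(row)
    return matches


# Hypothetical sample data, for illustration only.
all_fields = ["Feature ID", "Classification", "Rank 0", "Taxonomy", "Confidence"]
select_options = get_metadata_columns(all_fields, ["Rank 0"])

rows = [
    {"Feature ID": "feature_A", "Taxonomy": "k__Bacteria; p__Firmicutes"},
    {"Feature ID": "feature_B", "Taxonomy": "k__Bacteria; p__Bacteroidetes"},
    {"Feature ID": "feature_C", "Taxonomy": None},
]
hits = search_any(rows, "Taxonomy", ["p__Firmicutes", "p__Proteobacteria"])
```

The resulting `select_options` list is what would populate the two `<select>` elements, one for the numerator search and one for the denominator search.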
@fedarko
Collaborator Author

fedarko commented May 22, 2019

One TODO: use column name and/or type to do cool stuff in searching. e.g. if it's "Taxonomy" or "Taxon" then split by semicolons?
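
A possible shape for this TODO, sketched in Python under the assumption that taxonomy-like columns are recognized by name alone. Matching the name case-insensitively is my own assumption, and `split_field_value` is a hypothetical helper.

```python
# Column names treated as taxonomy-like (compared case-insensitively).
TAXONOMY_COLUMN_NAMES = {"taxonomy", "taxon"}


def split_field_value(column_name, value):
    """Split a taxonomy string into ranks; leave other columns as one item."""
    if column_name.strip().lower() in TAXONOMY_COLUMN_NAMES:
        # Drop empty pieces caused by trailing or doubled semicolons.
        return [part.strip() for part in value.split(";") if part.strip()]
    return [value]


taxonomy_ranks = split_field_value("Taxonomy", "k__Bacteria; p__Firmicutes;")
other_value = split_field_value("Confidence", "0.85")
```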

fedarko added a commit that referenced this issue May 22, 2019
Also stores feature metadata cols in the rank plot JSON -- will be super
easy to retrieve and use these in the viz interface #132
fedarko added a commit that referenced this issue May 23, 2019
All that's needed to reestablish most of the prior search
functionality (and close out #125) will be making filterFeatures()
accept the *entire* dataset (or perhaps just a subset based on
the search type) and look through that.
fedarko added a commit that referenced this issue May 23, 2019
Notes:

1) Ignores numeric feature metadata values. In the future, we should
support detecting these types, or at least convert them to strings
before doing text searching.
2) No "rank search" equivalent yet, but users can still include
semicolons in the search. Might reimplement this if
featureMetadataField is "Taxon" or "Taxonomy" or something, but
not urgent IMO.
3) Would be nice to add multiple queries for searching -- e.g.
Confidence > 0.95 AND Taxon contains text "p__Firmicutes".
4) I'd like to include at least the taxonomy in the feature lists in
the textareas -- should be doable to "redo" the annotation process in
JS
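
For note 1, the "convert to string before doing text searching" option could be as small as the Python sketch below. `text_matches` is a hypothetical helper, and treating a missing value as `None` that never matches is an assumption.

```python
def text_matches(value, query):
    """Text-search a single metadata value, converting numbers to strings first."""
    if value is None:
        # A missing value never matches, even for the query "None" or "null".
        return False
    return query in str(value)


numeric_hit = text_matches(0.85, "0.8")
missing_hit = text_matches(None, "null")
```

This keeps numeric fields searchable as text, although range queries on those fields would still need real type detection.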
fedarko added a commit that referenced this issue May 23, 2019
Also made the search functionality now output the entire data object
(i.e. the entire "row" for a given feature) instead of just the
feature ID. This will let us eventually customize how selected
features appear in the textareas.
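
Because the search now returns whole rows, customizing the textarea output becomes a formatting step. A rough Python sketch follows (the real version would be in the JS). The function name, the `" | "` separator, and the default choice of the `Taxonomy` field are all assumptions.

```python
def format_selected_features(rows, extra_fields=("Taxonomy",)):
    """Build one textarea line per feature: the ID plus any available extra fields."""
    lines = []
    for row in rows:
        parts = [row["Feature ID"]]
        for field in extra_fields:
            value = row.get(field)
            if value is not None:
                parts.append(str(value))
        lines.append(" | ".join(parts))
    return "\n".join(lines)


# Hypothetical rows, for illustration only.
rows = [
    {"Feature ID": "feature_A", "Taxonomy": "k__Bacteria"},
    {"Feature ID": "feature_B", "Taxonomy": None},
]
textarea_text = format_selected_features(rows)
```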
@fedarko fedarko added the important Things that are critical for getting Qurro in a working/useful state label May 23, 2019
fedarko added a commit that referenced this issue May 24, 2019
Shouldn't be *too* difficult to modify filterFeatures() to detect
the search type and then apply that, same as before.

As discussed in #132 and the various commit messages that reference
it, I'd like to eventually support "joint" queries where you can
filter on multiple criteria (e.g. "contains this text ... and
contains these taxonomic ranks ... and has a confidence greater than
..."), but that sounds like a ton of work. For now, just getting
back to the previous functionality in a bug-free state (i.e. with
issue #125 knocked out) will be good enough.

...So future issues to make after #132 and #125 are:
    -support searching by ranges on numerical feature metadata fields
    -support joint queries across multiple feature metadata fields
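
One way the two follow-up features could fit together is to express each criterion as a small predicate and require all of them to hold. This is only a Python sketch under that assumption. The names `contains`, `greater_than`, and `joint_filter` are hypothetical, and the sample rows are made up.

```python
def contains(field, text):
    """Predicate: the field's value, as text, contains `text`."""
    def predicate(row):
        value = row.get(field)
        return value is not None and text in str(value)
    return predicate


def greater_than(field, threshold):
    """Predicate: the field's value is numeric and above `threshold`."""
    def predicate(row):
        value = row.get(field)
        return isinstance(value, (int, float)) and value > threshold
    return predicate


def joint_filter(rows, predicates):
    """Return the rows that satisfy every predicate (an AND query)."""
    return [row for row in rows if all(p(row) for p in predicates)]


# Hypothetical rows, for illustration only.
rows = [
    {"Feature ID": "feature_A", "Taxon": "k__Bacteria; p__Firmicutes", "Confidence": 0.99},
    {"Feature ID": "feature_B", "Taxon": "k__Bacteria; p__Firmicutes", "Confidence": 0.50},
    {"Feature ID": "feature_C", "Taxon": "k__Bacteria; p__Bacteroidetes", "Confidence": 0.99},
]
hits = joint_filter(
    rows, [greater_than("Confidence", 0.95), contains("Taxon", "p__Firmicutes")]
)
```

This mirrors the earlier example of "Confidence > 0.95 AND Taxon contains p__Firmicutes". Swapping `all` for `any` in `joint_filter` would give the "OR" variant instead.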