Accept metabolite (GNPS) feature metadata, and validate this in tests #49
There's a lot to describe here. Let me try to go through it.

rankratioviz/generate.py
-Added more documentation to matchdf().
-Added various assertions that should prevent samples and features from being dropped (#54). I think that our use of matchdf(), in addition to these assertions, should mean that #54 is done -- but I'd like to add some tests that verify that.
-To support these assertions, sample metadata is now a parameter of process_input().
-Completely rewrote the feature metadata label creation code. It should generalize now to any sort of feature metadata, at the cost of not being as pretty as the old taxonomy label code. I might add that back in the future, and/or add some new label creation code to handle metabolite data more carefully (e.g. assuming that the feature description includes the feature ID) (#49).
-The reason I rewrote this was to prevent features from being dropped just because they lacked feature metadata (this was the case until now) (#54). This is testable on the Red Sea dataset songbird uses, for which 172 (iirc) features have no associated metadata (those features are just assigned their ID by itself now, instead of being completely dropped).
-Didn't call .T on the table twice at the start of process_input() (I don't know why this was originally done, but it doesn't do anything, and not doing it saves a tiny bit of time).
-I think the changes to the feature metadata label creation code may also have made the new table column assignments more accurate -- it seems like the old code assumed that the table columns had the same order as the feature metadata "Taxon_" values -- but I'm not 100% sure of that. In any case, it's correct now. (We'll eventually double-check the correctness of all the count data in the sample JSON data when #2 rolls around, so this will be doubly ensured to be correct.)
-Added a small sanity test to process_input() to ensure that adding feature metadata didn't make some feature IDs the same (should never happen but you never know).
-To fix a bug with the Red Sea data, made the default rank column explicitly a numeric dtype (see #62 for a retrospective and details). More work on this sorta front remains to be done.
-Some small aesthetic tweaks to the interface: "Ranks" -> "Feature Ranks" for the rank plot title, and adding back "x" to the feature tooltips.

rankratioviz/q2/_method.py
-Added logic to accept either a DataFrame or OrdinationResults as input, which promptly didn't work (I got the error discussed here: https://forum.qiime2.org/t/plugin-error-issubclass-arg-1-must-be-a-class/2774). Changing the ranks type from a Union[DF, OrdResults] to just one of the two works, so the problem is with the Union. I'll try to get it fixed soon.
-Added logic to process these files and generate a feature_ranks dataframe. Nothing too fancy.
-Adjusted code a tiny bit to throw in df_sample_metadata to process_input(), as discussed above.

rankratioviz/q2/plugin_setup.py
-Added a substantial amount of code to check if songbird is available and, if so, accept the FeatureData[Differential] input. I'd like for this to not be needed, but apparently it is (see https://github.com/mortonjt/songbird/pull/34). If we have to do this, I can add songbird as an "extras" dependency in setup.py. So that's another thing to do for #51.
-Also added some logic to update the --i-ranks description depending on whether or not songbird input is acceptable.
-This logic isn't super testable, since my conda/Q2 installation has a habit of keeping references to uninstalled plugins (even when I refresh the cache) -- so testing this is kind of frustrating. I'll think about it as I move past getting the basic functionality of #51 out of the way.

rankratioviz/scripts/_plot.py
-Some changes to work with the _rank_processing module.
-Renamed "taxonomy" to "feature_metadata" (makes the code a lot clearer).
-Adjustments to pass in sample metadata to generate.process_input().
-I ended up just treating the first column of the feature metadata file as the index by default, instead of searching for the 'feature id' column and using that -- this should be functionally equivalent for the old test data, and far more robust. Also, it follows QIIME 2's metadata guidelines.

OK! I uh think that is everything for now. Sorry this message ended up being so long. Probably should've done more atomic commits, but everything sort of ended up relying on everything else and it got to the point where any commit of individual files would've just broken the tests. GRANTED: I think I already did that tonight, but whatever, you get the point.
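For anyone following along, here is a minimal sketch of the label creation behavior described in the generate.py notes above: features with metadata get their fields appended to their ID, features without metadata keep their bare ID rather than being dropped, and a sanity check confirms annotation didn't produce duplicate IDs. This is not the actual rankratioviz code. The function name annotate_feature_ids, the " | " separator, and the dict-based metadata layout are all assumptions made for illustration.

```python
def annotate_feature_ids(feature_ids, feature_metadata, sep=" | "):
    """Build a label for each feature ID from its metadata fields.

    feature_metadata is assumed (hypothetically) to be a dict mapping
    feature ID -> list of metadata values for that feature.
    """
    labels = {}
    for fid in feature_ids:
        fields = feature_metadata.get(fid, [])
        # Features with no metadata keep their bare ID instead of being dropped.
        labels[fid] = sep.join([fid] + [str(value) for value in fields])
    # Sanity check: annotation should never make two feature IDs identical.
    if len(set(labels.values())) != len(labels):
        raise ValueError("Annotating feature IDs produced duplicate labels.")
    return labels


# Made-up example data: F2 has no metadata at all.
ranked_features = ["F1", "F2", "F3"]
metadata = {"F1": ["Bacteria", "0.98"], "F3": ["Archaea", "0.75"]}
labels = annotate_feature_ids(ranked_features, metadata)
```

Looking up labels["F2"] gives back just "F2", which mirrors what should happen to the Red Sea features that have no associated metadata.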
Currently, rankratioviz "annotates" feature IDs by adding on all feature metadata fields to those IDs where feature metadata has been given, separated by Some feature metadata annotation is sorta being done in the main matching integration test. See about moving that into the main (This is also a good time to beef up the sample plot JSON validation.) Once that's done, we will be able to close this.
Need to validate that feature metadata annotation works properly, then #49 will be taken care of.
Progress towards #49. Might make this standalone-rrv-exclusive for now?
GNPS support is tentatively ready. Once more tests are in that verify that this works, I'm content to close this issue. (May be worth adding another issue for Q2 GNPS feature metadata support, but that's another story.)
Obviously GNPS feature metadata support is still mostly untested/experimental, but this is a decent start.
You can still use GNPS data with Qurro! Now you'll just have to supply "normal" data. This removes all of that experimental code, which is nice.
At least with the Red Sea metabolite data used in the QIIME 2 tutorial on Songbird, this seems to just be a TSV file where each line contains a metabolite (feature) ID, a "cluster," and a description. I don't know if the "cluster" is important (maybe?) but, in any case, making this work should be as simple as adjusting the code in rankratioviz.generate.process_input() that reads taxam (feature metadata) to create labels for each taxon/metabolite (and using the correct column names for the metabolites). It might be easiest to just have the user pass in a --metabolite-data flag or something, so we know how to read the feature metadata file properly [1].

Update: yeah, these features are KEGG orthologs, not metabolites. My bad for not reading that paper in detail. But it's still cool that this supports those types of features!

So what we actually want to do for legit metabolomics feature metadata, from what I can tell, involves handling the feature ID used in the BIOM table/ranks input as a combo of the mass-to-charge ratio and retention time, and extracting the Library ID from a feature metadata file in which both parts of the feature ID are their own columns.
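A rough sketch of the matching described above, under several assumptions that are not confirmed anywhere in this thread: that the feature ID is a string of the form "<m/z>;<retention time>", that the feature metadata columns are named "mz", "rt", and "LibraryID", and that rounding both sides to four decimal places is a good enough way to compare the numbers. The function names are hypothetical, and the example values are made up.

```python
def split_feature_id(feature_id, sep=";"):
    """Split an assumed "<m/z>;<retention time>" feature ID into a numeric key."""
    mz_text, rt_text = feature_id.split(sep)
    # Round so that "180.0634" and "180.06340" compare as equal.
    return (round(float(mz_text), 4), round(float(rt_text), 4))


def library_ids_for_features(feature_ids, metadata_rows):
    """Map each feature ID to its Library ID, or None if nothing matches.

    metadata_rows is assumed to be a list of dicts with the hypothetical
    column names "mz", "rt", and "LibraryID".
    """
    lookup = {}
    for row in metadata_rows:
        key = (round(float(row["mz"]), 4), round(float(row["rt"]), 4))
        lookup[key] = row["LibraryID"]
    # Unmatched features map to None rather than being dropped.
    return {fid: lookup.get(split_feature_id(fid)) for fid in feature_ids}


# Made-up rows standing in for a metabolomics feature metadata file.
rows = [
    {"mz": "180.0634", "rt": "5.21", "LibraryID": "Compound A"},
    {"mz": "146.0579", "rt": "2.07", "LibraryID": "Compound B"},
]
matches = library_ids_for_features(["180.0634;5.21", "300.1;9.9"], rows)
```

Keeping unmatched features in the result (with None) is in line with the goal in #54 of never silently dropping features that lack metadata.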
To validate this, we should at least