[MRG] add abundance-weighted columns to gather output #2249
Codecov Report

| Coverage Diff | latest | #2249 | +/- |
|---|---|---|---|
| Coverage | 84.84% | 84.85% | |
| Files | 131 | 131 | |
| Lines | 15653 | 15664 | +11 |
| Branches | 2245 | 2249 | +4 |
| Hits | 13281 | 13291 | +10 |
| Misses | 2082 | 2082 | |
| Partials | 290 | 291 | +1 |

Flags with carried forward coverage won't be shown.
@bluegenes ok, updated. On inspection, I think it is/was possible to calculate some or all of this before from the CSV output - test. If you get a chance to look at this, I would appreciate your interim stamp of approval :). I might not get to work on this more until later today.
STAMP - the new columns are great and I think they'll be really helpful! Yes, we're already able to calculate most things; the main issue was needing `total_weighted_missed` for the unclassified portion, which I wasn't sure how to calculate without outputting the total weighted query hashes from gather.
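With the cumulative `sum_weighted_found` and the `total_weighted_hashes` columns this PR adds, the weighted unclassified portion can be derived directly from the gather CSV. A minimal sketch, assuming the CSV contains those two columns and that `total_weighted_missed` is simply the total minus the final cumulative found value; `weighted_unclassified_fraction` is a hypothetical helper, not part of sourmash.

```python
import csv
import io


def weighted_unclassified_fraction(gather_csv_text):
    """Hypothetical helper: fraction of total query abundance not assigned to any match.

    Assumes the gather CSV has the `sum_weighted_found` and
    `total_weighted_hashes` columns described in this PR.
    """
    rows = list(csv.DictReader(io.StringIO(gather_csv_text)))
    if not rows:
        # No matches means no row carries total_weighted_hashes.
        raise ValueError("gather CSV has no rows")
    total = float(rows[0]["total_weighted_hashes"])
    # sum_weighted_found is cumulative, so its largest value is the total found.
    found = max(float(r["sum_weighted_found"]) for r in rows)
    total_weighted_missed = total - found
    return total_weighted_missed / total
```

Taking the maximum of `sum_weighted_found` rather than the last row keeps the sketch independent of row order in the CSV.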
@bluegenes I think I'm missing one thing before I can properly finish this PR off - I need to regenerate
I believe it was this one, just renamed: https://github.com/taylorreiter/2021-sourmash-taxonomy-hackathon/tree/main/outputs/sigs ..realizing now I probably should have added it, sry!
I think this is ready for review @bluegenes.
looks good to me. thanks for adding!
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
This PR adds three abundance-weighted columns to gather output:

- `n_unique_weighted_found` - the summed abundances of the found hashes at each step
- `sum_weighted_found` - the running total of `n_unique_weighted_found`, cumulative at this step
- `total_weighted_hashes` - the sum total of all hash abundances

It also updates the `kreport` format of `sourmash tax metagenome` to output weighted bp estimates.

Fixes #2240.
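The relationship between the three weighted columns can be illustrated with a simplified sketch. It assumes a greedy gather in which each match's hashes are removed from the query before the next step. It also assumes the weighted bp estimate is the weighted hash count multiplied by `scaled`. Both functions (`weighted_gather_columns`, `weighted_bp_estimate`) and their input shapes are hypothetical illustrations, not sourmash's API.

```python
def weighted_gather_columns(query_abunds, matches):
    """Hypothetical sketch: compute abundance-weighted gather columns.

    query_abunds: dict mapping hash -> abundance in the query.
    matches: list of sets of hashes, one per gather match, in rank order.
    Returns one dict per step with the three weighted columns.
    """
    total_weighted_hashes = sum(query_abunds.values())
    remaining = set(query_abunds)  # query hashes not yet assigned to a match
    sum_weighted_found = 0
    rows = []
    for match_hashes in matches:
        # Only hashes not claimed by an earlier match count at this step.
        found = remaining & match_hashes
        remaining -= found
        n_unique_weighted_found = sum(query_abunds[h] for h in found)
        sum_weighted_found += n_unique_weighted_found
        rows.append({
            "n_unique_weighted_found": n_unique_weighted_found,
            "sum_weighted_found": sum_weighted_found,
            "total_weighted_hashes": total_weighted_hashes,
        })
    return rows


def weighted_bp_estimate(n_unique_weighted_found, scaled):
    # Assumption: each FracMinHash hash stands in for roughly `scaled` bp,
    # so the abundance-weighted hash count is multiplied by `scaled`.
    return n_unique_weighted_found * scaled
```

For example, with query abundances `{1: 5, 2: 1, 3: 2, 4: 10}` and matches `[{1, 4}, {2, 4}]`, the second step finds only hash 2 (hash 4 was already claimed), so `sum_weighted_found` ends at 16 out of a `total_weighted_hashes` of 18.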
TODO

- `kreport` format from [MRG] add kreport output format to tax metagenome #2239 to support weighted reporting
- `--ignore-abundance`
- `total_abund` in `GatherResult`
- `GatherResult` and `FracMinHashComparison` classes for other cleanup