-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Update MinHash.set_abundances
to remove hash if 0 abund; handle negative abundances.
#1575
Conversation
src/sourmash/minhash.py
Outdated
@@ -625,7 +625,7 @@ def set_abundances(self, values, clear=True): | |||
|
|||
for h, v in values.items(): | |||
hashes.append(h) | |||
abunds.append(v) | |||
abunds.append(max(0, v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't cast negative values (signed int) to unsigned int to be passed to the rust:add_hash_with_abundance. Instead, convert the negative abundance values in Python to zero before passing to rust.
src/sourmash/minhash.py
Outdated
@@ -625,7 +625,7 @@ def set_abundances(self, values, clear=True): | |||
|
|||
for h, v in values.items(): | |||
hashes.append(h) | |||
abunds.append(v) | |||
abunds.append(max(0, v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hot take: I think this behavior is strange! We don't support negative abundances, and probably shouldn't support passing in negative abundances.
How about:
- 0 deletes hash
- < 0 raises
ValueError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, more reasonable.
Ok, I have a question. Is it useful/supported to change specific abundances without dropping the rest? set_abundance
can add to the current values but can't change/reduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, I don't know 😄 . We aren't really using abundances a lot at the moment so I don't have many good use cases in mind. To me, that suggests that any choice we make is going to be wrong... It's easy enough to add new methods or adjust this one down the road, so let's just make sure whatever you choose is well tested! Unless @luizirber has thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm... I was thinking of adding a bool overwrite
flag so the user can overwrite some abundance values without clearing the rest, but, honestly, I have no idea about its importance. Let's just wait for a use case 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on waiting for use case
tests/test_minhash.py
Outdated
mh.set_abundances({ 1: 5, 2: 3, 3 : 5 }) | ||
mh.set_abundances({ 1: 0, 2 : -1 }, clear=False) | ||
assert 1 not in dict(mh.hashes) | ||
assert 2 not in dict(mh.hashes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably also check that the abundance of hash 3 is five, and length of mh.hashes is 1?
Codecov Report
@@ Coverage Diff @@
## latest #1575 +/- ##
==========================================
+ Coverage 80.96% 81.07% +0.10%
==========================================
Files 102 102
Lines 10299 10303 +4
Branches 1165 1165
==========================================
+ Hits 8339 8353 +14
+ Misses 1751 1742 -9
+ Partials 209 208 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
set_abundance <=0
set_abundance <=0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! One last change request: could you update the docstring for set_abundances in Python, too?
Also, please correct the title and expand the description at the top of the PR by just a little bit to repeat the information that is currently in the title. Thanks! |
set_abundance <=0
abund == 0
& handle -ve abund
nitpick: I'm going to change the title to mention |
abund == 0
& handle -ve abund
MinHash.set_abundances
to remove hash if 0 abund; handle negative abundances.
Update
MinHash.set_abundances(...)
to catch negative values and support hash removal by setting abundance to 0.hash
from thesketch
if the abundance is set to zero.ValueError
exception if the abundance is a negative value.