[MRG] Update MinHash.set_abundances to remove hash if 0 abund; handle negative abundances. #1575

Merged 7 commits on Jun 7, 2021

Member

@mr-eyes mr-eyes commented Jun 7, 2021

Update MinHash.set_abundances(...) to catch negative values and support hash removal by setting abundance to 0.

@@ -625,7 +625,7 @@ def set_abundances(self, values, clear=True):

             for h, v in values.items():
                 hashes.append(h)
-                abunds.append(v)
+                abunds.append(max(0, v))
Member Author

We can't cast negative (signed int) values to unsigned int when passing them to the Rust add_hash_with_abundance. Instead, convert negative abundance values to zero in Python before passing them to Rust.
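
To illustrate why a negative abundance cannot simply be handed to an unsigned parameter, here is a minimal sketch using Python's standard ctypes module. It is only an analogy: it assumes a 64-bit unsigned abundance type and does not use sourmash's actual FFI layer, which may reject the value instead of wrapping it.

```python
import ctypes

# Assumption: abundances cross the language boundary as unsigned 64-bit ints.
# ctypes performs no overflow checking, so -1 silently wraps around.
wrapped = ctypes.c_uint64(-1).value
print(wrapped)  # 18446744073709551615, i.e. 2**64 - 1

# Clamping in Python first, as in the diff above, keeps the value in range.
clamped = ctypes.c_uint64(max(0, -1)).value
print(clamped)  # 0
```

Either outcome (a silent wrap-around or an overflow error at the boundary) is undesirable, which is why the value has to be handled on the Python side.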

Contributor

Hot take: I think this behavior is strange! We don't support negative abundances, and probably shouldn't support passing in negative abundances.

How about:

  • 0 deletes hash
  • < 0 raises ValueError?
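
The rule proposed above could be sketched roughly as follows. This is a standalone illustration in plain Python, not the code from this PR: the function name split_abundance_updates and its return shape are hypothetical, and it deliberately leaves out how positive values combine with existing ones.

```python
def split_abundance_updates(values):
    """Partition a {hash: abundance} mapping (hypothetical helper).

    Returns (to_set, to_delete): hashes with a positive abundance and
    hashes whose abundance is 0. A negative abundance raises ValueError.
    """
    to_set = {}
    to_delete = []
    for h, v in values.items():
        if v < 0:
            # Negative abundances are rejected outright instead of clamped.
            raise ValueError(f"abundance for hash {h} is negative: {v}")
        if v == 0:
            # An abundance of 0 marks the hash for removal.
            to_delete.append(h)
        else:
            to_set[h] = v
    return to_set, to_delete


to_set, to_delete = split_abundance_updates({1: 0, 2: 3})
print(to_set, to_delete)  # {2: 3} [1]
```

Raising on negative input surfaces caller mistakes early, whereas clamping to zero would turn them into silent deletions.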

Member Author

Yes, more reasonable.

Ok, I have a question. Is it useful/supported to change specific abundances without dropping the rest? set_abundances can add to the current values but can't change or reduce them.

Contributor

Yah, I don't know 😄 . We aren't really using abundances a lot at the moment so I don't have many good use cases in mind. To me, that suggests that any choice we make is going to be wrong... It's easy enough to add new methods or adjust this one down the road, so let's just make sure whatever you choose is well tested! Unless @luizirber has thoughts.

Member Author

hmmm... I was thinking of adding a bool overwrite flag so the user can overwrite some abundance values without clearing the rest, but, honestly, I have no idea about its importance. Let's just wait for a use case 😄

Member

👍 on waiting for use case

mh.set_abundances({ 1: 5, 2: 3, 3 : 5 })
mh.set_abundances({ 1: 0, 2 : -1 }, clear=False)
assert 1 not in dict(mh.hashes)
assert 2 not in dict(mh.hashes)
Contributor

should probably also check that the abundance of hash 3 is five, and length of mh.hashes is 1?
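
The extra checks suggested here might look something like the sketch below. The helper name assert_only_hash_three_remains is hypothetical, and the plain dict passed to it stands in for dict(mh.hashes) after the two set_abundances calls; no sourmash objects are used.

```python
def assert_only_hash_three_remains(hashes):
    """Check the expected final state of the sketch (hypothetical helper)."""
    hashes = dict(hashes)
    # Hashes 1 and 2 should have been removed.
    assert 1 not in hashes
    assert 2 not in hashes
    # Only hash 3 should be left, with its abundance untouched.
    assert len(hashes) == 1
    assert hashes[3] == 5


# Stand-in for dict(mh.hashes); in the real test this would come from the sketch.
assert_only_hash_three_remains({3: 5})
```

Checking the survivor as well as the removed hashes guards against a change that accidentally clears the whole sketch.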

codecov bot commented Jun 7, 2021

Codecov Report

Merging #1575 (4912306) into latest (01de852) will increase coverage by 0.10%.
The diff coverage is 100.00%.

❗ Current head 4912306 differs from pull request most recent head 43d5409. Consider uploading reports for the commit 43d5409 to get more accurate results

@@            Coverage Diff             @@
##           latest    #1575      +/-   ##
==========================================
+ Coverage   80.96%   81.07%   +0.10%     
==========================================
  Files         102      102              
  Lines       10299    10303       +4     
  Branches     1165     1165              
==========================================
+ Hits         8339     8353      +14     
+ Misses       1751     1742       -9     
+ Partials      209      208       -1     
Flag      Coverage Δ
python    89.18% <100.00%> (+0.15%) ⬆️
rust      66.60% <100.00%> (+<0.01%) ⬆️

Impacted Files                    Coverage Δ
src/core/src/sketch/minhash.rs    87.23% <100.00%> (+0.01%) ⬆️
src/sourmash/minhash.py           90.52% <100.00%> (ø)
src/sourmash/sourmash_args.py     94.24% <0.00%> (+0.02%) ⬆️
src/sourmash/lca/lca_utils.py     87.90% <0.00%> (+0.09%) ⬆️
src/sourmash/utils.py             78.94% <0.00%> (+1.75%) ⬆️
src/sourmash/sbt_storage.py       87.56% <0.00%> (+4.66%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bb723a4...43d5409.

@mr-eyes mr-eyes changed the title from "[WIP] Remove hash from sketch if set_abundance <=0" to "[MRG] Remove hash from sketch if set_abundance <=0" on Jun 7, 2021
Contributor

@ctb ctb left a comment

Looks great! One last change request: could you update the docstring for set_abundances in Python, too?

Contributor

ctb commented Jun 7, 2021

Also, please correct the title and expand the description at the top of the PR by just a little bit to repeat the information that is currently in the title. Thanks!

@mr-eyes mr-eyes changed the title from "[MRG] Remove hash from sketch if set_abundance <=0" to "[MRG] Remove hash from sketch if abund == 0 & handle -ve abund" on Jun 7, 2021
Contributor

ctb commented Jun 7, 2021

nitpick: I'm going to change the title to mention set_abundances 😜

@ctb ctb changed the title from "[MRG] Remove hash from sketch if abund == 0 & handle -ve abund" to "[MRG] Update MinHash.set_abundances to remove hash if 0 abund; handle negative abundances." on Jun 7, 2021
@ctb ctb merged commit 6b5806c into sourmash-bio:latest Jun 7, 2021
Development

Successfully merging this pull request may close these issues.

support deletion of hashes by setting abundance to 0?
3 participants