-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Add MinHash.intersection(...)
#1474
Conversation
Codecov Report
@@ Coverage Diff @@
## refactor/index_find #1474 +/- ##
=======================================================
- Coverage 94.84% 89.71% -5.13%
=======================================================
Files 96 123 +27
Lines 15730 19464 +3734
Branches 1466 1483 +17
=======================================================
+ Hits 14919 17463 +2544
- Misses 586 1775 +1189
- Partials 225 226 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Hmm, this still doesn't quite work, because it still pulls all the calculations into Python. I merged #1137 into #1392 in find+remove, moving the changes into the This is still wasteful, because (and there is something VERY wrong going on with the memory consumption of #1392...) For this PR, I think a better approach might be a function that returns the size of the intersection and the union of two MHs (while going thru the data only once), because that's the info that #1392 wants now. |
Err, that's what the |
oh. duh. I realized I hadn't changed the intersection code in the SBT. Fixed in 102b504. Curious if this changes performance! (Obviously it is better to implement the counts directly in rust. Will work on that next.) |
ok, encapsulated key calculations in new method |
Yup, getting better! |
From experimentations in #1475 it is possible that intersection is not the main/largest problem, starting to suspect that all the calls to |
hmm. Can you run the benchmark on this branch again when you get a chance? If |
It is not really The clue seems to be the memory consumption figure. What seems to be happening is that the search algorithm changed in a way that is triggering more internal node checks, and if we compare a I'll pull the sunburst plots from #1201 to see what's going on. |
huh! that is super interesting and weird and unintentional :). I'll dig a little bit to see if there's something obvious. Edit: fixed by 57467cd |
🎉 🎉 shall I merge this? |
(or, well, go ahead and merge it :) |
f6ee58f
to
36ae5b7
Compare
I changed a few things in 36ae5b7
|
MinHash.intersection(...)
MinHash.intersection(...)
Note: merge into #1392.
Bring
MinHash.intersection(...)
in from Rust implementation.Question: Do we need to bump the rust/sourmash version?