Use rng.choice without unpacked in subsample_without_replacement with 64-bit support #935
Drastically reduces the memory needed when column sums are large.
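The idea in the PR title can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the biom-format implementation: the function name `subsample_counts` and the `searchsorted` mapping step are hypothetical. The old approach unpacks a column into one entry per counted item. This sketch instead draws n distinct positions from range(total) with `Generator.choice` and maps each position back to its row through the cumulative counts. The extra memory then scales with n and the number of rows, not with the column sum.

```python
import numpy as np


def subsample_counts(counts, n, rng=None):
    """Hypothetical sketch: rarefy one column to n items without replacement.

    No array of length counts.sum() is built; only the n sampled positions
    and a cumulative-sum array of len(counts) are allocated.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the column sum")
    # n distinct positions in [0, total). When n is small relative to total,
    # current NumPy releases use Floyd's algorithm here (memory ~ n).
    picks = rng.choice(total, size=n, replace=False)
    # Position p falls in row i when cumsum[i - 1] <= p < cumsum[i].
    rows = np.searchsorted(np.cumsum(counts), picks, side="right")
    # Count how many sampled positions landed in each row.
    return np.bincount(rows, minlength=counts.size)


rng = np.random.default_rng(0)
print(subsample_counts([5, 0, 3, 2], 4, rng))
```

Rows with a zero count can never be selected, because their cumulative-sum entry equals the previous one and `side="right"` skips past them.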
Co-authored-by: Daniel McDonald <d3mcdonald@eng.ucsd.edu>
Could a note be added to the changelog with the approximate performance boost and memory reduction?
On EMP-style BIOM tables, the new without-replacement algorithm is about 2x faster (n=1000, on an EPYC 7302 CPU). There is also a small reduction in memory consumption, but it is barely noticeable compared to the rest of the memory used by the Table object. For the record, the table used for testing was mp.90.min25.deblur.withtax.onlytree_ACTUAL_overlap.biom
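A comparison like the one above can be approximated on synthetic data with a sketch such as the following. It times an "unpacked" baseline, with one entry per counted item (presumably what "unpacked" in the PR title refers to), against sampling positions directly. The function names, the synthetic column, and the use of `timeit` are assumptions. Absolute timings will differ from the EMP-table measurements.

```python
import timeit

import numpy as np


def subsample_unpacked(counts, n, rng):
    # Baseline: materialize one row index per counted item (memory ~ sum).
    unpacked = np.repeat(np.arange(counts.size), counts)
    picks = rng.choice(unpacked, size=n, replace=False)
    return np.bincount(picks, minlength=counts.size)


def subsample_positions(counts, n, rng):
    # Alternative: sample positions in [0, sum) and map them back to rows.
    picks = rng.choice(int(counts.sum()), size=n, replace=False)
    rows = np.searchsorted(np.cumsum(counts), picks, side="right")
    return np.bincount(rows, minlength=counts.size)


rng = np.random.default_rng(0)
# Synthetic column (hypothetical); not the EMP table used in the PR.
counts = rng.integers(0, 50, size=5000).astype(np.int64)
n = 1000

for fn in (subsample_unpacked, subsample_positions):
    seconds = timeit.timeit(lambda: fn(counts, n, rng), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```

Both functions return per-row counts that sum to n. Only the baseline allocates an array whose length equals the column sum.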
For BIOM tables with very large per-column sums, this change is an enabler. Just lifting the old code's size limit would not have helped on its own: a test showed it would have needed over 15.1 TiB of RAM. The new code has a 2^63 limit, and its memory use is proportional to n only. Running time is quite fast for small n but grows with larger n (on the same CPU as above).
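NumPy's `Generator.choice` takes an integer population size as a signed 64-bit value, which gives a bound on the column sum like the one described above. The sketch below uses hypothetical helper names, `unpacked_tib` and `sample_positions`. It checks that bound and draws 1,000 distinct positions from a population of 2^62. An unpacked int64 array of that length would need 2^65 bytes, or 2^25 TiB. This illustrates the bound and is not the code in this PR.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max  # 2**63 - 1


def unpacked_tib(total, itemsize=8):
    """TiB needed to store one itemsize-byte entry per counted item."""
    return total * itemsize / 2**40


def sample_positions(total, n, rng):
    """Draw n distinct positions from range(total) without building it."""
    if not 0 <= total <= INT64_MAX:
        raise OverflowError("total must fit in a signed 64-bit integer")
    # With current NumPy, only O(n) memory is used when n << total.
    return rng.choice(total, size=n, replace=False)


rng = np.random.default_rng(42)
total = 2**62
print(f"unpacked array: {unpacked_tib(total):,.0f} TiB")
picks = sample_positions(total, 1000, rng)
print(picks.size, picks.min() >= 0, picks.max() < total)
```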
Thanks!!