[MRG] implement a lazy/on-demand Index loading class #1661

Merged: 7 commits, Jul 13, 2021

@ctb (Contributor) commented Jul 11, 2021

This PR implements index.LazyLoadedIndex(...), which takes the location of an index and a manifest, and only opens the index when signatures are needed. In particular, select calls are performed on the manifest; then, when the index is loaded, the manifest is used as a picklist to choose which signatures to load.

In brief, this supports low-memory tracking of a large index without needing to keep the Index object itself in memory.

(Code originally developed over in #1619)

  • add tests for find, bool, and empty manifest
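
A minimal sketch of the lazy-loading idea described above, in plain Python. It does not use the sourmash API: ManifestRow, ToyManifest, ToyLazyIndex, and fake_loader are hypothetical names invented for illustration, and the md5/ksize/moltype fields are assumed manifest columns. The point is only that select() touches the manifest alone, and the index location is opened just when signatures are requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestRow:
    """One manifest entry (hypothetical columns, chosen for illustration)."""
    md5: str
    ksize: int
    moltype: str


class ToyManifest:
    """Lightweight metadata about signatures; holds no signature data."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __bool__(self):
        return bool(self.rows)

    def select(self, *, ksize=None, moltype=None):
        # Filter on metadata only; returns a new, smaller manifest.
        kept = [
            r for r in self.rows
            if (ksize is None or r.ksize == ksize)
            and (moltype is None or r.moltype == moltype)
        ]
        return ToyManifest(kept)

    def picklist(self):
        # The set of identifiers that should be pulled from the index.
        return {r.md5 for r in self.rows}


class ToyLazyIndex:
    """Keeps only a location and a manifest; opens the index on demand."""

    def __init__(self, location, manifest, loader):
        self.location = location
        self.manifest = manifest
        self._loader = loader  # assumed callable: location -> (md5, sig) pairs

    def __len__(self):
        return len(self.manifest)

    def __bool__(self):
        return bool(self.manifest)

    def select(self, **kwargs):
        # No index I/O here: selection happens on the manifest.
        return ToyLazyIndex(
            self.location, self.manifest.select(**kwargs), self._loader
        )

    def signatures(self):
        if not self.manifest:
            return  # empty manifest: never open the index
        wanted = self.manifest.picklist()
        for md5, sig in self._loader(self.location):
            if md5 in wanted:
                yield sig


# Stand-in for an on-disk index; records every time it is opened.
STORE = {"db.zip": [("aaa", "sig-A-k31"), ("bbb", "sig-B-k21")]}
load_calls = []


def fake_loader(location):
    load_calls.append(location)
    return STORE[location]


manifest = ToyManifest(
    [ManifestRow("aaa", 31, "DNA"), ManifestRow("bbb", 21, "DNA")]
)
lazy = ToyLazyIndex("db.zip", manifest, fake_loader)
k31 = lazy.select(ksize=31)        # manifest-only; nothing loaded yet
loads_before = list(load_calls)
k31_sigs = list(k31.signatures())  # the index is opened here
```

In this sketch the selected object stays cheap to hold, since it carries only a path and a filtered manifest; whether the real class reopens the index on every call or caches it is not something this example claims to reproduce.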

codecov bot commented Jul 11, 2021

Codecov Report

Merging #1661 (062ecc0) into latest (97fa790) will increase coverage by 7.46%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #1661      +/-   ##
==========================================
+ Coverage   82.22%   89.68%   +7.46%     
==========================================
  Files         113       86      -27     
  Lines       11704     8031    -3673     
  Branches     1478     1483       +5     
==========================================
- Hits         9624     7203    -2421     
+ Misses       1820      568    -1252     
  Partials      260      260              
Flag      Coverage Δ
python    89.68% <100.00%> (+0.05%) ⬆️
rust      ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/sourmash/index.py 95.54% <100.00%> (+0.38%) ⬆️
src/core/src/index/linear.rs
src/core/src/index/bigsi.rs
src/core/src/signature.rs
src/core/tests/minhash.rs
src/core/src/sketch/hyperloglog/mod.rs
src/core/src/sketch/hyperloglog/estimators.rs
src/core/src/index/sbt/mhbt.rs
src/core/src/index/storage.rs
src/core/src/ffi/cmd/compute.rs
... and 18 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 97fa790...062ecc0.

@ctb (Contributor, Author) commented Jul 11, 2021

Ready for review & merge @sourmash-bio/devs!

@bluegenes (Contributor) left a comment

lgtm!

@ctb merged commit eedf394 into latest on Jul 13, 2021
@ctb deleted the add/lazy_load_index branch on July 13, 2021, 15:41