Size split the local store #18048
**huonw** added a commit that referenced this issue on Feb 12, 2023:
This fixes #17065 by having remote cache loads be able to be streamed to disk. In essence, the remote store now has a `load_file` method in addition to `load_bytes`, so the caller can decide to download to a file instead.

This doesn't make progress towards #18048 (this PR doesn't touch the local store at all), but I think it will help with integrating the remote store with that code: in theory the `File` could be provided in a way that can be part of the "large file pool" directly (and indeed, the decision about whether to download to a file or into memory ties into that).

This also does a theoretically unnecessary extra pass over the data (as discussed in #18231) to verify the digest, but I think it'd make sense to do that as a future optimisation, since it'll require refactoring more deeply (down into `sharded_lmdb` and `hashing`, I think) and is best built on #18153 once that lands.
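The `load_bytes`/`load_file` split can be sketched roughly as below. This is a minimal std-only Rust sketch, not the actual pants API: `Destination` and `load_blob` are hypothetical names, and the real remote store deals in digests and async I/O rather than a plain `Read`.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Hypothetical sketch: given a reader over remote blob content, either
// buffer it fully in memory (the `load_bytes` path) or stream it straight
// to a destination file (the `load_file` path), so large blobs never
// need to be held entirely in memory.
enum Destination<'a> {
    Memory(Vec<u8>),
    Disk(&'a Path),
}

// Returns the number of bytes loaded.
fn load_blob<R: Read>(mut src: R, dest: &mut Destination<'_>) -> io::Result<u64> {
    match dest {
        Destination::Memory(buf) => {
            let before = buf.len() as u64;
            src.read_to_end(buf)?;
            Ok(buf.len() as u64 - before)
        }
        Destination::Disk(path) => {
            let mut file = File::create(path)?;
            // io::copy streams in fixed-size chunks: memory use stays O(1)
            // regardless of blob size.
            io::copy(&mut src, &mut file)
        }
    }
}
```

The caller picks the variant up front, which is exactly the decision the PR describes tying into the future "large file pool".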
**stuhood** added a commit that referenced this issue on Mar 23, 2023:
…n sandboxes (#18153)

Fixes #18048 by having `store` code now store/load from a different root if the file size is above some threshold. Also fixes #18231 by verifying content while downloading.

`materialize_file` now hardlinks (falling back to copying when the store and workdirs are on separate filesystems) directly from this RO on-disk file if the file doesn't need to be mutable. This means that `--no-pantsd` and daemon restarts get the benefits of linking.

Co-authored-by: Stu Hood <stuhood@gmail.com>
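The two mechanisms in this PR, a size threshold that routes blobs to a different root, and hardlink-with-copy-fallback materialization, can be sketched as below. This is an illustrative std-only sketch under assumed names: `LARGE_FILE_THRESHOLD`, `is_large`, and `materialize_readonly` are not the actual pants identifiers, and the real threshold value differs.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Assumed threshold for illustration only; the real value is configurable
// in the store code.
const LARGE_FILE_THRESHOLD: u64 = 512 * 1024; // 512 KiB

/// Decide which backing store a blob belongs in, based on its size.
fn is_large(len: u64) -> bool {
    len >= LARGE_FILE_THRESHOLD
}

/// Materialize a read-only file from the on-disk large-file store into a
/// sandbox: hard link when possible, and fall back to copying when the
/// link fails (e.g. the store and the workdir are on different
/// filesystems).
fn materialize_readonly(store_path: &Path, dest: &Path) -> io::Result<()> {
    match fs::hard_link(store_path, dest) {
        Ok(()) => Ok(()),
        Err(_) => fs::copy(store_path, dest).map(|_| ()),
    }
}
```

Because the link source is the store's own read-only file rather than per-daemon state, `--no-pantsd` runs and freshly restarted daemons get the same linking benefit, as the PR notes.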
Currently, all blobs (no matter their size) are stored in LMDB, but that is not necessarily optimal for larger blobs. LMDB is not async friendly, and so streaming large files into and out of LMDB is challenging (leading to issues like #17065, "The remoting client currently buffers all fetched blobs in memory before storing to LMDB").

To resolve this, the `local::Store` could be split at a size threshold, such that small files continued to be stored in LMDB, but large files were stored directly as content-addressed files (in storage similar to our existing immutable inputs storage).

This would involve (in no particular order):

- teaching `Store` to switch between small/large strategies
- changing `Store::store_file` (and the corresponding `local::Store` method) to take a file path
- optimising `Store::materialize_file` by directly hardlinking where possible