Materialize "large" files in a new store location and hardlink them in sandboxes #18153
Conversation
WIP due to lots of duplicated code, and because this errors without a clean LMDB store or …
```rust
// Read the entire "large" file for `digest` from its on-disk location
// into memory, then hand the bytes to the caller-provided closure `f`.
let src = &self.large_file_path(digest);
let mut reader = std::fs::File::open(src)
    .map_err(|e| format!("Failed to read {src:?} for digest {digest:?}: {e}"))?;
let mut writer: Vec<u8> = vec![];
io::copy(&mut reader, &mut writer)
    .map_err(|e| format!("Failed to load large file into memory: {e}"))?;
Ok(Some(f(&writer[..])))
```
This will always need to exist (as opposed to forcing the caller to symlink) chiefly for the case that we're materializing a file that shouldn't be immutable. We'll need to copy no matter what.
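A minimal sketch of that distinction, with hypothetical names (this is not the PR's actual API): immutable materializations can share the store's bytes via a hardlink, while mutable ones must get their own copy so the sandbox can't corrupt the shared store entry.

```rust
use std::fs;
use std::path::Path;

/// Illustrative sketch: materialize `src` (a read-only file in the store)
/// at `dest`. Hypothetical helper, not the PR's actual API.
fn materialize(src: &Path, dest: &Path, mutable: bool) -> std::io::Result<()> {
    if mutable {
        // The sandbox may write to this file, so it needs its own copy:
        // a hardlink would let the sandbox mutate the shared store entry.
        fs::copy(src, dest)?;
    } else {
        // Immutable: a hardlink is cheap and shares the store's bytes.
        fs::hard_link(src, dest)?;
    }
    Ok(())
}
```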
Oh hmm …
OH I wonder if we're caching the …
How do I repro the failure without a clean LMDB store?
Thanks!
This fixes #17065 by allowing remote cache loads to be streamed to disk. In essence, the remote store now has a `load_file` method in addition to `load_bytes`, so the caller can decide to download to a file instead. This doesn't make progress towards #18048 (this PR doesn't touch the local store at all), but I think it will help with integrating the remote store with that code: in theory the `File` could be provided in a way that lets it become part of the "large file pool" directly (and indeed, the decision about whether to download to a file or into memory ties into that). This also does a theoretically unnecessary extra pass over the data (as discussed in #18231) to verify the digest, but I think it makes sense to leave that as a future optimisation, since it'll require deeper refactoring (down into `sharded_lmdb` and `hashing`, I think) and is best built on #18153 once that lands.
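A rough sketch of the resulting store API shape, for illustration only (the trait name, the `Digest` placeholder, and the exact return types are my assumptions; only `load_bytes` and `load_file` come from the description above):

```rust
use bytes::Bytes;
use tokio::fs::File;

/// Placeholder standing in for Pants' `hashing::Digest`.
#[derive(Clone, Copy, Debug)]
struct Digest;

/// Illustrative sketch of the remote-store API described above; the
/// trait name and signatures are assumptions, not the PR's exact code.
#[async_trait::async_trait]
trait ByteStoreProvider {
    /// Load a small blob fully into memory.
    async fn load_bytes(&self, digest: Digest) -> Result<Option<Bytes>, String>;

    /// Stream a (potentially large) blob directly into `file`,
    /// never holding the whole payload in memory.
    async fn load_file(&self, digest: Digest, file: File) -> Result<Option<File>, String>;
}
```

The point is that the caller chooses the destination: memory for small blobs, or a file (which could later live in the large-file pool) for big ones.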
🎉
…9711) This (hopefully) optimises storing large blobs to a remote cache, by streaming them directly from the file stored on disk in the "FSDB". This builds on the FSDB local store work (#18153), relying on large objects being stored as an immutable file on disk, in the cache managed by Pants. This is an optimisation in several ways:

- Cutting out an extra temporary file:
  - Previously `Store::store_large_blob_remote` would load the whole blob from the local store and then write it to a temporary file. This was appropriate with LMDB-backed blobs.
  - With the new FSDB, there's already a file that can be used, so the temporary file's creation and writing overhead can be eliminated.
- Reducing sync IO in async tasks, due to mmap:
  - Previously `ByteStore::store_buffered` would take that temporary file and mmap it, to be able to slice into `Bytes` more efficiently... except this is secretly blocking/sync IO happening within async tasks (AIUI: when accessing a mmap'd byte that's only on disk, not yet in memory, the whole OS thread is blocked/descheduled while the OS pulls the relevant part of the file into memory, i.e. `tokio` can't run another task on that thread).
  - The new approach uses normal `tokio` async IO mechanisms to read the file, and thus hopefully has higher concurrency.
  - (This also eliminates the unmaintained `memmap` dependency.)

I haven't benchmarked this, though. My main motivation is firming up the provider API before adding new byte store providers, for #11149. This also resolves some TODOs and even eliminates some `unsafe`, yay! The commits are individually reviewable.

Fixes #19049, fixes #14341 (`memmap` removed), closes #17234 (solves the same problem but with an approach that wasn't possible at the time).
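A minimal sketch of the async-IO streaming described above, with a hypothetical `upload_chunk` standing in for the provider call (the chunk size and names are illustrative assumptions, not the PR's code):

```rust
use tokio::fs::File;
use tokio::io::AsyncReadExt;

/// Illustrative sketch: stream an on-disk blob in chunks via async IO,
/// instead of mmapping it. No OS thread blocks on a page fault; `tokio`
/// stays free to schedule other tasks while each read is pending.
async fn stream_blob(path: &std::path::Path) -> std::io::Result<()> {
    let mut file = File::open(path).await?;
    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf).await?;
        if n == 0 {
            break; // EOF
        }
        upload_chunk(&buf[..n]).await?; // hypothetical uploader
    }
    Ok(())
}

/// Stand-in for whatever the remote-store provider does with each chunk.
async fn upload_chunk(_chunk: &[u8]) -> std::io::Result<()> {
    Ok(())
}
```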
…ame device (#19894) As described in #18757, the "check that source and dest are on same device" strategy that was introduced in #18153 to decide whether we could hardlink when materializing files was not robust when faced with the same device being mounted in multiple locations. This change moves to a "create a canary" strategy for deciding when hardlinking between two destinations is legal. Hardlinks are canaried and memoized on a per-destination-root basis, so this strategy might actually be slightly cheaper than the previous one. Fixes #18757.
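A rough sketch of that canary-and-memoize strategy, with illustrative names (not the PR's actual code):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};

/// Illustrative sketch of per-destination-root hardlink canarying;
/// struct and method names are assumptions, not the PR's code.
struct HardlinkCanary {
    canary: PathBuf,              // small file living in the store
    memo: HashMap<PathBuf, bool>, // destination root -> linkable?
}

impl HardlinkCanary {
    /// Returns whether hardlinking from the store into `dest_root` works,
    /// probing at most once per root and memoizing the result.
    fn can_hardlink(&mut self, dest_root: &Path) -> bool {
        if let Some(&known) = self.memo.get(dest_root) {
            return known;
        }
        let probe = dest_root.join(".hardlink_canary");
        let ok = fs::hard_link(&self.canary, &probe).is_ok();
        let _ = fs::remove_file(&probe); // best-effort cleanup
        self.memo.insert(dest_root.to_path_buf(), ok);
        ok
    }
}
```

Probing once per destination root sidesteps device-number comparisons entirely, which is what makes multiple mounts of the same device safe.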
…ame device (Cherry-picks of #19894: #19910, #19914) Same description as #19894 above. Co-authored-by: Stu Hood <stuhood@gmail.com>
…ts (#20055) As described in #19765, `2.17.x` uses more file handles than previous versions. Based on the location of the reported error, I suspect that this is due to the move from using the LMDB store for all files, to using the filesystem-based store for large files (#18153). In particular: rather than digesting files inside of `spawn_blocking` while capturing them into the LMDB store (imposing the [limit of blocking threads](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.max_blocking_threads) from the tokio runtime), `fn store` moved to digesting them using tokio's async file APIs, which impose no such limit. This change adds a semaphore to (some) file opens to provide a best-effort limit on files opened for the purposes of being captured. It additionally (in the first commit) fixes an extraneous file handle that was being kept open during capture. Fixes #19765.
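A minimal sketch of such a best-effort limit using `tokio::sync::Semaphore` (the function name and return shape are illustrative assumptions, not the PR's exact code):

```rust
use std::path::Path;
use std::sync::Arc;
use tokio::fs::File;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// Illustrative sketch: cap concurrent file opens during capture.
/// Hypothetical helper, not the PR's exact code.
async fn open_for_capture(
    limiter: Arc<Semaphore>,
    path: &Path,
) -> std::io::Result<(OwnedSemaphorePermit, File)> {
    // Wait for a permit before opening, so at most the semaphore's
    // capacity of capture files are open at once.
    let permit = limiter
        .acquire_owned()
        .await
        .expect("semaphore never closed");
    let file = File::open(path).await?;
    Ok((permit, file))
}
```

Returning the permit alongside the handle ties the permit's lifetime to the open file, so the limit covers the whole capture, not just the `open` call.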
…ts (Cherry-picks of #20055: #20078, #20077) Same description as #20055 above. Co-authored-by: Stu Hood <stuhood@gmail.com>
Fixes #18048 by having `store` code now store/load from a different root if the file size is above some threshold. Also fixes #18231 by verifying content while downloading. `materialize_file` now hardlinks (falling back to copying when the store and workdirs are on separate filesystems) directly from this RO on-disk file if the file doesn't need to be mutable. This means that `--no-pantsd` and daemon restarts get the benefits of linking.
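As a rough illustration of the hardlink-with-copy-fallback behaviour described above (a hypothetical helper, not the PR's `materialize_file` itself):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Illustrative sketch: link an immutable store file into a sandbox,
/// copying instead when hardlinking fails (e.g. `EXDEV` when the store
/// and workdir are on different filesystems). Hypothetical helper,
/// not the PR's `materialize_file`.
fn link_or_copy(src: &Path, dest: &Path) -> io::Result<()> {
    match fs::hard_link(src, dest) {
        Ok(()) => Ok(()),
        // Cross-device links (among other conditions) can't hardlink;
        // fall back to a full copy of the bytes.
        Err(_) => fs::copy(src, dest).map(|_| ()),
    }
}
```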