
Stream large remote cache downloads directly to disk #18054

Merged: 9 commits, Feb 12, 2023

Conversation

huonw (Contributor) commented Jan 22, 2023

This fixes #17065 by allowing remote cache loads to be streamed to disk. In essence, the remote store now has a load_file method in addition to load_bytes, so the caller can decide to download to a file instead of into memory.

This doesn't make progress towards #18048 (this PR doesn't touch the local store at all), but I think it will help with integrating the remote store with that code: in theory the File could be provided in a way that can be part of the "large file pool" directly (and indeed, the decision about whether to download to a file or into memory ties into that).

This also does a theoretically unnecessary extra pass over the data (as discussed in #18231) to verify the digest. I think it makes sense to leave removing that as a future optimisation, since it will require deeper refactoring (down into sharded_lmdb and hashing, I think) and is best done once #18153 lands.
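The "extra pass" mentioned above has this general shape: after streaming the blob to disk, re-read it in chunks and check its fingerprint against the expected digest, without ever buffering the whole blob in memory. The following is a minimal std-only sketch of that shape; a toy 64-bit FNV-1a hash stands in for Pants's real SHA-256 fingerprint, and all names here are hypothetical, not the actual `hashing` crate API.

```rust
use std::io::{self, Read};

// Toy stand-in for the real fingerprint function (std has no SHA-256).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

// The verification pass: stream the already-downloaded data through the
// hasher in fixed-size chunks rather than holding it all in memory.
fn verify<R: Read>(mut reader: R, expected: u64) -> io::Result<bool> {
    let mut h: u64 = 0xcbf29ce484222325;
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        for &b in &buf[..n] {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    Ok(h == expected)
}

fn main() -> io::Result<()> {
    let data = b"blob contents";
    let expected = fnv1a(data);
    // In the PR, `reader` would be the temp file just written to disk.
    println!("{}", verify(&data[..], expected)?);
    Ok(())
}
```

Hashing while writing (rather than re-reading afterwards) is exactly the future optimisation the description defers.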

stuhood (Member) left a comment

Awesome, thanks a lot!

> this is a bit of an awkward design, but I wasn't sure of a better one in the presence of retries

Given the desire to do #18048, I think that this design is great: the large file store will move/hardlink from this file into the local store, which, assuming the temporary directory is chosen well, means this doesn't involve another pass over the data.

> I've plumbed through a configuration parameter for the threshold (though it can't yet be set from outside Rust), but maybe it should reuse IMMUTABLE_FILE_SIZE_LIMIT, on the assumption that'll tie into #18048 better?

Yea, I think that tying these constants together and renaming them would make sense to help prepare for #18048. cc @thejcannon

Review threads on src/rust/engine/fs/store/src/lib.rs (outdated; resolved)
@stuhood stuhood requested review from jsirois and thejcannon January 23, 2023 19:38
@stuhood stuhood added the category:bugfix Bug fixes for released features label Jan 23, 2023
huonw (Contributor, Author) commented Jan 23, 2023

Thanks!

> Given the desire to do #18048, I think that this design is great: the large file store will move/hardlink from this file into the local store, which, assuming the temporary directory is chosen well, means this doesn't involve another pass over the data.

👍

After reflecting for a few days, I realised that a potentially better API is to move the file-vs.-bytes decision to the caller by having two separate functions, especially given that the force_in_memory parameter is already effectively this decision:

```rust
pub async fn load_bytes(&self, digest: Digest) -> Result<Option<Bytes>, String> { ... }
pub async fn load_file(&self, digest: Digest) -> Result<Option<tokio::fs::File>, String> { ... }
```

Then the download_digest_to_local caller looks something like

```rust
if digest.size_bytes <= IMMUTABLE_FILE_SIZE_LIMIT || f_remote.is_some() {
    let bytes = store.load_bytes(digest).await?;
    if let Some(f_remote) = f_remote { f_remote(bytes.clone()) }
    local_store.store_bytes(...).await?;
} else {
    assert!(f_remote.is_none());
    let file = store.load_file(digest).await?;
    local_store.store(...).await?;
}
```

Which seems much nicer than the current clunky approach (and avoids the panic!/unreachable!s).

I'll try to find a moment to refactor along these lines over the next day.
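The dispatch huonw sketches above can be made concrete in a runnable, synchronous form with the store calls stubbed out. The threshold value and the `Loaded` enum below are purely illustrative assumptions; the real code goes through Pants's local Store and the `f_remote` observer callback.

```rust
// Illustrative threshold; the real value is a Pants configuration concern.
const IMMUTABLE_FILE_SIZE_LIMIT: usize = 512 * 1024;

// Hypothetical result type standing in for "stored via bytes" vs
// "stored via streamed file".
enum Loaded {
    InMemory(Vec<u8>),
    Streamed { size_bytes: usize },
}

fn download_digest_to_local(size_bytes: usize, has_f_remote: bool) -> Loaded {
    if size_bytes <= IMMUTABLE_FILE_SIZE_LIMIT || has_f_remote {
        // Small blob, or an observer needs the bytes: load into memory.
        Loaded::InMemory(vec![0; size_bytes])
    } else {
        // Large blob with no observer: stream straight to disk instead
        // of buffering the whole thing.
        Loaded::Streamed { size_bytes }
    }
}

fn main() {
    match download_digest_to_local(10 * 1024 * 1024, false) {
        Loaded::InMemory(_) => println!("in memory"),
        Loaded::Streamed { .. } => println!("streamed"),
    }
}
```

Note how the `has_f_remote` branch preserves the invariant that the in-memory path is always taken when an observer must see the bytes, which is what makes the `assert!(f_remote.is_none())` in the streaming branch safe.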

huonw (Contributor, Author) commented Jan 25, 2023

I've got halfway through fixing things up, but ran out of time. I'm on leave for a bit so I'll have to pick this up when I'm back (or someone else can take over, I'm not picky).

stuhood (Member) commented Feb 9, 2023

I believe that the fetch-depth issue was fixed by some CI config changes on main: might need to rebase. Sorry about that!

stuhood (Member) left a comment

Thanks!

Feel free to merge whenever you're ready =)

```rust
        ByteStoreError::Other(msg) => fmt::Display::fmt(msg, f),
      }
    }
}

impl std::error::Error for ByteStoreError {}

/// Places that write the result of a remote `load`
#[async_trait]
trait LoadDestination: AsyncWrite + Send + Sync + Unpin + 'static {
```
stuhood (Member) commented:

Nit: it seems like rather than a reset method, this could potentially be generic in terms of new() -> Self instead, and create a new tempfile or new Vec? I think that would avoid needing to put this in a Mutex, because you could create it on the stack before the stream attempt.

But in general: I like this interface.
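stuhood's `new() -> Self` alternative can be sketched in a synchronous, std-only form: the destination is created fresh on the stack for each retry attempt, so a failed attempt's partial data is simply dropped and no shared `Mutex` is needed. Everything here (`Destination`, `load_with_retries`, `try_download`) is a hypothetical analogue, not the PR's actual tokio-based API.

```rust
use std::io::{self, Write};

// Hypothetical analogue of LoadDestination with a constructor instead of
// a reset method. The `Sized` bound is what later makes this shape
// incompatible with a `dyn` trait object (see huonw's reply below).
trait Destination: Write + Sized {
    fn new() -> io::Result<Self>;
}

impl Destination for Vec<u8> {
    fn new() -> io::Result<Self> {
        Ok(Vec::new())
    }
}

fn load_with_retries<D: Destination>(attempts: usize) -> io::Result<D> {
    let mut last_err = None;
    for attempt in 0..attempts {
        // A fresh destination per attempt: no reset, no Mutex.
        let mut dest = D::new()?;
        match try_download(&mut dest, attempt) {
            Ok(()) => return Ok(dest),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.unwrap_or_else(|| io::Error::new(io::ErrorKind::Other, "no attempts")))
}

// Stand-in for the network fetch: fails transiently on the first attempt.
fn try_download(dest: &mut impl Write, attempt: usize) -> io::Result<()> {
    if attempt == 0 {
        return Err(io::Error::new(io::ErrorKind::Interrupted, "transient"));
    }
    dest.write_all(b"payload")
}

fn main() -> io::Result<()> {
    let buf: Vec<u8> = load_with_retries(2)?;
    println!("{}", String::from_utf8_lossy(&buf));
    Ok(())
}
```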

huonw (Contributor, Author) commented Feb 12, 2023

Yeah, mutating to reinitialise via reset is more than a bit awkward, especially with how it forces the Arc<Mutex<>> overhead 😦.

In writing this comment, I discovered that the load_monomorphic call can at least just be &mut dyn LoadDestination, which makes the implementation of load much nicer, and avoids 'infecting' the interface, so I've made that adjustment. (However, due to the retry_call in the body, the current REAPI implementation needs to wrap that &mut in an Arc<Mutex<>> still, so that it can be cloned for each call. I suspect this can be improved, but I assume it'd require adjusting retry_call somehow, and I haven't worked that out yet...)

Having new() -> Self would be a possibility, but I don't think it works with my desired next steps for this code: separating the actually-downloading parts of load_monomorphic into a separate trait to support swapping out the remote store for #17840. Something like:

```rust
#[async_trait]
pub trait ByteStoreProvider: Sync + Send + 'static {
  async fn store_bytes(&self, digest: Digest, bytes: ByteSource) -> Result<(), String>;
  async fn load(&self, digest: Digest, destination: &mut dyn LoadDestination) -> Result<bool, String>;
}
```

That trait is best used behind a trait object (Arc<dyn ByteStoreProvider>), and thus load needs to be object-safe/cannot be generic. That is, it cannot be async fn load<D: LoadDestination>(...) to allow calling D::new() internally.

(Even without that goal, it's nice for compilation time and binary size for the complicated/significant load_monomorphic code to be non-generic so that it's only compiled once, rather than monomorphised for each different LoadDestination parameter it is used with.)
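The `&mut dyn` shape huonw describes can be illustrated with a synchronous, std-only analogue: a destination trait with a `reset` method is object-safe, so the (single, non-generic) load routine can take `&mut dyn Destination` and discard partial data between retry attempts. The names and the simulated retry here are assumptions for illustration, not the PR's real REAPI code.

```rust
use std::io::{self, Write};

// Object-safe analogue of LoadDestination: no `fn new() -> Self`, so it
// can be used behind `dyn`.
trait Destination: Write {
    fn reset(&mut self) -> io::Result<()>;
}

impl Destination for Vec<u8> {
    fn reset(&mut self) -> io::Result<()> {
        self.clear();
        Ok(())
    }
}

// Takes `&mut dyn Destination`, so this body is compiled exactly once
// rather than monomorphised per destination type, and it could live
// behind an `Arc<dyn ByteStoreProvider>`-style trait object.
fn load(dest: &mut dyn Destination) -> io::Result<()> {
    // First attempt: some bytes arrive, then the transfer "fails".
    dest.write_all(b"parti")?;
    // Retry: reset discards the partial data before trying again.
    dest.reset()?;
    dest.write_all(b"complete")?;
    Ok(())
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    load(&mut buf)?;
    println!("{}", String::from_utf8_lossy(&buf));
    Ok(())
}
```

The same `load` call works unchanged with a tempfile-backed destination, which is the point of keeping the routine non-generic.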

huonw merged commit 7631913 into pantsbuild:main Feb 12, 2023
huonw deleted the bugfix/17065-remote-stream branch February 12, 2023 22:50
Successfully merging this pull request may close these issues.

The remoting client currently buffers all fetched blobs in memory before storing to LMDB.