Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust cache_content_behavior="fetch" to skip loading of data #17448

Closed
stuhood opened this issue Nov 2, 2022 · 0 comments · Fixed by #17495
Closed

Adjust cache_content_behavior="fetch" to skip loading of data #17448

stuhood opened this issue Nov 2, 2022 · 0 comments · Fixed by #17495
Assignees

Comments

@stuhood
Copy link
Member

stuhood commented Nov 2, 2022

As reported as one part of #16825 (comment), cache_content_behavior="fetch" is pretty wasteful when used with a local cache, because it actually loads the content from disk, rather than only validating that it exists.

The cache_content_behavior check lives here: https://github.com/pantsbuild/pants/blob/main/src/rust/engine/process_execution/src/lib.rs#L822-L873 ... and uses ensure_local_has_recursive_directory and ensure_local_has_file to "fetch" content from the remote store to the local store. ensure_local_has_file internally uses load_bytes_with to fetch content... but this is overkill.


To improve this:

  1. ensure_local_has_file should be converted into a batch method (probably: ensure_local_has_files)
  2. ensure_local_has_files should start by using the batch exists check from Implement a batched local lookup for missing fingerprints. #16627 to filter out any Digests which already exist locally.
  3. Then, for content which does not exist locally, a private helper method should be extracted from load_bytes_with for the portion which actually downloads a Digest:
    let remote = maybe_remote.ok_or_else(|| {
    StoreError::MissingDigest("Was not present in the local store".to_owned(), digest)
    })?;
    let remote_store = remote.store.clone();
    remote
    .maybe_download(digest, async move {
    // TODO: Now that we always copy from the remote store to the local store before executing
    // the caller's logic against the local store, `remote::ByteStore::load_bytes_with` no
    // longer needs to accept a function.
    let bytes = retry_call(
    remote_store,
    |remote_store| async move { remote_store.load_bytes_with(digest, Ok).await },
    |err| match err {
    ByteStoreError::Grpc(status) => status_is_retryable(status),
    _ => false,
    },
    )
    .await
    .map_err(|err| match err {
    ByteStoreError::Grpc(status) => status_to_str(status),
    ByteStoreError::Other(msg) => msg,
    })?
    .ok_or_else(|| {
    StoreError::MissingDigest(
    "Was not present in either the local or remote store".to_owned(),
    digest,
    )
    })?;
    f_remote(bytes.clone())?;
    let stored_digest = local.store_bytes(entry_type, None, bytes, true).await?;
    if digest == stored_digest {
    Ok(())
    } else {
    Err(StoreError::Unclassified(format!(
    "CAS gave wrong digest: expected {:?}, got {:?}",
    digest, stored_digest
    )))
    }
    })
    .await?;
    ... that method should then be used in ensure_local_has_files to download the content.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants