-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip loading of local cache data when possible #17495
Conversation
9b52ed3
to
31b0543
Compare
async fn download_digest_to_local< | ||
FRemote: Fn(Bytes) -> Result<(), String> + Send + Sync + 'static, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was simply extracted from load_bytes_with
) | ||
})?; | ||
|
||
f_remote(bytes.clone())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this clone seems like wasted performance in some cases, but will leave it for a followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bytes.clone()
is very cheap, because it is reference counted. But yea, I think you're right that it is no longer necessary here.
/// Ensure that a directory is locally loadable, which will download it from the Remote store as | ||
/// a sideeffect (if one is configured). | ||
/// | ||
pub async fn ensure_local_has_recursive_directory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was inlined into the new function ensure_local_has_files
let missing_file_digests = self | ||
.local | ||
.get_missing_digests(EntryType::File, file_digests) | ||
.await?; | ||
if missing_file_digests.is_empty() { | ||
return Ok(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is new
.map(|file_digest| self.ensure_local_has_file(file_digest)) | ||
.collect::<Vec<_>>(), | ||
.map(|file_digest| async move { | ||
if let Err(e) = remote | ||
.download_digest_to_local(self.local.clone(), file_digest, EntryType::File, |_| Ok(())) | ||
.await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key change. We now only download, don't load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
src/rust/engine/fs/store/src/lib.rs
Outdated
@@ -229,6 +229,59 @@ impl RemoteStore { | |||
.await | |||
.map(|&()| ()) | |||
} | |||
|
|||
/// Download the digest to the local byte store from this remote store. The function `f_remote` | |||
/// can be used to validate/transform the bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// can be used to validate/transform the bytes. | |
/// can be used to validate the bytes. |
Bytes
is immutable, so the function can only validate.
src/rust/engine/fs/store/src/lib.rs
Outdated
// TODO: Now that we always copy from the remote store to the local store before executing | ||
// the caller's logic against the local store, `remote::ByteStore::load_bytes_with` no | ||
// longer needs to accept a function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a pointer to #17065 here? I have some more ideas that I will likely put on that ticket today.
) | ||
})?; | ||
|
||
f_remote(bytes.clone())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bytes.clone()
is very cheap, because it is reference counted. But yea, I think you're right that it is no longer necessary here.
src/rust/engine/fs/store/src/lib.rs
Outdated
pub async fn ensure_local_has_recursive_directory( | ||
/// Ensure that the files are locally loadable. This will download them from the remote store as | ||
/// a side effect, if one is configured. | ||
pub async fn ensure_local_has_files( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to: ensure_local_has_digests
, ensure_local_has
, ensure_downloaded
, etc.
Somewhat implicit in here is that we also want to guarantee that the Directory
protos have been persisted locally... so this method should probably also call record_digest_trie
on each loaded tree to ensure that they exist locally as well. It's likely a bug that we weren't doing that before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that at the end of callling download_digest_to_local
on every missing file digest, we then also use a future::try_join_all
that calls ensure_directory_digest_persisted
on every directory digest inputted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we then also use a
future::try_join_all
that callsensure_directory_digest_persisted
on every directory digest inputted?
Yea, either at the beginning or the end is fine. It's also possible to use try_join!
to do it concurrently.
But you should be able to use record_digest_trie
rather than ensure_directory_digest_persisted
, because you will have already called load_digest_trie
at the beginning of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuhood I want to make sure I got it right why it's safe to use record_digest_trie
before we hit the remote store. That function is only about storing the directory entries, which don't need to come from the remote? We only download file entries from the remote?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function is only about storing the directory entries, which don't need to come from the remote? We only download file entries from the remote?
Right: record_digest_trie
only stores the Directory
entries. Unlike files, Directory
entries may be in one of three places: 1. memory, 2. local, 3. remote... record_digest_trie
ensures that they are actually on disk locally, rather than potentially only in memory or remote.
} | ||
directory::Entry::Symlink(_) | directory::Entry::Directory(_) => (), | ||
}); | ||
self.record_digest_trie(trie, true).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuhood this sets initial_lease
to true
. I wasn't sure what that was about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... me neither. I'm pretty sure that we should remove that parameter, but haven't spent the time to confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here to explain why we do this (or in the method signature) would be great.
src/rust/engine/fs/store/src/lib.rs
Outdated
// let _ = future::try_join_all( | ||
// directory_digests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, stale code from iterating
} | ||
directory::Entry::Symlink(_) | directory::Entry::Directory(_) => (), | ||
}); | ||
self.record_digest_trie(trie, true).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here to explain why we do this (or in the method signature) would be great.
@stuhood should this be cherry-picked to 2.15? |
I believe so, yea. A 95% performance difference for a cache lookup amounts to a bug. |
…uild#17495) Closes pantsbuild#17448. # Conflicts: # src/rust/engine/fs/store/src/lib.rs
Closes #17448.