Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Materialize "large" files in a new store location and hardlink them in sandboxes #18153

Merged
merged 38 commits into from
Mar 23, 2023
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
2cde378
dont check this in lol
thejcannon Feb 1, 2023
66eb572
it works!
thejcannon Feb 2, 2023
ec76537
bye bye
thejcannon Feb 2, 2023
807f4f5
thats gone too
thejcannon Feb 2, 2023
383de69
thats working better
thejcannon Feb 2, 2023
e0bc8d7
call it immutable
thejcannon Feb 2, 2023
655073a
some test fixes
thejcannon Feb 2, 2023
17d614e
Merge branch 'main' into bigfilestore
thejcannon Feb 13, 2023
b7f6341
tweak
thejcannon Feb 13, 2023
5fc2711
a few changes
thejcannon Feb 14, 2023
052b91a
oh hey its async now
thejcannon Feb 15, 2023
8779e31
remote works for small files too
thejcannon Feb 15, 2023
f14c3c2
more tests pass
thejcannon Feb 15, 2023
7e52590
lets start crackin'
thejcannon Feb 16, 2023
d3e3465
conftest
thejcannon Feb 16, 2023
85adb69
docker is feeling better
thejcannon Feb 16, 2023
4ec4b5d
now we know whats missing
thejcannon Feb 17, 2023
66cf095
xMerge branch 'main' into bigfilestore
thejcannon Feb 17, 2023
564d37a
all_digests
thejcannon Feb 17, 2023
f9f98c1
there we go
thejcannon Feb 17, 2023
55dfe3c
remove too
thejcannon Feb 20, 2023
0b07685
some refactoring
thejcannon Feb 20, 2023
dc01577
more refactoring
thejcannon Feb 20, 2023
7113dae
Merge branch 'main' into bigfilestore
thejcannon Feb 28, 2023
37a6f39
needs some love, but refactored
thejcannon Mar 1, 2023
7127352
remote speedy
thejcannon Mar 1, 2023
43d9c75
* Make `aged_fingerprint` and `all_digests` async, and move their imp…
stuhood Mar 10, 2023
444bec4
Use `self.get_path` in `load_bytes_with`.
stuhood Mar 14, 2023
e52b8b1
Test fixes. Two left.
stuhood Mar 15, 2023
4509c53
Merge branch 'main' into thejcannon/bigfilestore
stuhood Mar 15, 2023
c452708
Re-add digest verification for remote blobs, internally to `remote::B…
stuhood Mar 15, 2023
7f7082d
WIP: Track device.
stuhood Mar 16, 2023
b28e49e
I think were good still gotta test
thejcannon Mar 22, 2023
d19ac09
a few more fixes
thejcannon Mar 22, 2023
23b501a
comments
thejcannon Mar 22, 2023
dd3014a
Super speedy now
thejcannon Mar 22, 2023
941c1b9
Update src/rust/engine/fs/store/src/lib.rs
thejcannon Mar 23, 2023
631cb6f
fmt
thejcannon Mar 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 1 addition & 8 deletions src/rust/engine/fs/fs_util/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -524,7 +524,6 @@ async fn execute(top_match: &clap::ArgMatches) -> Result<(), ExitError> {
destination,
output_digest,
&BTreeSet::new(),
None,
Permissions::Writable,
)
.await?,
Expand All @@ -544,13 +543,7 @@ async fn execute(top_match: &clap::ArgMatches) -> Result<(), ExitError> {
let digest = DirectoryDigest::from_persisted_digest(Digest::new(fingerprint, size_bytes));
Ok(
store
.materialize_directory(
destination,
digest,
&BTreeSet::new(),
None,
Permissions::Writable,
)
.materialize_directory(destination, digest, &BTreeSet::new(), Permissions::Writable)
.await?,
)
}
Expand Down
64 changes: 0 additions & 64 deletions src/rust/engine/fs/store/src/immutable_inputs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -51,69 +51,6 @@ impl ImmutableInputs {
self.0.workdir.path()
}

/// Returns an absolute Path to immutably consume the given Digest from.
pub(crate) async fn path_for_file(
thejcannon marked this conversation as resolved.
Show resolved Hide resolved
&self,
digest: Digest,
is_executable: bool,
) -> Result<PathBuf, StoreError> {
let cell = self.0.contents.lock().entry(digest).or_default().clone();

// We (might) need to initialize the value.
//
// Because this code executes a side-effect which could be observed elsewhere within this
// process (other threads can observe the contents of the temporary directory), we need to
// ensure that if this method is cancelled (via async Drop), whether the cell has been
// initialized or not stays in sync with whether the side-effect is visible.
//
// Making the initialization "cancellation safe", involves either:
//
// 1. Adding a Drop guard to "undo" the side-effect if we're dropped before we fully
// initialize the cell.
// * This is challenging to do correctly in this case, because the `Drop` guard cannot
// be created until after initialization begins, but cannot be cleared until after the
// cell has been initialized (i.e., after `get_or_try_init` returns).
// 2. Shielding ourselves from cancellation by `spawn`ing a new Task to guarantee that the
// cell initialization always runs to completion.
// * This would work, but would mean that we would finish initializing cells even when
// work was cancelled. Cancellation usually means that the work is no longer necessary,
// and so that could result in a lot of spurious IO (in e.g. warm cache cases which
// never end up actually needing any inputs).
// * An advanced variant of this approach would be to _pause_ work on materializing a
// Digest when demand for it disappeared, and resume the work if another caller
// requested that Digest.
// 3. Using anonymous destination paths, such that multiple attempts to initialize cannot
// collide.
// * This means that although the side-effect is visible, it can never collide.
//
// We take the final approach here currently (for simplicity's sake), but the advanced variant
// of approach 2 might eventually be worthwhile.
cell
.get_or_try_init(async {
let chroot = TempDir::new_in(self.0.workdir.path()).map_err(|e| {
format!(
"Failed to create a temporary directory for materialization of immutable input \
digest {digest:?}: {e}"
)
})?;

let dest = chroot.path().join(digest.hash.to_hex());
self
.0
.store
.materialize_file(dest.clone(), digest, Permissions::ReadOnly, is_executable)
.await?;

// Now that we've successfully initialized the destination, forget the TempDir so that it
// is not cleaned up.
let _ = chroot.into_path();

Ok(dest)
})
.await
.cloned()
}

/// Returns an absolute Path to immutably consume the given Digest from.
pub(crate) async fn path_for_dir(
&self,
Expand Down Expand Up @@ -168,7 +105,6 @@ impl ImmutableInputs {
dest.clone(),
directory_digest,
&BTreeSet::new(),
Some(self),
Permissions::ReadOnly,
)
.await?;
Expand Down
65 changes: 19 additions & 46 deletions src/rust/engine/fs/store/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,10 +39,9 @@ pub use crate::snapshot_ops::{SnapshotOps, SubsetParams};

use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
use std::fmt::{self, Debug, Display};
use std::fs::hard_link;
use std::fs::OpenOptions;
use std::future::Future;
use std::io::{self, Read, Write};
use std::io::{self, Write};
use std::os::unix::fs::{symlink, OpenOptionsExt, PermissionsExt};
use std::path::{Path, PathBuf};
use std::sync::{Arc, Weak};
Expand All @@ -59,6 +58,7 @@ use fs::{
use futures::future::{self, BoxFuture, Either, FutureExt, TryFutureExt};
use grpc_util::prost::MessageExt;
use hashing::Digest;
use local::ByteStore;
use parking_lot::Mutex;
use prost::Message;
use protos::gen::build::bazel::remote::execution::v2 as remexec;
Expand All @@ -73,12 +73,6 @@ const KILOBYTES: usize = 1024;
const MEGABYTES: usize = 1024 * KILOBYTES;
const GIGABYTES: usize = 1024 * MEGABYTES;

/// How big a file must be to become an immutable input symlink.
// NB: These numbers were chosen after micro-benchmarking the code on one machine at the time of
// writing. They were chosen using a rough equation from the microbenchmarks that are optimized
// for somewhere between 2 and 3 uses of the corresponding entry to "break even".
const IMMUTABLE_FILE_SIZE_LIMIT: usize = 512 * KILOBYTES;

mod local;
#[cfg(test)]
pub mod local_tests;
Expand Down Expand Up @@ -440,24 +434,15 @@ impl Store {
///
/// Store a file locally by streaming its contents.
///
pub async fn store_file<F, R>(
stuhood marked this conversation as resolved.
Show resolved Hide resolved
pub async fn store_file(
&self,
initial_lease: bool,
data_is_immutable: bool,
data_provider: F,
) -> Result<Digest, String>
where
R: Read + Debug,
F: Fn() -> Result<R, io::Error> + Send + 'static,
{
src: PathBuf,
) -> Result<Digest, String> {
self
.local
.store(
EntryType::File,
initial_lease,
data_is_immutable,
data_provider,
)
.store(EntryType::File, initial_lease, data_is_immutable, src)
.await
}

Expand Down Expand Up @@ -1203,7 +1188,6 @@ impl Store {
destination: PathBuf,
digest: DirectoryDigest,
mutable_paths: &BTreeSet<RelativePath>,
immutable_inputs: Option<&ImmutableInputs>,
thejcannon marked this conversation as resolved.
Show resolved Hide resolved
perms: Permissions,
) -> Result<(), StoreError> {
// Load the DigestTrie for the digest, and convert it into a mapping between a fully qualified
Expand Down Expand Up @@ -1231,7 +1215,6 @@ impl Store {
false,
&parent_to_child,
&mutable_path_ancestors,
immutable_inputs,
perms,
)
.await
Expand All @@ -1244,7 +1227,6 @@ impl Store {
force_mutable: bool,
parent_to_child: &'a HashMap<PathBuf, Vec<directory::Entry>>,
mutable_paths: &'a BTreeSet<PathBuf>,
immutable_inputs: Option<&'a ImmutableInputs>,
perms: Permissions,
) -> BoxFuture<'a, Result<(), StoreError>> {
let store = self.clone();
Expand Down Expand Up @@ -1278,19 +1260,17 @@ impl Store {
let path = destination.join(child.name().as_ref());
let store = store.clone();
child_futures.push(async move {
let can_be_immutable =
immutable_inputs.is_some() && !mutable_paths.contains(&path) && !force_mutable;
let can_be_immutable = !mutable_paths.contains(&path) && !force_mutable;

match child {
directory::Entry::File(f) => {
store
.materialize_file_maybe_hardlink(
.materialize_file_maybe_symlink(
path,
f.digest(),
perms,
f.is_executable(),
can_be_immutable,
immutable_inputs,
)
.await
}
Expand All @@ -1307,7 +1287,6 @@ impl Store {
mutable_paths.contains(&path),
parent_to_child,
mutable_paths,
immutable_inputs,
perms,
)
.await
Expand All @@ -1334,22 +1313,25 @@ impl Store {
.boxed()
}

async fn materialize_file_maybe_hardlink(
async fn materialize_file_maybe_symlink(
&self,
destination: PathBuf,
digest: Digest,
perms: Permissions,
is_executable: bool,
can_be_immutable: bool,
immutable_inputs: Option<&ImmutableInputs>,
) -> Result<(), StoreError> {
if can_be_immutable && digest.size_bytes > IMMUTABLE_FILE_SIZE_LIMIT {
let dest_path = immutable_inputs
.unwrap()
.path_for_file(digest, is_executable)
.await?;
if can_be_immutable && ByteStore::uses_large_file_store(digest.size_bytes) {
self
.materialize_hardlink(destination, dest_path.to_str().unwrap().to_string())
.materialize_symlink(
destination,
self
.local
.large_file_path(digest)
.to_str()
.unwrap()
.to_string(),
)
.await
} else {
self
Expand Down Expand Up @@ -1401,15 +1383,6 @@ impl Store {
Ok(())
}

pub async fn materialize_hardlink(
&self,
destination: PathBuf,
target: String,
) -> Result<(), StoreError> {
hard_link(target, destination)?;
Ok(())
}

///
/// Returns files sorted by their path.
///
Expand Down
Loading