-
-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix immutable inputs DCL bug. #14016
Conversation
Einteresting. The test failure calls BS on me. |
I've unassigned #13899 and released back to the pool since I still don't understand this and my fix continues to be broken in the same way the current code is. |
Re-opened just for some CI debugging, not for review. |
let materialized_chroot = chroot.into_path(); | ||
let digest_str = digest.hash.to_hex(); | ||
let path = self.workdir.path().join(&digest_str); | ||
tokio::fs::rename(&materialized_chroot, &path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it turns out you cannot rename a 555 dir under macOS; thus the perm denied on macOS only in the tests right now. Who knew? I <3 macOS.
Previously we used the double-checked-cell-async crate (See: https://github.com/chrislearn/double-checked-cell-async/blob/46cd3b04eddddbe279282143fe8a936d5854588c/src/lib.rs#L228-L260), which performed the second check of the double check lock with relaxed ordering. The relevant code is snipped below and annotated: ```rust pub struct DoubleCheckedCell<T> { value: UnsafeCell<Option<T>>, initialized: AtomicBool, lock: Mutex<()>, } impl<T> DoubleCheckedCell<T> { pub fn new() -> DoubleCheckedCell<T> { DoubleCheckedCell { value: UnsafeCell::new(None), initialized: AtomicBool::new(false), lock: Mutex::new(()), } } pub async fn get_or_try_init<Fut, E>(&self, init: Fut) -> Result<&T, E> where Fut: Future<Output = Result<T, E>> { // 1.) 1st load & check. if !self.initialized.load(Ordering::Acquire) { // 2.) Lock. let _lock = self.lock.lock().await; // 3.) 2nd load & check. if !self.initialized.load(Ordering::Relaxed) { { // 4.) Critical section. let result = init.await?; let value = unsafe { &mut *self.value.get() }; value.replace(result); } // 5.) Store with lock held. self.initialized.store(true, Ordering::Release); } } let value = unsafe { &*self.value.get() }; Ok(unsafe { value.as_ref().unchecked_unwrap() }) } } ``` Per the C++11 memory model used by Rust (See: https://en.cppreference.com/w/cpp/language/memory_model), this would seem to indicate the second load could be reordered to occur anywhere after the 1st load with acquire ordering and anywhere before the store with released ordering. If that second load was reordered to occur before the lock was acquired, two threads could enter the critical section in serial and the second thread would try to materialize to paths that already were created and marked read-only. Switch to the async-oncecell crate which performs both loads of the double-checked lock with acquire ordering, ensuring they are not re-ordered with respect to the interleaved non-atomics code. Also fixup the materialization process to be atomic. We now cleanup materialization chroots when materialization fails and only move their contents to the destination path if the full materialization has succeeded. Fixes pantsbuild#13899 [ci skip-build-wheels]
Also provide more complete error reporting should the switch to async-oncecell and use of atomic renames to ensure no incomplete materializations does not do the trick. [ci skip-build-wheels]
Ok, reviewers - PTAL. There was a bugfix for macOS perms oddity and I've added more failure debug info in case my DCL bug fix (switch to OnceCell) was wrong. That debug info includes all the ways I could think of getting a perm denied in the wild and should help narrow this down more quickly if the issue persists. |
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for persisting here!
dest: impl AsRef<Path>, | ||
map_rename_err: impl Fn(std::io::Error) -> String, | ||
) -> Result<(), String> { | ||
// If you try to rename a read-only directory (mode 0o555) under masOS you get permission |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If you try to rename a read-only directory (mode 0o555) under masOS you get permission | |
// If you try to rename a read-only directory (mode 0o555) under macOS you get permission |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to save a tree since this is a comment and you've proved its understandable despite typos.
async fn rename_readonly_directory( | ||
src: impl AsRef<Path>, | ||
dest: impl AsRef<Path>, | ||
map_rename_err: impl Fn(std::io::Error) -> String, | ||
) -> Result<(), String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up needing to do this for any other materialized directories, we might be able to bake it into materialize_directory
itself, before the root ends up marked readonly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe. The materialize and move case that this use has would need a callback or an expclict parameter for a final dest.
Previously we used the double-checked-cell-async crate (See:
https://github.com/chrislearn/double-checked-cell-async/blob/46cd3b04eddddbe279282143fe8a936d5854588c/src/lib.rs#L228-L260),
which performed the second check of the double check lock with relaxed
ordering. The relevant code is snipped below and annotated:
Per the C++11 memory model used by Rust (See:
https://en.cppreference.com/w/cpp/atomic/memory_order), this would
seem to indicate the second load could be reordered to occur anywhere
after the 1st load with acquire ordering and anywhere before the store
with released ordering. If that second load was reordered to occur
before the lock was acquired, two threads could enter the critical
section in serial and the second thread would try to materialize to
paths that already were created and marked read-only. Switch to the
async-oncecell crate which performs both loads of the double-checked
lock with acquire ordering, ensuring they are not re-ordered with
respect to the interleaved non-atomics code.
Also fixup the materialization process to be atomic. We now cleanup
materialization chroots when materialization fails and only move their
contents to the destination path if the full materialization has
succeeded.
Fixes #13899
[ci skip-build-wheels]