Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [internal] fix immutable inputs bug on macOS #15140

Closed
wants to merge 2 commits into from

Conversation

tdyas
Copy link
Contributor

@tdyas tdyas commented Apr 13, 2022

#13899 describes a bug with immutable inputs where there is a failure to move the materialized digest's directory atomically on macOS.

No fix yet, this PR only contains a unit test that reproduces the issue.

[ci skip-build-wheels]

[ci skip-build-wheels]
@tdyas tdyas added the category:internal CI, fixes for not-yet-released features, etc. label Apr 13, 2022
[ci skip-build-wheels]
@jsirois
Copy link
Contributor

jsirois commented Apr 13, 2022

FWIW the test causes me no errors on Linux. I ran this and killed it after 81 successes:

COUNT=0; while ./cargo test -p process_execution -- duplicate_immutable_input_digests; do echo -e "\n\n$((COUNT+=1)): Success\n---"; done

@tdyas
Copy link
Contributor Author

tdyas commented Apr 13, 2022

FWIW the test causes me no errors on Linux. I ran this and killed it after 81 successes:

COUNT=0; while ./cargo test -p process_execution -- duplicate_immutable_input_digests; do echo -e "\n\n$((COUNT+=1)): Success\n---"; done

@jsirois: What do you think about the idea to add a sequence number to the final path? #13899 (comment)

@jsirois
Copy link
Contributor

jsirois commented Apr 13, 2022

@tdyas I'm not sure. What is the error output in the test for you from this block of code?:

"Failed to move materialized immutable input for {digest:?} from {src:?} to \
{dest:?}: {err}\n\
Parent directory (un-writeable parent dir?) metadata: {parent_metadata:?}\n\
Destination directory (collision?) metadata: {existing_metadata:?}\n\
Current immutable check outs (~dup fingerprints / differing sizes?): {contents:?}
",

I need to page this back in more fully, but having to use a sequence number may imply (pending your error output), that our in-process locking regime is broken. If that's the case, it seems to me just using the Pex atomic_directory strategy (which is now even battle tested in multi-threaded / multi-process environments with the switch from POSIX locks to BSD locks: pex-tool/pex#1702) would be better; i.e.: instead of having both mutexes that sometimes fail for unknown reasons and FS locks (the sequence is a variant of this) to back those failures up, we have just 1 locking mechanism, file locks.

@tdyas
Copy link
Contributor Author

tdyas commented Apr 13, 2022

Finished test [unoptimized + debuginfo] target(s) in 0.80s
 Running unittests (/Users/tdyas/TC/pants/src/rust/engine/target/debug/deps/process_execution-5cc3342d608f97ca)

running 1 test
thread 'immutable_inputs::tests::duplicate_immutable_input_digests' panicked at 'called Result::unwrap() on an Err value: "Failed to move materialized immutable input for Digest { hash: Fingerprint, size_bytes: 84 } from "/var/folders/md/0q71p41n0rbgwhnc8npjjzg00000gn/T/.tmpwjLlEZ/inputs/immutable_inputsimu0yo/.tmpAH2dMP" to "/var/folders/md/0q71p41n0rbgwhnc8npjjzg00000gn/T/.tmpwjLlEZ/inputs/immutable_inputsimu0yo/a80c52a2b4be9145efe3a05bbdd619e959cf78abd3b580977f391fcb1d7c54e9": Permission denied (os error 13)\nParent directory (un-writeable parent dir?) metadata: Ok(Ok(Metadata { file_type: FileType(FileType { mode: 16877 }), is_dir: true, is_file: false, permissions: Permissions(FilePermissions { mode: 16877 }), modified: Ok(SystemTime { tv_sec: 1649879843, tv_nsec: 402068213 }), accessed: Ok(SystemTime { tv_sec: 1649879843, tv_nsec: 397900538 }), created: Ok(SystemTime { tv_sec: 1649879843, tv_nsec: 397900538 }), .. }))\nDestination directory (collision?) metadata: Ok(Metadata { file_type: FileType(FileType { mode: 16749 }), is_dir: true, is_file: false, permissions: Permissions(FilePermissions { mode: 16749 }), modified: Ok(SystemTime { tv_sec: 1649879843, tv_nsec: 400388716 }), accessed: Ok(SystemTime { tv_sec: 1649879843, tv_nsec: 400388716 }), created: Ok(SystemTime { tv_sec: 1649879843, tv_nsec: 400388716 }), .. })\nCurrent immutable check outs (~dup fingerprints / differing sizes?): {Digest { hash: Fingerprint, size_bytes: 84 }: OnceCell(None)}\n "', process_execution/src/immutable_inputs.rs:245:58
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
test immutable_inputs::tests::duplicate_immutable_input_digests ... FAILED

failures:

failures:
immutable_inputs::tests::duplicate_immutable_input_digests

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 97 filtered out; finished in 0.06s

error: test failed, to rerun pass '--lib'

@jsirois
Copy link
Contributor

jsirois commented Apr 13, 2022

Ok, yeah. That's what we've been consistently seeing on Mac and Linux with these error paragraphs. That says the parent directory exists, is a dir, and is 755 and the child target directory already exists!, is a dir and is 555; which means the in-memory Mutexing is broken somehow IIUC.

@tdyas
Copy link
Contributor Author

tdyas commented Apr 13, 2022

I'll try a fix with the BSD lock API then.

@jsirois
Copy link
Contributor

jsirois commented Apr 13, 2022

Thanks Tom!

@tdyas
Copy link
Contributor Author

tdyas commented Apr 22, 2022

For my reference, BSD lock is libc::flock in Rust.

@tdyas tdyas closed this Jun 28, 2022
@tdyas tdyas deleted the fix_immutable_inputs_bug branch February 20, 2023 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants