Rust compilations are not reproducible #30330
Comments
Probably comes down to the use of the default hasher in HashMaps.
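To illustrate the suspicion above, here is a minimal sketch of how the standard library's default `HashMap` hasher state (`RandomState`) is randomly keyed, while a hasher created with `DefaultHasher::new()` uses fixed keys. The helper names `hash_with` and `hash_fixed` are made up for this example, and nothing here is taken from the compiler's source; it only shows why a randomly keyed hasher would give non-reproducible output if its results leaked into symbol names.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

/// Hash `value` with a hasher built from the given hasher state.
/// (Hypothetical helper, for illustration only.)
fn hash_with<S: BuildHasher>(state: &S, value: &str) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hash `value` with a fixed-key hasher. All hashers created through
/// `DefaultHasher::new()` behave identically for a given std version.
fn hash_fixed(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Each `RandomState` carries its own keys, so the same input
    // hashes differently under `a` and `b`.
    let a = RandomState::new();
    let b = RandomState::new();
    println!("random a: {:016x}", hash_with(&a, "foo::<i32>"));
    println!("random b: {:016x}", hash_with(&b, "foo::<i32>"));

    // The fixed-key hasher gives the same value on every call.
    println!("fixed:    {:016x}", hash_fixed("foo::<i32>"));
    println!("fixed:    {:016x}", hash_fixed("foo::<i32>"));
}
```

Note that later comments in this thread point to a different cause (memory addresses fed into the hash), so this shows only one plausible source of nondeterminism.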
rust-lang/rfcs#689 seems related. We're due to simplify our name mangling scheme in any case.
For clarification's sake, @glandium, is the problem that the symbol names themselves differ from one run to the next (that sounds a bit terrifying) or that the symbol table of the object file contains the same symbols, but differently ordered from one run to the next?
I believe this is definitely a problem with the Rust compiler itself. For example:
fn main() {
    foo::<i32>();
    foo::<u32>();
}
fn foo<T>() {
}
If I compile it and take a look at the symbols:
The hash printed is different each run of the compiler. The diff of what symbols are defined looks like:
So, to be clear, this is nondeterminism in the Rust compiler itself. The source file does not have to change, nor does the compiler itself have to change. Currently, when the same compiler is run on the same source, it produces different results each time. Seems bad!
How did this happen? We don't explicitly use an RNG in the compiler and I don't recall seeing this before. EDIT: Apparently I'm misremembering and
FWIW, meta-rust (which I maintain) carries a few awful hacks to keep builds a bit more reproducible (https://github.com/jmesmon/meta-rust/pull/33, there is also a patch that futzes with symbol hashing) as bitbake does not like builds changing unexpectedly.
Well, it seems that the code generating the symbol hash for monomorphized functions is incorporating memory addresses into the hash:
// from librustc_trans/trans/monomorphize.rs
let hash;
let s = {
    let mut state = SipHasher::new();
    hash_id.hash(&mut state);
    mono_ty.hash(&mut state);
    // ^^^^^^^^^^^^^^^^^^^^^^^^^
    // the hash of a ty::Ty is derived from a memory address,
    // hash_id above also contains a vector of ty::Ty
    hash = format!("h{}", state.finish());
    let path = ccx.tcx().map.def_path_from_id(fn_node_id);
    exported_name(path, &hash[..])
};
So, no wonder
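The effect described above can be reproduced outside the compiler. The following is a minimal sketch, assuming a made-up `Ty` struct as a stand-in for an interned compiler type; `hash_by_address` and `hash_by_content` are hypothetical helpers, not rustc functions. It contrasts hashing a value's address, which depends on where the allocator placed it, with hashing its contents.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an interned compiler type.
struct Ty {
    name: String,
}

/// Hash the *address* of the value. The result depends on where the
/// allocator happened to place it, so it can change between runs.
fn hash_by_address(ty: &Ty) -> u64 {
    let mut state = DefaultHasher::new();
    (ty as *const Ty as usize).hash(&mut state);
    state.finish()
}

/// Hash the *contents* of the value. Equal contents give equal hashes,
/// wherever the value lives in memory.
fn hash_by_content(ty: &Ty) -> u64 {
    let mut state = DefaultHasher::new();
    ty.name.hash(&mut state);
    state.finish()
}

fn main() {
    // Two separately allocated values with identical contents.
    let a = Box::new(Ty { name: "i32".to_string() });
    let b = Box::new(Ty { name: "i32".to_string() });

    println!("by address: {:016x} vs {:016x}", hash_by_address(&a), hash_by_address(&b));
    println!("by content: {:016x} vs {:016x}", hash_by_content(&a), hash_by_content(&b));
}
```

With address space layout randomization, the address-based values will typically also differ from one process run to the next, which matches the behaviour reported in this issue.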
On Tue, Feb 09, 2016 at 03:17:27AM -0800, Michael Woerister wrote:
Oh dear :)
WIP: Implement stable symbol-name generation algorithm.

This PR changes the way symbol names are generated by the compiler. The new algorithm reflects the current state of the discussion over at rust-lang/rfcs#689. Once it is done, it will also fix issue #30330. I want to add a test case for that before closing it though. I also want to do some performance tests. The new algorithm does a little more work than the previous one for various reasons, and it might make sense to adapt it in a way that allows it to be implemented more efficiently.

@nikomatsakis: It would be nice if there was a way of finding out if a `DefPath` refers to something in the current crate or in an external one. The information is already there, it's just not accessible at the moment. I'll probably propose some minor changes there, together with some facilities to allow for accessing `DefPaths` without allocating a `Vec` for them.

**TODO**
- ~~Actually "crate qualify" symbols, as promised in the docs.~~
- ~~Add a test case showing that symbol names are deterministic.~~
- Maybe add a test case showing that symbol names are stable against small code changes.

~~One thing that might be interesting to the @rust-lang/compiler team: I've used SipHash exclusively now for generating symbol hashes. Previously it was only used for monomorphizations and the rest of the code used a truncated version of SHA256. Is there any benefit to sticking to SHA? I don't really see one since we only used 64 bits of the digest anyway, but maybe I'm missing something?~~ ==> Just switched things back to SHA-2 for now.
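As a rough illustration of what "deterministic, crate-qualified" symbol names mean, here is a small sketch. It is not the algorithm from the PR: the `symbol_name` function, its parameters, and the output format are all invented for this example, and it uses the standard library's fixed-key `DefaultHasher` as a stand-in for the hash function the compiler uses. The point is only that the hash input consists of stable data (crate name, item path, type arguments rendered as text) and not of addresses or random state.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Build a symbol name from stable inputs only (hypothetical scheme).
/// `crate_name`, `path` and `type_args` are plain strings, so the same
/// source always yields the same name for a given hasher implementation.
fn symbol_name(crate_name: &str, path: &[&str], type_args: &[&str]) -> String {
    let mut state = DefaultHasher::new();
    crate_name.hash(&mut state); // crate-qualify the symbol
    path.hash(&mut state);       // item path within the crate
    type_args.hash(&mut state);  // distinguishes monomorphizations
    format!("{}::{}::h{:016x}", crate_name, path.join("::"), state.finish())
}

fn main() {
    // Two instantiations of the same generic function get different,
    // but repeatable, names.
    println!("{}", symbol_name("demo", &["foo"], &["i32"]));
    println!("{}", symbol_name("demo", &["foo"], &["u32"]));
}
```

A test for determinism along the lines of the TODO item could then simply compare the names produced by two independent invocations.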
Is this fixed by #32293?
Indeed confirmed fixed on nightly!
It is desirable for repeated builds of the same source with the same compiler to produce the same object files. That is something that C/C++ compilers usually do (modulo some randomization that can be overcome with e.g. the -frandom-seed flag), and it allows for reproducible builds.
While the machine code that rustc emits is apparently consistent, the symbols it creates aren't. For instance, when building Firefox with the rust bits enabled on Mozilla's try server twice in a row, I get many differences in the symbol list like the following:
rustc should either emit the same symbol names or allow seeding the RNG it uses, as gcc does. (The former would be more appreciated.)
Cc @froydnj