-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement LLVM's elementwise unordered atomic memory intrinsics #311
Implement LLVM's elementwise unordered atomic memory intrinsics #311
Conversation
One thing I'm unclear on is whether this needs to account for targets that don't support a particular width of atomic operation by e.g. CI seems to be failing with linker errors like |
8af96ab
to
5b5bc58
Compare
The implementations here all look good to me, although I'm presuming this is all pretty experimental so I'm not necessarily reviewing too too closely. As for the CI errors it looks like this is related to possible panics in debug mode, although I also don't really know how they'd arise. I'd recommend doing what CI is doing (running the build then looking at the rlib) and trying to trace back from there to see why panics are included. |
cc also @RalfJung |
5b5bc58
to
dbcc979
Compare
Fixed CI by replacing division with |
Interesting! Was both |
Allows uses of intrinsics of the form llvm.(memcpy|memmove|memset).element.unordered.atomic.* to be linked.
dbcc979
to
997b86d
Compare
Both are definitely necessary. |
Thanks! |
|
||
fn memset_element_unordered_atomic<T>(s: *mut T, c: u8, bytes: usize) | ||
where | ||
T: Copy + From<u8> + Shl<u32, Output = T> + BitOr<T, Output = T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks odd, seems worth adding a comment explaining how exactly the value x
being written is computed? Seems like you are making assumptions about the trait implementations here.
What is the reason this does not match the normal memset
, which works on *mut u8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that we must atomically store an entire T
at a time, which would be impossible if we worked byte-at-a-time. Instead, we build up a T
made out of repeated c
s, and atomically store that instead. What I'd really have liked here is T: PrimitiveInteger
, but of course we don't have that.
Fortunately since this is not exported we don't need to worry too much about someone passing something gratuitously weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the spec says that e.g. the setting happens in 4-byte-atomic chunks?
LLVM is allowed to transform adjacent subsequent atomic operations into a single one. Basically what you are doing here is realizing that optimization by hand.
What I'd really have liked here is T: PrimitiveInteger, but of course we don't have that.
Fair. Could you add a comment explaining that?
I also realized this function is private, so we actually know it will be called only with unsigned primitive integer types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the spec says that e.g. the setting happens in 4-byte-atomic chunks?
Yes, writing each complete T
atomically is what distinguishes this from a regular memset
.
LLVM is allowed to transform adjacent subsequent atomic operations into a single one. Basically what you are doing here is realizing that optimization by hand.
Right, but we need to guarantee this happens for correctness, so relying on an optimization wouldn't be appropriate.
|
||
fn memcpy_element_unordered_atomic<T: Copy>(dest: *mut T, src: *const T, bytes: usize) { | ||
unsafe { | ||
let n = unchecked_div(bytes, mem::size_of::<T>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assume that bytes
is a multiple of size_of::<T>
? If not, what happens when there is rounding going on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this guaranteed by LLVM:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you could use https://doc.rust-lang.org/nightly/std/intrinsics/fn.exact_div.html then. Also IMO this is a precondition worth stating in a comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, didn't see that one. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Resolved in #312.)
Bump compiler-builtins - rust-lang/compiler-builtins#306 - rust-lang/compiler-builtins#309 - rust-lang/compiler-builtins#310 - rust-lang/compiler-builtins#311 - rust-lang/compiler-builtins#312 - rust-lang/compiler-builtins#313 - rust-lang/compiler-builtins#315 - rust-lang/compiler-builtins#317 - rust-lang/compiler-builtins#323 - rust-lang/compiler-builtins#324 - rust-lang/compiler-builtins#328 Adds support for backtraces from `__rust_probestack` plus other goodies. r? @alexcrichton
Would it be possible to gate this on LLVM version? System LLVM 7 and LLVM 8 versions don't seem to allow building Rust's standard library for a cmpxchg instructions must be atomic.
%10 = cmpxchg i64* %9, i64 0, i64 0 unordered not_atomic
in function __llvm_memcpy_element_unordered_atomic_8
LLVM ERROR: Broken function found, compilation aborted!
error: could not compile `compiler_builtins`. And LLVM 7 just segfaults, which made rust-lang/rust#70989 confusing to debug (I initially assumed I will likely attempt to find a target that doesn't have this problem, since it seems easier if possible. |
base.max_atomic_width = Some(64);
EDIT: nope, that's not enough... rust-lang/rust#70989 (comment) still segfaulted. EDIT2: yeah okay I completely missed this part (so it inherits 64-bit atomics): let mut base = super::i686_unknown_linux_gnu::target()?; EDIT3: max_atomic_width: Some(32), and I've found through testing that building |
…crum ci: run mir-opt tests on PR CI also as 32-bit (for `EMIT_MIR_FOR_EACH_BIT_WIDTH`). Background: rust-lang#69916 and [`src/test/mir-opt/README.md`](https://github.com/rust-lang/rust/blob/master/src/test/mir-opt/README.md): > By default 32 bit and 64 bit targets use the same dump files, which can be problematic in the presence of pointers in constants or other bit width dependent things. In that case you can add > > ``` > // EMIT_MIR_FOR_EACH_BIT_WIDTH > ``` > > to your test, causing separate files to be generated for 32bit and 64bit systems. However, if you change the output of such a test (intentionally or not), or if you add a test and it varies between 32-bit and 64-bit platforms, you have to run this command (for a x64 linux host): `./x.py test --stage 1 --target x86_64-unknown-linux-gnu --target i686-unknown-linux-gnu --bless src/test/mir-opt` Otherwise, bors trying to merge the PR will fail, since we test 32-bit targets there. But we don't on PR CI, which means there's no way the PR author would know (unless they were burnt by this already and know what to look for). This PR resolves that by running `mir-opt` tests for ~~`i686-unknown-linux-gnu`~~, on PR CI. **EDIT**: switched to `armv5te-unknown-linux-gnueabi` to work around LLVM 7 crashes (see rust-lang/compiler-builtins#311 (comment)), found during testing. cc @rust-lang/wg-mir-opt @rust-lang/infra
Even if we have a work-around, isn't that still a pretty serious bug? |
Opened issue: #356 |
Allows uses of intrinsics of the form llvm.(memcpy|memmove|memset).element.unordered.atomic.* to be linked. Unblocks rust-lang/rust#59155, rust-lang/rust#58599.
These functions are not based on an existing implementation in compiler-rt, because one does not appear to exist. Reviewing their introduction, this appears to be because existing users of these intrinsics are not compiler-rt users. Fortunately the semantics are simple, so I went ahead and implemented them as the trivial unordered-atomic-ification of the corresponding non-atomic operations.
LLVM calls these functions with intptrs rather, so I'm taking a minor liberty by representing them with accurate pointer types. If that's incorrect, it's easy to switch to explicit casts. I'm also assuming that rustc will always generate calls with
usize
for the third argument, which is true for rust-lang/rust#59155.CC @joshlf