-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement remaining __clz*i2 intrinsics #639
Conversation
Unfortunately that won't work here because LLVM does in fact use this builtin, for example on the
In that case the implementation in compiler-builtins won't be called anyways since LLVM/GCC will just use the Zbb instruction directly. |
Possible generic version: fn leading_zeros<I: Int>(mut x: I) -> usize {
let mut z = I::BITS;
let mut t: I;
let mut shift = I::BITS / 2;
while shift > 1 {
t = x >> shift;
if t != I::ZERO {
z -= shift;
x = t;
}
shift /= 2;
}
t = x >> 1;
let res = if t != I::ZERO { z - 2 } else { z - x.cast() };
res as usize
} LLVM correctly unrolls the loop to get the same result as the existing usize-only version. Which means it doesn't make use of e.g. |
testcrate/tests/misc.rs
Outdated
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))] | ||
{ | ||
use compiler_builtins::int::leading_zeros::__clzdi2; | ||
fuzz(N, |x: u64| { | ||
if x == 0 { | ||
return; // undefined value for an intrinsic | ||
} | ||
let lz = x.leading_zeros() as usize; | ||
let lz0 = __clzdi2(x); | ||
if lz0 != lz { | ||
panic!("__clzdi2({}): std: {}, builtins: {}", x, lz, lz0); | ||
} | ||
}); | ||
} | ||
#[cfg(target_pointer_width = "64")] | ||
{ | ||
use compiler_builtins::int::leading_zeros::__clzti2; | ||
fuzz(N, |x: u128| { | ||
if x == 0 { | ||
return; // undefined value for an intrinsic | ||
} | ||
let lz = x.leading_zeros() as usize; | ||
let lz0 = __clzti2(x); | ||
if lz0 != lz { | ||
panic!("__clzti2({}): std: {}, builtins: {}", x, lz, lz0); | ||
} | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is happening on 16 & 32-bit platforms for the gating - is LLVM lowering .leading_zeros()
to this missing symbol? From CI it looks like MSVC is missing ___clzti2
on arm and 32-bit so you'll have to update the gates.
If this logic is a bit more complex than just pointer sizes, it might be better to add a NoSysU128Clz
option to
compiler-builtins/testcrate/build.rs
Lines 5 to 10 in 06db2de
enum Feature { | |
NoSysF128, | |
NoSysF128IntConvert, | |
NoSysF16, | |
NoSysF16F128Convert, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gated these, and the intrinsics themselves, the same way the compilers do (well, libgcc does so they have to follow along?). Makes no sense to test for intrinsics which aren't ever used.
As for 128 bit on 64-bit msvc - I guess they don't have 128 bit int somehow, according to compiler-rt. I just took "optimized c" attribute off the implementation instead of trying to gate it. Single branch isn't that bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that the gates were on the intrinsics as well. Could you please remove them? The reasons that libgcc
doesn't supply a specific intrinsic on specific platforms may vary (no integers of that type, GCC itself doesn't use them, platform bugs, untested, just not done yet etc). But if we don't have any specific reason not to make them available and there’s a chance that the compilers make use of them, we should just provide them.
Also a lot of the purpose of this library is to fill in for the shortcomings of libgcc, so it doesn't make any sense to copy their limitations :).
well, libgcc does so they have to follow along?
Unfortunately this isn't very accurate, LLVM (and GCC) both pretty frequently emit symbols that aren't available on lesser-known platforms. Rust hits this more often because we make less common types available on all platforms (e.g. u128
, f16
, f128
) where there might not be a C equivalent available.
Not like this specific symbol is likely to be a problem anywhere. But we should avoid unnecessary cfg
(unless it's a arch-specific symbol) because it just means somebody might have to chase them down in the future and wonder why it's disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all the gate. Let the flood begin!
As for libgcc and their builtins, the reason there is simple: that's how they do it. Basically they have two and only two functions __clzSI2 and __clzDI2 (note uppercase letters). Despite their names, they operate on native word size and double that respectively. These function names (and argument types) get macroed into whatever proper name and argument type is for the platform.
Same is true for many other builtins there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change. I think you might be able to remove some C sources that we are building (mentioned below) but otherwise the implementation looks good to me now.
const { assert!(T::BITS <= 64) }; | ||
if T::BITS >= 64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional nit: the second condition could become == 64
since > 64
is rejected right above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather leave this as is, as it is more future-proof (in case any 128 bit implementations would actually pop up, and whoever applies 128 bit part happens to be both incredibly careless and doesn't test anything...)
More seriously though, result of the expression is a compile-time constant so it doesn't matter at all which operator to use here, and >= is more consistent with 32 bit part down below.
@@ -382,7 +383,6 @@ mod c { | |||
sources.extend(&[ | |||
("__absvti2", "absvti2.c"), | |||
("__addvti3", "addvti3.c"), | |||
("__clzti2", "clzti2.c"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could also remove __clzdi2
and __clzsi2
on lines 349-350 now that these aren't broken. Possibly
Line 577 in 2bf0425
sources.extend(&[("__clzdi2", "clzdi2.c"), ("__clzsi2", "clzsi2.c")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure what's the policy on these. Is it "only apply C replacements for broken/missing functions"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is accurate, part of the goal of this crate is to be able to target some platforms without needing a C toolchain. I think that usually once something gets ported over and tested, it can be removed from the C sources lists.
I guess maybe we only used the C implementations on thumb because ours was broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Aarch64 CI is down because of new rustc I guess (0/1 asm labels are now off limits instead of just discouraged). This particular thing about numeric labels amazes me so much... This is due to llvm bug, but instead of dealing with that people decided to document it in rust book (so I guess it is a feature now) and make it a compile-time error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's an easy llvm bug to fix unfortunately. That lint should almost certainly be x86-only though, where LLVM was already throwing an error and the lint just makes it a more accurate error. I brought it up.
You can add
#![allow(binary_asm_labels)] // aarch64 has no problems with binary labels
to the top of src/aarch64_linux.rs
just to get that to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, thanks. @Amanieu still needs to do the final review. The CI error obviously isn't your problem.
Fix coming for the lint btw, probably will be on tomorrow's nightly. Just rebase once it's available rust-lang/rust#127935 |
This one was a bit interesting to figure out.
Firstly, the existing implementation was wrong, as it was working with whatever native word is, whereas
__clzsi2
should be 32-bit specifically (SI is 32-bit exactly). compiler-rt implementation is always 32-bit. Interestingly, existing CI tests didn't catch this because testcrate only tests against compiler-rt when compiled with--no-default-features -F c
(because testcrate defaultsmangled-names
). When tested in this way, testcreate surely enough failed__clzsi2
test on 64 bit target.Even more interestingly, the compiler-rt implementation is also wrong! It is 32-bit implementation but it uses int type for its argument, so on 16-bit platform it couldn't get 32-bit value. I guess compiler-rt is not used much for 16 bit?
Now, libgcc implementation is the correct one. It always uses 16 bit types for HI functions, 32 bit types for SI, 64 bit for DI and 128 bit for TI. But it has its own twist: it only implements two functions out of all possible - one for native word size and one for double that. So on 16 bit it would be HI and SI, on 32 bit SI/DI and on 64 bit DI/TI. Which is why buggy existing implementation didn't caused any dramas: it would never be called as the compilers know it shouldn't even exist (indeed, even trying hard to call it via
__builtin_clz(uint32_t)
results in something akin to_clzdi2(x as u64) - 32
).So I did the same thing as with bswap - simply relegate work to the LLVM and let it spit out proper builtin implementation. i686 and x86-64 versions look great using
bsr
. Riscv64... cannot say if it became better or worse than it was; its a different algorithm, though both are branchless and logarithmic in complexity. One would need to benchmark it like crazy to decide which is better :)I like new version better because of two things:
With this implementation tests are not quite as useful; they can only work as a documentation of sorts since they check the same implementation against itself. Well, they also can check against compiler-rt so there's that.