add vcgez, vcgtz, vclez, vcltz neon instructions #1069

SparrowLii · 2021-03-09T19:05:33Z

All four are automatically generated single-operand comparison instructions (each compares its input against zero). To stay consistent with Clang's implementation, this PR also makes some changes to stdarch-gen.

rust-highfive · 2021-03-09T19:05:35Z

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

Amanieu · 2021-03-10T03:41:23Z

crates/stdarch-gen/neon.spec

+multi_fn = fixed, c:in_t
+multi_fn = fixed_2, d:in_t
+multi_fn = simd_shr, e:, a, transmute(c)
+multi_fn = simd_xor, transmute(e), transmute(d)


Why not just use simd_ge here?

This is to be consistent with Clang's implementation. The following is the test I did in https://godbolt.org/:

#include <arm_neon.h>
int test() {
    return (int) vcgez_s32;
}

And the Output:

define dso_local i32 @test() local_unnamed_addr #0 {
  ret i32 ptrtoint (<2 x i32> (<2 x i32>)* @vcgez_s32 to i32)
}

define internal <2 x i32> @vcgez_s32(<2 x i32> %0) #1 {
  %2 = ashr <2 x i32> %0, <i32 31, i32 31>
  %3 = xor <2 x i32> %2, <i32 -1, i32 -1>
  ret <2 x i32> %3
}

attributes #0 = { norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }
attributes #1 = { alwaysinline norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="64" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }

If you compile with -O0 you will see that Clang actually emits an icmp sge. LLVM optimizations are then turning this into a shift + xor.

The Godbolt URL comes from here: #148

That usually works, but in this particular case it gives a different result because the IR is not the one generated by Clang directly: it is the IR after LLVM has run optimization passes that expand the icmp sge into shift and xor.

You can use simd_ge in Rust and it will produce the same IR as Clang.

Hmm, that's right. If we use an implementation consistent with -O0, can we be sure that LLVM will apply the same optimization? If so, we should indeed use simd_ge, IMO.

[Edit] OK, got it

We run the same LLVM passes as Clang (mostly) so rustc will also transform simd_ge into a shift + xor.

Thanks for the explanation!
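The equivalence discussed in this thread can be checked on a single lane. The following is a minimal scalar sketch, not the stdarch implementation: `cgez_compare` and `cgez_shift_xor` are hypothetical names chosen for illustration, and plain `i32` values stand in for one lane of a `<2 x i32>` vector.

```rust
/// What a lane-wise "a >= 0" comparison produces for one signed 32-bit lane:
/// all bits set when the condition holds, zero otherwise.
/// (Hypothetical helper; models the result of `simd_ge(a, 0)` per lane.)
fn cgez_compare(a: i32) -> u32 {
    if a >= 0 { u32::MAX } else { 0 }
}

/// The shift + xor form seen in the optimized IR (`ashr` by 31, then `xor` with -1).
/// (Hypothetical helper.)
fn cgez_shift_xor(a: i32) -> u32 {
    // `>>` on a signed integer is an arithmetic shift, so this copies the
    // sign bit into every position: -1 for negative input, 0 otherwise.
    let sign_mask = a >> 31;
    // Flipping every bit turns "negative" into "non-negative".
    (sign_mask ^ -1) as u32
}
```

Both functions return the same mask for every input, for example `u32::MAX` for `0` and `0` for `-1`, which is why writing the comparison directly loses nothing once the optimizer has run.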

Amanieu · 2021-03-10T03:41:33Z

crates/stdarch-gen/neon.spec

+/// Compare signed less than zero
+name = vcltz
+multi_fn = fixed, b:in_t
+multi_fn = simd_shr, c:in_t, a, transmute(b)


And simd_lt here?

Same as above, the following is my test in https://godbolt.org/:

#include <arm_neon.h>
int test() {
    return (int) vcltz_s32;
}

And the Output:

define dso_local i32 @test() local_unnamed_addr #0 {
  ret i32 ptrtoint (<2 x i32> (<2 x i32>)* @vcltz_s32 to i32)
}

define internal <2 x i32> @vcltz_s32(<2 x i32> %0) #1 {
  %2 = ashr <2 x i32> %0, <i32 31, i32 31>
  ret <2 x i32> %2
}

attributes #0 = { norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }
attributes #1 = { alwaysinline norecurse nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="64" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon" }
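For reference, the lane-wise semantics of the four comparisons in this PR can be modelled in portable Rust. This is only a sketch under stated assumptions: the function names below are hypothetical, a `[i32; 2]` array stands in for a two-lane vector, and none of this is the code that stdarch-gen emits. It also includes the bare arithmetic-shift form shown in the IR above for the "less than zero" case.

```rust
/// Convert a boolean into a NEON-style lane mask: all ones or all zeros.
fn mask(b: bool) -> u32 {
    if b { u32::MAX } else { 0 }
}

// Scalar models of the four compare-against-zero operations (hypothetical names).
fn cgez(a: [i32; 2]) -> [u32; 2] { a.map(|x| mask(x >= 0)) }
fn cgtz(a: [i32; 2]) -> [u32; 2] { a.map(|x| mask(x > 0)) }
fn clez(a: [i32; 2]) -> [u32; 2] { a.map(|x| mask(x <= 0)) }
fn cltz(a: [i32; 2]) -> [u32; 2] { a.map(|x| mask(x < 0)) }

/// "Less than zero" written as the single arithmetic shift from the optimized IR:
/// shifting right by 31 fills every bit with the sign bit.
fn cltz_shift(a: [i32; 2]) -> [u32; 2] {
    a.map(|x| (x >> 31) as u32)
}
```

Note that zero is the only input on which `cgez` and `cgtz` (and likewise `clez` and `cltz`) differ, and that `cltz_shift` agrees with `cltz` on every input.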

Amanieu · 2021-03-10T05:08:31Z

Can you add ARM versions of these functions?

SparrowLii · 2021-03-10T06:11:31Z

Can you add ARM versions of these functions?

It seems that these instructions are unique to AArch64 and only accept signed operands. I couldn't compile an ARM version on Godbolt either.
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vcltz

Amanieu · 2021-03-10T06:14:10Z

You are right.
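Because the intrinsics exist only on AArch64, code that calls them from a crate built for several architectures needs a `cfg` gate. The following is a minimal sketch of that pattern: the wrapper name `cgez_lanes` and the scalar fallback are hypothetical, and it assumes a toolchain where `core::arch::aarch64` exposes `vcgez_s32`, `vld1_s32`, and `vst1_u32`, with NEON enabled for the target.

```rust
/// Lane-wise "greater than or equal to zero" on two i32 lanes.
/// (Hypothetical wrapper: NEON on AArch64, scalar code everywhere else.)
#[cfg(target_arch = "aarch64")]
fn cgez_lanes(a: [i32; 2]) -> [u32; 2] {
    use core::arch::aarch64::{vcgez_s32, vld1_s32, vst1_u32};
    let mut out = [0u32; 2];
    // SAFETY: both pointers refer to arrays of exactly two lanes, and NEON is
    // assumed to be available on this AArch64 target.
    unsafe {
        let v = vld1_s32(a.as_ptr());
        vst1_u32(out.as_mut_ptr(), vcgez_s32(v));
    }
    out
}

/// Scalar fallback for targets without these instructions (for example 32-bit ARM).
#[cfg(not(target_arch = "aarch64"))]
fn cgez_lanes(a: [i32; 2]) -> [u32; 2] {
    a.map(|x| if x >= 0 { u32::MAX } else { 0 })
}
```

Either variant yields an all-ones mask for non-negative lanes and zero for negative ones, so callers do not need to know which path was compiled in.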

add vcgez, vcgtz, vclez, vcltz neon instructions

99c0d23

rust-highfive assigned Amanieu Mar 9, 2021

SparrowLii force-pushed the vcgez branch from 4ad071a to e6c2cff Compare March 9, 2021 19:41

correct instruction names

9d963cf

SparrowLii force-pushed the vcgez branch from e6c2cff to 9d963cf Compare March 9, 2021 19:48

Amanieu reviewed Mar 10, 2021

SparrowLii added 2 commits March 10, 2021 12:22

consist with Clang -O0

7c2eacc

correct instruction names

ee9c42c

Amanieu merged commit fc199fe into rust-lang:master Mar 10, 2021
