Neon #78
This is great! If this depends on things that are in nightly, it will be
quite a while before we can include this PR in a release. But I definitely
am interested in getting it in.
I’m very curious to see how a more powerful machine handles it. I’m also
interested to see how the assembly differs between x64 and arm targets.
On Fri, Sep 24, 2021 at 3:33 PM Henrik Enquist wrote:
[image: neon_p2comp]
<https://user-images.githubusercontent.com/6504678/134746092-3f433f8a-1c76-4052-a0b3-140dd246ee82.png>
Compared to the scalar version, on a Raspberry Pi 4.
|
Yes, I would also like to see how this performs on a more powerful CPU. I'm hoping to find something I can borrow, a Mac with the M1 chip for example. The need for nightly to use Neon will probably remain for some time. There is a tracking issue here: rust-lang/rust#48556. The assembly contains the expected instructions, but I haven't compared it to the SSE version. I'll do that and add it here! |
I tried with the float32 4-point butterfly. SSE:
Neon:
And just for fun, the scalar x86_64:
|
How do cpu features work on arm? Does it just compile a fallback? Or does
it trigger UB like on x86? I ask because it seems like there’s no way to
query for support for new instructions like that
On Sun, Sep 26, 2021 at 5:09 PM Henrik Enquist commented on this pull request:
------------------------------
In src/neon/neon_utils.rs
<#78 (comment)>:
> +}
+
+// transpose a 2x2 complex matrix given as [x0, x1], [x2, x3]
+// result is [x0, x2], [x1, x3]
+#[inline(always)]
+pub unsafe fn transpose_complex_2x2_f32(left: float32x4_t, right: float32x4_t) -> [float32x4_t; 2] {
+ let temp02 = extract_lo_lo_f32(left, right);
+ let temp13 = extract_hi_hi_f32(left, right);
+ [temp02, temp13]
+}
+
+// Complex multiplication.
+// Each input contains two complex values, which are multiplied in parallel.
+#[inline(always)]
+pub unsafe fn mul_complex_f32(left: float32x4_t, right: float32x4_t) -> float32x4_t {
+ // ARMv8.2-A introduced vcmulq_f32 and vcmlaq_f32 for complex multiplication, these intrinsics are not yet available.
I have looked a bit more at these instructions after writing that. They
are a quite recent addition from (IIRC) ARMv8.4. So there aren't that many
arm chips out there that have them (most are ARMv8.2). It might be better
to leave these functions like they are.
I don't know if anyone is planning on adding them to Rust.
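For reference, here is a minimal scalar sketch of what the two quoted helpers compute. It assumes each vector holds two complex values in interleaved order `[re0, im0, re1, im1]`; that layout and the `_scalar` function names are illustrative assumptions, not code from the PR.

```rust
/// Scalar reference for a 2-lane complex multiply.
/// Assumed layout (hypothetical): [re0, im0, re1, im1].
pub fn mul_complex_scalar(left: [f32; 4], right: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..2 {
        let (a, b) = (left[2 * i], left[2 * i + 1]);
        let (c, d) = (right[2 * i], right[2 * i + 1]);
        // (a + bi) * (c + di) = (ac - bd) + (ad + bc)i
        out[2 * i] = a * c - b * d;
        out[2 * i + 1] = a * d + b * c;
    }
    out
}

/// Scalar reference for the 2x2 complex transpose:
/// input rows [x0, x1] and [x2, x3], output rows [x0, x2] and [x1, x3].
pub fn transpose_complex_2x2_scalar(left: [f32; 4], right: [f32; 4]) -> [[f32; 4]; 2] {
    [
        [left[0], left[1], right[0], right[1]],
        [left[2], left[3], right[2], right[3]],
    ]
}
```

A scalar version like this can serve as a test oracle when checking a vectorized implementation lane by lane.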
|
It seems to work the same on ARM: the program exits with a SIGILL when I tell rustc to enable, for example, v8.2a on a CPU that only supports v8-a. |
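One way to avoid the SIGILL is to check for a feature at runtime before taking a code path that uses it. The sketch below uses the standard library's `is_aarch64_feature_detected!` macro; the `neon_available` wrapper is a hypothetical name, and the non-aarch64 stub returning `false` is an assumption made so the example compiles on every target.

```rust
/// Reports whether Neon can be used on the running CPU.
/// On aarch64 this asks the standard library's runtime detection.
#[cfg(target_arch = "aarch64")]
pub fn neon_available() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// On every other architecture the aarch64 Neon path is never usable.
#[cfg(not(target_arch = "aarch64"))]
pub fn neon_available() -> bool {
    false
}
```

The same macro accepts other feature names, so a newer instruction set extension could be queried in the same way before dispatching to code that relies on it.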
Fascinating. Well, at any rate I wouldn't want to add required support for it, if it's that new. Once it stabilizes, I can see doing something like Rader's algorithm and AVX2, where there's a fallback that doesn't require it. |
The fixes were merged, so now the normal nightly compiler can be used. The current state then is that the Neon code is completely disabled on anything that is not aarch64, where the stable compiler can be used as usual. On aarch64, it is also disabled by default and compiles on stable. But when enabling the Would you be ok with releasing a version that has nightly-only stuff hidden behind a feature like that? I noticed there are quite a few crates that do that, for example rand: https://crates.io/crates/rand |
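As a rough sketch of that kind of gating, the example below compiles a Neon path only on aarch64 with a Cargo feature named `neon` enabled, and a scalar fallback everywhere else. The `add4` function is a hypothetical stand-in for a real butterfly, not code from this PR.

```rust
/// Neon path: only compiled on aarch64 when the `neon` Cargo feature is on.
#[cfg(all(target_arch = "aarch64", feature = "neon"))]
pub fn add4(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    use std::arch::aarch64::{vaddq_f32, vld1q_f32, vst1q_f32};
    let mut out = [0.0f32; 4];
    // SAFETY: both inputs and the output are valid 4-element f32 arrays,
    // and Neon is part of the baseline for Rust's aarch64 targets.
    unsafe {
        let sum = vaddq_f32(vld1q_f32(a.as_ptr()), vld1q_f32(b.as_ptr()));
        vst1q_f32(out.as_mut_ptr(), sum);
    }
    out
}

/// Scalar fallback: used on all other targets, or when the feature is off.
#[cfg(not(all(target_arch = "aarch64", feature = "neon")))]
pub fn add4(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Callers use `add4` the same way in both configurations, so the default build never sees the gated code.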
It seems I was more than a little confused about the VCMUL/VCMLA etc instructions (I blame the messy ARM documentation!). |
I think we should make a release, and default the neon feature to disabled.
Once a stable release has been out for 6 months, we can default it to enabled.
I’ve already skimmed through it, but I’ll do a more thorough review in a day
or two.
On Tue, Sep 28, 2021 at 12:16 PM Henrik Enquist wrote:
It seems I was more than a little confused about the VCMUL/VCMLA etc
instructions (I blame the messy ARM documentation!).
I'm still a little confused, but I think that so far they have only been
included as an optional extra in the Cortex M55 meant for embedded
applications. Not even the big fancy Cortex X1 has them. So probably not
something we should be waiting for! I should probably remove the comment
about them.
|
I saw that the neon interleaved load and store instructions got added to the stdarch library. These are quite useful and first results were promising. But then I seem to have hit some bug in rustc, see rust-lang/stdarch#1227. Not sure if stdarch was the right place to file the issue, hopefully someone can give some advice. I'm assuming that these instructions will make it into the nightly rust builds quite soon. But I have no idea how easy it will be to get the benches running again. I'm leaning against waiting with this PR until this stuff has been sorted out. This is the branch that uses the new intrinsics: |
This looks great. I requested some minor changes, and once they're in I'd be happy to merge.
I updated the name of the feature. I'm satisfied with this PR at this point. I noticed that it's still marked as draft. Do you think it's ready? If so, mark it ready and I'll merge. If you think there's still work to do, no rush. The one remaining review item I have centers around the pattern of writing I noticed that in other places in this PR, there's an intrinsic to load/store data in an interleaved way. Do you think that could be applied here? It doesn't need to be done as a part of this PR, but it could be a future optimization. |
Very nice! Thanks for merging :) |
Neon on aarch64 will be available on stable rustc from version 1.61!
|
I think that's a good plan. And we can just document that if you want to enable the neon feature, you need rustc 1.61 or newer. I'm thinking about how to document and test this long-term, once we enable it by default. We could say "rustfft 6.2 requires rustc 1.6x if you're on aarch64 with the 'neon' feature enabled, or rustc 1.37 in all other configurations". Or will it be less confusing to just require 1.6x across the board? I don't know of any other features from recent Rust versions that I want to use, but maybe from a user experience perspective it'll be easier for people to wrap their heads around than requiring multiple versions. We'll need to update our testing script to specifically test rustc 1.61 stable with neon enabled. |
If we want to keep the requirement at 1.37 for everything except aarch64+neon, we could add a simple build script that checks this. Something like this: https://github.com/HEnquist/camilladsp/blob/next100/build.rs |
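A minimal sketch of such a check follows. It is not a copy of the linked build script; the helper names are hypothetical, and the 1.61 threshold is the version mentioned in this thread. In a real `build.rs`, `main` would simply call `run_build_check()`.

```rust
use std::env;
use std::process::Command;

/// Extracts (major, minor) from `rustc --version` output,
/// e.g. "rustc 1.61.0 (fe5b13d68 2022-05-18)".
fn parse_rustc_version(output: &str) -> Option<(u32, u32)> {
    let version = output.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Returns an error message when the neon feature is requested on
/// aarch64 with a compiler older than 1.61.
fn check_neon_requirements(
    neon_enabled: bool,
    target_arch: &str,
    version: (u32, u32),
) -> Result<(), String> {
    if neon_enabled && target_arch == "aarch64" && version < (1, 61) {
        Err(format!(
            "the 'neon' feature on aarch64 needs rustc 1.61 or newer, found {}.{}",
            version.0, version.1
        ))
    } else {
        Ok(())
    }
}

/// Intended body of a build script's `main` (hypothetical wiring).
/// Cargo sets RUSTC, CARGO_FEATURE_NEON and CARGO_CFG_TARGET_ARCH
/// in the environment of build scripts.
fn run_build_check() {
    let rustc = env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let output = Command::new(rustc)
        .arg("--version")
        .output()
        .expect("failed to run rustc --version");
    let text = String::from_utf8_lossy(&output.stdout);
    let version = parse_rustc_version(&text).expect("unexpected rustc --version output");
    let neon_enabled = env::var_os("CARGO_FEATURE_NEON").is_some();
    let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    if let Err(msg) = check_neon_requirements(neon_enabled, &target_arch, version) {
        panic!("{}", msg);
    }
}
```

Failing in the build script gives users a readable message instead of a wall of compiler errors about unstable intrinsics.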
Now that all the needed intrinsics are available, it's time for some Neon!
This is basically a direct translation of the SSE code.
Running on a Cortex A72 (a Raspberry Pi 4), I get a speedup of about 50% for f32, and none for f64. The Neon unit of the A72 can only execute a single 128-bit operation at a time. But it can do two f64 operations in parallel, meaning there isn't really any advantage to Neon here. More advanced cores should do better.
To build this, you need a compiler that has this merged: rust-lang/rust#89145
The reason is here: rust-lang/stdarch#1220
Once the latest nightly can be used, I'll add a CI job.