std::arch does not implement some Neon SIMD intrinsics #75373

Closed
Nufflee opened this issue Aug 10, 2020 · 6 comments
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) C-feature-request Category: A feature request, i.e: not implemented / a PR. O-AArch64 Armv8-A or later processors in AArch64 mode O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state

Comments

@Nufflee

Nufflee commented Aug 10, 2020

I tried this code:

#![feature(stdsimd)]
#![feature(arm_target_feature)]

extern crate core;

use std::arch::arm::*;

#[target_feature(enable = "neon")]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
unsafe fn vmovmaskq_u8(input: uint8x16_t) -> i32 {
    // Example input (half scale):
    // 0x89 FF 1D C0 00 10 99 33

    // Shift out everything but the sign bits
    // 0x01 01 00 01 00 00 01 00
    let high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));

    // Merge the even lanes together with vsra. The '??' bytes are garbage.
    // vsri could also be used, but it is slightly slower on aarch64.
    // 0x??03 ??02 ??00 ??01
    let paired16 = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
    // Repeat with wider lanes.
    // 0x??????0B ??????04
    let paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
    // 0x??????????????4B
    let paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
    // Extract the low 8 bits from each lane and join.
    // 0x4B
    return vgetq_lane_u8(paired64, 0) | (vgetq_lane_u8(paired64, 8) << 8);
}

Godbolt: https://godbolt.org/z/1no41v
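For readers without an Arm toolchain, the shift-right-accumulate steps described in the comments above can be followed with a portable scalar sketch. This is not the Neon code itself: it assumes a little-endian `u64` standing in for one 64-bit half of the vector (the "half scale" of the comments), and the function names `movemask_reference` and `movemask_sra` are hypothetical, chosen only for illustration.

```rust
/// Straightforward reference: bit i of the result is the top bit of byte i.
fn movemask_reference(bytes: [u8; 8]) -> u8 {
    let mut mask = 0u8;
    for (i, b) in bytes.iter().enumerate() {
        mask |= (*b >> 7) << i;
    }
    mask
}

/// Scalar emulation of the vshr / vsra steps on one 64-bit half.
/// Lanes are packed little-endian into a u64; the per-lane masks emulate
/// the fact that a lane-wise shift never pulls bits in from a neighbour.
fn movemask_sra(bytes: [u8; 8]) -> u8 {
    let x = u64::from_le_bytes(bytes);

    // Like vshr_n_u8(x, 7): keep only the sign bit of each byte.
    let high_bits = (x >> 7) & 0x0101_0101_0101_0101;

    // Like vsra_n_u16(a, a, 7): each 16-bit lane becomes a + (a >> 7).
    let paired16 = high_bits + ((high_bits >> 7) & 0x01FF_01FF_01FF_01FF);

    // Like vsra_n_u32(b, b, 14): each 32-bit lane becomes b + (b >> 14).
    let paired32 = paired16 + ((paired16 >> 14) & 0x0003_FFFF_0003_FFFF);

    // Like vsra_n_u64(c, c, 28): the single 64-bit lane becomes c + (c >> 28).
    let paired64 = paired32 + (paired32 >> 28);

    // Only the low byte is meaningful; the rest is the '??' garbage.
    paired64 as u8
}
```

Calling `movemask_sra([0x89, 0xFF, 0x1D, 0xC0, 0x00, 0x10, 0x99, 0x33])` reproduces the 0x4B from the comments, with the bytes listed lane 0 first. One caveat for a real Rust port of the full-width function: `vgetq_lane_u8` returns a `u8`, so each lane would need widening before the `<< 8`, unlike in C where integer promotion does this implicitly.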

I expected to see this happen: Intrinsics found without compile errors

Instead, this happened: Compile errors about Neon functions and types not being found. I was able to reproduce the same issue on a Raspberry Pi 3 with an ARMv8 CPU running 32-bit Raspbian. Keep in mind that these intrinsics are supported on ARMv7, and some of them, like vreinterpretq_u32_u16, are basically just reinterpret casts.

Edit: upon closer inspection, I realized that most of the intrinsics I am using are not even supported by Rust. I should still be able to use vreinterpretq_u32_u16 because it is part of the Rust standard library.
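The remark that vreinterpretq_u32_u16 is basically a reinterpret cast can be illustrated without any Neon types. The sketch below assumes plain arrays as stand-ins for `uint16x8_t` and `uint32x4_t`; the function name `reinterpret_u32_u16` is hypothetical and is not the `std::arch` intrinsic.

```rust
use std::mem::transmute;

/// Hypothetical stand-in for a 128-bit reinterpret: the same 16 bytes are
/// viewed as four u32 lanes instead of eight u16 lanes. No bits change.
fn reinterpret_u32_u16(v: [u16; 8]) -> [u32; 4] {
    // Sound because both types are 16 bytes and every bit pattern is a
    // valid u32. The lane values seen afterwards depend on endianness.
    unsafe { transmute::<[u16; 8], [u32; 4]>(v) }
}
```

On a little-endian target, two adjacent u16 lanes `lo, hi` become the single u32 lane `(hi << 16) | lo`, which is why the original code can shift and accumulate across progressively wider lanes.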

Meta

rustc --version --verbose:

rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
binary: rustc
commit-hash: 6c8927b0cf80ceee19386026cf9d7fd4fd9d486f
commit-date: 2020-07-26
host: armv7-unknown-linux-gnueabihf
release: 1.47.0-nightly
LLVM version: 10.0
Backtrace

error[E0412]: cannot find type `uint8x16_t` in this scope
  --> <source>:10:31
   |
10 |   unsafe fn vmovmaskq_u8(input: uint8x16_t) -> i32 {
   |                                 ^^^^^^^^^^ help: a struct with a similar name exists: `uint8x4_t`

error[E0425]: cannot find function `vreinterpretq_u16_u8` in this scope
  --> <source>:16:21
   |
16 |     let high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));
   |                     ^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vshrq_n_u8` in this scope
  --> <source>:16:42
   |
16 |     let high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));
   |                                          ^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vreinterpretq_u32_u16` in this scope
  --> <source>:21:20
   |
21 |     let paired16 = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
   |                    ^^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vsraq_n_u16` in this scope
  --> <source>:21:42
   |
21 |     let paired16 = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
   |                                          ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vreinterpretq_u64_u32` in this scope
  --> <source>:24:20
   |
24 |     let paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
   |                    ^^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vsraq_n_u32` in this scope
  --> <source>:24:42
   |
24 |     let paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
   |                                          ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vreinterpretq_u8_u64` in this scope
  --> <source>:26:20
   |
26 |     let paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
   |                    ^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vsraq_n_u64` in this scope
  --> <source>:26:41
   |
26 |     let paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
   |                                         ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vgetq_lane_u8` in this scope
  --> <source>:29:12
   |
29 |     return vgetq_lane_u8(paired64, 0) | (vgetq_lane_u8(paired64, 8) << 8);
   |            ^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vgetq_lane_u8` in this scope
  --> <source>:29:42
   |
29 |     return vgetq_lane_u8(paired64, 0) | (vgetq_lane_u8(paired64, 8) << 8);
   |                                          ^^^^^^^^^^^^^ not found in this scope

warning: unused import: `std::arch::arm::*`
 --> <source>:6:5
  |
6 | use std::arch::arm::*;
  |     ^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error: aborting due to 11 previous errors; 1 warning emitted

Some errors have detailed explanations: E0412, E0425.
For more information about an error, try `rustc --explain E0412`.
Compiler returned: 1

cc @gnzlbg

@Nufflee Nufflee added the C-bug Category: This is a bug. label Aug 10, 2020
@JayKickliter

JayKickliter commented Aug 11, 2020

I just ran into this exact same problem. I was able to get the SSE/AVX version of my code to compile, but not the Neon one. The docs barely cover this topic, so I'm just shooting in the dark.

@Nufflee
Author

Nufflee commented Aug 12, 2020

Yep, SSE and AVX work without issues, but Neon refuses to work.

@rustbot rustbot added A-SIMD Area: SIMD (Single Instruction Multiple Data) O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state labels Sep 10, 2020
@workingjubilee
Member

I had a moment to review this issue today in closer detail.

vreinterpretq_u32_u16 is not present.
vreinterpretq_u32_u8 is.
vgetq_lane_u8 is not present.
vget_lane_u8 is.

In other words, this is not a compiler error, or at least, not the compiler error it is suggested to be. This is merely the absence of very common intrinsics for the ARM platform.

@bluss
Member

bluss commented Dec 7, 2020

The "place to go" to implement these intrinsics is rust-lang/stdarch/issues/148

@workingjubilee workingjubilee added C-feature-request Category: A feature request, i.e: not implemented / a PR. and removed C-bug Category: This is a bug. labels Apr 29, 2021
@workingjubilee workingjubilee changed the title from "Neon SIMD intrinsics not found when compiling on ARM v7l" to "std::arch does not implement some Neon SIMD intrinsics" Apr 29, 2021
@workingjubilee workingjubilee added the O-AArch64 Armv8-A or later processors in AArch64 mode label Mar 23, 2022
@workingjubilee
Member

workingjubilee commented Mar 23, 2022

This example compiles now. I am assuming all the intrinsics were implemented, and probably any need for further intrinsic requests can go to stdarch? I am closing this. I think it's fine to reopen if you can find anything missing, though!
