Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ARM intrinsics for thumbv6 / thumbv7 #437

Closed
46 tasks
jcsoo opened this issue May 4, 2018 · 30 comments · Fixed by #518 or #557
Closed
46 tasks

Implement ARM intrinsics for thumbv6 / thumbv7 #437

jcsoo opened this issue May 4, 2018 · 30 comments · Fixed by #518 or #557
Labels

Comments

@jcsoo
Copy link

jcsoo commented May 4, 2018

See Implement all x86 vendor intrinsics for more information about implementing intrinsics.

Also see rust-embedded/wg#63 for more discussion.

There are two groups of intrinsics that need to be implemented for thumbv6 / thumbv7.

Core Register Access functions

Documentation of the core register functions:

https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html

ARM CMSIS header file:

https://github.com/ARM-software/CMSIS/blob/master/CMSIS/Include/cmsis_armcc.h

CPSID

  • fn disable_fault_irq() // CPSID f
  • fn disable_irq() // CPSID i
  • fn enable_fault_irq() // CPSIE f
  • fn enable_irq() // CPSIE e

MRS

  • fn get_ASPR() -> u32
  • fn get_BASEPRI -> u32
  • fn get_CONTROL() -> u32
  • fn get_FAULTMASK() -> u32
  • fn get_FPSCR() -> u32 // - M4, M7
  • fn get_IPSR() -> u32
  • fn get_MSP() -> u32
  • fn get_PRIMASK() -> u32
  • fn get_PSP() -> u32
  • fn get_xPSR() -> u32

MSR

  • fn set_ASPR(u32)
  • fn set_BASEPRI(u32)
  • fn set_CONTROL(u32)
  • fn set_FAULTMASK(u32)
  • fn set_FPSCR(u32) // - M4, M7
  • fn set_IPSR(u32)
  • fn set_MSP(u32)
  • fn set_PRIMASK(u32)
  • fn set_PSP(u32)
  • fn set_xPSR(u32)

ARM ACLE Intrinsics

The ARM ACLE specification is here:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf

The Clang ARM ACLE header file is here:

https://github.com/llvm-mirror/clang/blob/master/lib/Headers/arm_acle.h

The compiler intrinsics available through LLVM can be found here:

https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td

8.3 Memory Barriers

  • fn dmb(u32) // Data Memory Barrier
  • fn dsb(u32) // Data Synchronization Barrier
  • fn isb(u32) // Instruction Synchronziation Barrier

8.4 Hints

  • fn wfi() // Wait For Interrupt
  • fn wfe() // Wait for Event
  • fn sev() // Send Global Event
  • fn sevl() // Send Local Event
  • fn yield() // Yield
  • fn dbg(u32) // Debug

8.5 Swap

  • fn swp(u32, *mut u32) // Swap

8.7 NOP

  • fn nop() // No-op

9.2 Miscellaneous data-processing intrinsics

Note: These may have equivalents in core.

  • fn ror(u32, u32) -> u32 // Rotate Right
  • fn clz(u32) -> u32 // Count Leading Zeros
  • fn cls(u32) -> u32 // Count Leading Sign Bits
  • fn rev(u32) -> u32 // Reverse Byte Order
  • fn rev16(u32) -> u32 // Reverse Byte Order (16 bit)
  • fn revsh(u32) -> u32 // Reverse Byte Order Signed (16 bit)
  • fn rbit(u32) -> u32 / Reverse Bits

10.1 Special register intrinsics

  • fn arm_rsr(special_register) -> u32 // Read System Register
  • fn arm_rsrp(special_register) -> *const () // Read System Register Containing Address
  • fn arm_wsr(special_register, u32) // Write System Register
  • fn arm_wsrp(special_register, *const ()) // Write System Register
@gnzlbg
Copy link
Contributor

gnzlbg commented May 4, 2018

@jcsoo Could you propose a module structure that makes sense ?

Currently we have:

  • arm:
    • neon
    • v7
    • v6
  • aarch64:
    • neon
    • crypto
    • v8

but it has evolved in an ad-hoc way (see issue #139 ).

It might be better to have:

  • arm
    • neon
    • acle
    • cmsis
  • armv7
    • neon
    • acle
  • aarch64
    • asimd
    • crypto
    • acle

Having a module hierarchy consistent with the ISA extensions will help with automatic verification later on, where the tools to automatically verify ACLE might be different than the tools to verify neon. It also helps with "cfg! hell" since it allows setting the right cfg flags at the top level "nicely" without having to handle different targets inside each module.

@jcsoo
Copy link
Author

jcsoo commented May 8, 2018

This is a pretty tricky question because there are a couple of different things that are getting mixed together.

NEON is an architecture extension that includes new instructions.

ACLE is a set of C language extensions that provide a standardized way to access architecture features and extensions, including NEON.

CMSIS is a software standard that defines a C language API that is a superset of ACLE. The Core Function / Instruction Header File is what we are really interested in.

So the neon / acle / cmsis organization makes sense but is a little strange because it's really

  • neon
  • stuff in acle that's not in neon
  • stuff in cmsis that's not in acle

There's also the issue that there are a lot of different rust targets that will fit in that hierarchy in different ways.

Much of this has been covered in #139 as you mention.

My thinking on this has evolved as I've learned more about the details of the various ARM architectures. I think the API that we present (the module organization) should be based on the underlying instruction sets and extensions not on the documents that we use for reference. Each instruction set module should have a complete set of intrinsics except for major extensions like NEON which will have their own module.

CMSIS is a bit strange because it's sort of an instruction set extension for the "-M" variants but not officially treated as such. It also has a different style than everything else. Because of that I think it makes sense to add them as an separate module. We could also do the same with the "-A" and "-R" variants but I don't know enough about them to say whether it would be worth it.

So we would have something that roughly matches up with ARM's architecture and instruction set hierarchy:

  • arm32
    • v6
    • v7
    • v8
  • arm64
    • v8
  • arm
    • dsp (the "E" variants, in particular armv7em)
    • neon
    • crypto
    • cmsis (the "M" variants)

@gnzlbg
Copy link
Contributor

gnzlbg commented May 8, 2018

@jcsoo That sounds in general good to me. The only thing I find a bit strange is that the top-level names do not really match with Rust's target_arch, which is what at the end of the day we have to use to expose some of these things to users in std::arch (in combination with target_feature for features that must be enabled at compile-time). Which leads me to the questions:

  • What's the difference between arm and arm32 ?
  • Where do we differentiate between 32-bit neon and 64-bit asimd ?
  • How do we deal with asimd extensions like dotprod and friends ?

@jcsoo
Copy link
Author

jcsoo commented May 8, 2018

What's the difference between arm and arm32 ?
Where do we differentiate between 32-bit neon and 64-bit asimd ?
How do we deal with asimd extensions like dotprod and friends ?

To be honest, I don't know all that much about 64-bit ARM and NEON/SIMD. If they are different enough, then maybe we should eliminate arm and put variants in arm32 and arm64 as appropriate.

The only thing I find a bit strange is that the top-level names do not really match with Rust's target_arch

The problem is that those names themselves are inconsistent and confusing. Here is the list of ARM target_archs that Rust currently supports:

  • arm (armv7 or armv6, depending on the platform and abi)
  • armv5te
  • thumbv6m
  • thumbv7m
  • thumbv7em
  • armv7
  • armv7s
  • aarch64

So there is a mix of instruction sets, architecture versions, and something that is neither.

Because of that, I think we're best off defining the public api using the ARM architecture versions and then documenting how Rust target architectures map to ARM architectures. For implementation we could have some kind of facade if that makes things easier from a #[cfg] standpoint.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 8, 2018

Here is the list of ARM target_archs that Rust currently supports:

Do you know where the full list is? For armv7 we currently must do #[cfg(all(target_arch = "arm", target_feature = "v7"))] because #[cfg(target_arch = "armv7")] was not working, but maybe it is already fixed.

If they are different enough, then maybe we should eliminate arm and put variants in arm32 and arm64 as appropriate.

So right now, arm is arm32, and aarch64 is arm64. For neon and crypto at least, many intrinsics are part of arm32, while some are exclusive to arm64. Also, arm64 re-exports most of the arm32 intrinsics (e.g. v8 exposes v7 intrinsics and so on).

So maybe we should just rename arm to arm32 and aarch64 to arm64 and call it a day:

  • arm32: #[cfg(target_arch = "arm")]
    • v6: ???
    • v7: #[cfg(target_feature = "v7")]
    • neon: #[cfg(target_feature = "neon")]
    • dsp (the "E" variants): ???
    • cmsis (the "M" variants): ???
  • arm64: #[cfg(target_arch = "aarch64")]
    • v8: always available on arm64
    • neon: additional 64-bit neon intrinsics, re-exports some arm32 neon, #[cfg(target_feature = "neon")]
      • dotprod, and others - additional #[cfg(target_feature = "dotprod")] and friends
    • crypto: #[cfg(target_feature = "crypto")]
    • dsp (the "E" variants): ???
    • cmsis (the "M" variants): ???

where I've used ??? to denote the pieces that might need one or more additional target features.

If something like that is fine, I think we are actually pretty close to it already and what we need is better documentation explaining what goes where, but that should become more clear as we add more intrinsics.

Also, all these modules are there only to help us fight cfg!-hell internally. The top-level coresimd::arch module cherry picks from arm32 / arm64 (or both) what it needs for each arch. The important thing is that when an user writes coresimd::arch::{current_arch}::* that it gets only the intrinsics that make sense for the current architecture being --targeted by rustc.

@jcsoo
Copy link
Author

jcsoo commented May 8, 2018

I believe this is where to find the list of all rustc targets, including the arch and available features for each target:

https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec

I was assuming the first element of the target triple (quad?) was the internal "arch" definition, so some of what I wrote earlier might not apply.

@therealprof
Copy link

This is really not quite as clear as it should be and unfortunately it'll become even more messy in the future thanks to ARM taking a cue from Intels SSE book.

In a nutshell there's there're two main instruction sets/encodings/architectures now called aarch32, aarch64 and in addition there're two compressed ones called thumb1 and thumb2 plus a number of extensions like DSP, VFP, TrustZone, Jazelle, Neon... . Those instruction sets are the superset of all available instructions and constantly expanded. Now there're various profiles (ARMv5, ARMv6, ARMv6M, ARMv7, ARMv7M, ARMv8...) requiring the implementation of a subset of various instructions/encoding schemes yet still allowing to implement more or even custom ones.

My idea would be to group them by main instruction set and then gate individual instructions by profile. The extensions are not part of most profiles (exceptions are e.g. ARMv5TE) and can always be added separately so they have their own feature gates.

So maybe a structure like the following makes most sense:

  • arm:
    • aarch32
    • aarch64
    • thumb1
    • thumb2
    • neon
    • dsp
    • ...

NB: The assumption that arm is 32bit is flawed. Not sure whether there are already 64bit only implementations but for some applications (e.g. server and high end mobile) there's a strong desire to drop 32bit altogether.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018

We are currently discussing amending the std::arch RFC to require exposing the intrinsics in std::arch::{arch_name}::{feature} instead of just std::arch::{arch_name}, so it might be worth it to come up with a system that matches that.

  • arm
    • aarch32
      • neon, dsp, acle, cmsis, ...
    • aarch64
      • neon (re-exports aarch32::neon), dsp, acle, cmsis, ...
    • thumb1
      • ...
    • thumb2
      • ...

Ideally, the architecture folders would re-export everything relevant for the current target being compiled to, so that from coresimd we can just re-export the arm::{arch} module of the current target and get all available features layered properly.

@therealprof
Copy link

@gnzlbg And feature gates for the various supported profiles? Works for me.

Sill not sure what acle and cmsis means in that context, they're really just C API naming conventions we shouldn't have to care about in a Rust world. If anything they should be implemented separately from the intrinsics.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018

And feature gates for the various supported profiles?

When possible yes. This kind of depends on whether LLVM currently supports these or not, and what we want to do about the ones without one feature gate (or a combination thereof), but I am sure we can work sensible things out as we go along (e.g. like defining feature gates that enable multiple features if that is required).

Sill not sure what acle and cmsis means in that context, they're really just C API naming conventions we shouldn't have to care about in a Rust world.

std::arch exposes function signatures, and the RFC that was merged states that the Rust API must match the C API. This makes the C documentation pretty much usable for Rust. Also, all intrinsics in std::arch need to be automatically verified against he C API, so keeping a similar structure to C headers, at least internally, might help with this. It doesn't mean that we must expose this same structure 1:1 to users though.

@therealprof
Copy link

@gnzlbg

std::arch exposes function signatures, and the RFC that was merged states that the Rust API must match the C API.

It's a C API, not the C API.

This makes the C documentation pretty much usable for Rust

I am not so sure about that. Also what we're talking about here is a tiny subset CMSIS-Core which itself is a subset of CMSIS so if anything it should have a more concise name than just CMSIS. It is also rather esoteric; I've worked a lot with CMSIS but never had to use the special intrinsics before.

NB: ARM is describing them as "Intrinsic functions used to generate CPU instructions that are not supported by standard C functions." and I thought the plan was to provide intrinsics for all relevant opcodes anyway in which case this would mean that we have them twice?

@jcsoo
Copy link
Author

jcsoo commented May 11, 2018

I used the reference to CMSIS-CORE because that seemed like the most authoritative source that I could find. It's also listed in the ARM Compiler toolchain Compiler Reference.

As far as I can tell, each intrinsics corresponds with a single instruction with an appropriate flag, for instance __enable_irq() is cpsie i and __disable_irq() is cpsid f. The underlying instruction itself in that example is CPS which has many possible combinations of flags.

We certainly could rename the intrinsics to __cpsie_i() and __cpsid_f() but that would be a change from the C API.

I definitely could believe that you haven't ever needed to use these intrinsics directly, but is it possible that you were either calling higher-level abstractions that use them under the hood or using inline assembly? How else would you implement a critical section?

@therealprof
Copy link

I used the reference to CMSIS-CORE because that seemed like the most authoritative source that I could find. It's also listed in the ARM Compiler toolchain Compiler Reference.

Well, ARM maintains a lot of stuff around their ARM IP, including cross-compiler toolchains and even whole operating systems.

How else would you implement a critical section?

Using inline assembly, same as plenty of other guys (except for ARM mbedOS, of course). cf. https://github.com/zephyrproject-rtos/zephyr/blob/master/arch/arm/core/cpu_idle.S

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018

I thought the plan was to provide intrinsics for all relevant opcodes anyway in which case this would mean that we have them twice?

At least for std::arch this was not the plan sincestd::arch just exposes the vendor C APIs. This is also why std::arch features can land on stable quickly: there is nothing to discuss if we just expose what the vendor C APIs expose.

If you want to expose intrinsics not part of the vendor C APIs, then core::intrinsics or somewhere else might be more appropriate, and you are going to need RFCs motivating each single intrinsic, discussing their API, usage, etc. std::arch skips all of this by just doing what the vendor C APIs do :/

Using inline assembly,

Note that you can keep doing that on nightly, but AFAICT there is no concrete proposal yet (nor anybody is putting the work) into landing this on stable any time soon.

@therealprof
Copy link

If you want to expose intrinsics not part of the vendor C APIs, then core::intrinsics or somewhere else might be more appropriate, and you are going to need RFCs motivating each single intrinsic, discussing their API, usage, etc. std::arch skips all of this by just doing what the vendor C APIs do :/

That feels weird and wrong on a number of levels. If we really want to do this then you can basically strike neon and dsp from the list because there're only CMSIS Core and ACLE to play with here in terms of C APIs.

@jcsoo
Copy link
Author

jcsoo commented May 11, 2018

How else would you implement a critical section?

Using inline assembly, same as plenty of other guys (except for ARM mbedOS, of course). cf. https://github.com/zephyrproject-rtos/zephyr/blob/master/arch/arm/core/cpu_idle.S

The motivation for including these intrinsics in stdsimd is to allow their eventual usage on stable: rust-embedded/wg#63

Stabilizing the existing asm!() macro is unlikely for many good reasons outlined here: rust-lang/rust#29722 (comment), and designing a new inline assembly syntax will probably take years of work (see this discussion).

@therealprof
Copy link

@jcsoo

I know and fully agree. But exposing some weirdo Intrinsic API loosely based on some existing C APIs only few people use really doesn't fill me with any kind of joy...

@jcsoo
Copy link
Author

jcsoo commented May 11, 2018

I know and fully agree. But exposing some weirdo Intrinsic API loosely based on some existing C APIs only few people use really doesn't fill me with any kind of joy...

These are the canonical C compiler intrinsics, defined by the vendor, and implemented by all of the major open source and commercial C compilers for the platform. I can understand why developers aren't familiar with the intrinsics if they are using inline assembly, but I've tried as hard as possible to find an authoritative (i.e. vendor-provided) source for them. I'm not aware of any alternative intrinsics.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018 via email

@therealprof
Copy link

The only C APIs that can be provided in stdsimd

SIMD? Most of the stuff has nothing to do with SIMD at all and are low level building blocks (or not so low-level in case of NOPs) for system level programming. I'd very much prefer providing a good Rust mapping around the native assembler instructions rather than following inconsistent C APIs just "because they exist". Have you looked at the 74 pages of ACLE documentation? Most of it is general blah, then some C-only explanation how to use the provided include files, some grouping explanation and architecture differentiation and a couple of tables. I don't see anything of relevance for a Rust user in there; even with my 20+ years and of C experience and quite some time working with Cortex-M MCUs I didn't even know ACLE existed until this issue and never missed it...

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018 via email

@therealprof
Copy link

If you don’t need it, don’t use it? I don’t know what’s your point here. If
you prefer inline assembly to intrinsics, submit an RFC for that, push it
until it’s merged, and implement it. But that doesn’t help those who want
to use vendor APIs in Rust.

I'm trying to help getting embedded Rust to stable. This requires a handful of mnemonics to be available as Rust intrinsics and I could not care less about "vendor APIs in Rust" and am not even quite sure what the topic of this issue has to do with that.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018 via email

@therealprof
Copy link

@gnzlbg Well, it seems so. I fail to read that from the topic and I also disagree that this is the best way to solve the issue at hand.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 11, 2018 via email

@japaric
Copy link
Member

japaric commented Jun 28, 2018

Regarding the module structure. ARMv8-M is a 32-bit architecture. The product page on the ARM
website says:

The ARMv8-M architecture is a 32-bit architecture. The registers in the register bank, most data
operations, and addresses are 32-bit. The 32-bit linear address provides 4GB of address space,
which is architecturally pre-defined into several regions with different memory attributes. (..)

The ARMv8-M architecture does not support the A32 instruction set.

The instruction set supported by M-profile processors is the T32 instruction set, previously
called the Thumb instruction set. It contains a range of 16-bit and 32-bit instructions. The
ARMv8-M architecture only supports a subset of the T32 instruction set, with mostly 16-bit
instructions.

AFAICT it's just the 32-bit ARMv7-M (Cortex-M) architecture plus some security features (the SMC
instruction?) -- so I don't think it should be under the arm64 / aarch64 module as that gives the
impression that it's a 64-bit architecture or that it contains 64-bit instructions / operations.


The "ARM and Thumb-2 Instruction Set Quick Reference Card" may provide some guidance on
how to structure the modules and on which #[cfg] to use. The "ARM architecture versions" table in
the last page splits ARM versions as follows (showing the most relevant categories only):

  • n. ARM architecture version n and above
  • 5E. ARM v5E, and 6 and above
  • T2. All Thumb-2 versions of ARM v6 and above.
  • 6K. ARMv6K and above for ARM instructions, ARMv7 for Thumb.
  • RM. ARMv7-R and ARMv7-M only.
  • Z. All Security extension versions of ARMv6 and above.

And among the instructions that will have intrinsics in stdsimd (we have functions for these in the
cortex-m crate) we have:

  • CPSID and CPSIE are in category 6.
  • DMB, DSB and ISB are in category 7.
  • SEV, WFE and WFI are in category 6K.
  • SMC is in category Z. (First time I heard about this instruction but I think it's related to
    ARMv8-M)
  • The BASEPRI register (used with the MSR and MRS instructions) is available on ARMv7-M and not
    on ARMv6-M. I don't know if it's available on Cortex-R or ARMv8-M, but depending on the answer it
    could end up in the RM category.

This reference card totally leaves out NEON instructions though.

Other things to consider: rustc supports ARM Cortex-R as of a few weeks ago, and ARM Cortex-M
(covered by the 4 thumb* targets) doesn't support ARM instructions, only Thumb-1 and Thumb-2
instructions, so we probably need some #[cfg] that says supports thumb instructions but not arm
instructions.


Also, there's a silicon bug around the BASEPRI register that affects the Cortex-M7 processor
revision r0p1 (cf. japaric/cortex-m-rtfm#53) so we'll have to apply the silicon errata fix (cf.
https://github.com/japaric/cortex-m/blob/v0.5.2/src/register/basepri.rs#L39-L44) to the BASEPRI
instrinsic. As we ship pre-compiled libcore binaries and the bug compromises memory safety we'll
have to always apply this fix on the thumbv7em-none-eabi and thumbv7em-none-eabih targets.


Is progress on this issue mainly blocked on deciding on a module hierarchy? To me the flat structure from #437 (comment) makes most sense.

@alexcrichton
Copy link
Member

I believe the contents of the module, std::arch::arm::*, are going to be flat (like x86). In that sense I believe this is purely an internal detail of how the intrinsics are organized.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 28, 2018

Yes, like @alexcrichton this is just an internal detail, and the more instructions for more architectures are implemented the more clear it will be come.

The constraints for the internal organization are a mixture of 1) what makes sense from the hardware point-of-view, 2) what makes sense from an LLVM point-of-view, 3) what makes sense for reusing code between arm32, arm64, thumb, ... targets, and 4) what helps us avoid having many cfg(...) flags per intrinsic (ideally we only have those on whole modules)...

As more intrinsics are added, the internal structure will evolve to make our lives easier. So IMO the best way to proceed is to just send PRs with newer intrinsics, newer targets, ci for those, ... and we'll figure it out as we go along. We can use what has been discussed here as a "guide", but as more intrinsics and arm targets are added new constraints will be introduced and we'll need to adapt.

@japaric
Copy link
Member

japaric commented Jul 6, 2018

PR #518 adds these Cortex-M intrinsics

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jul 7, 2018
ARM: expose the "mclass" target feature

This let us differentiate, in conditional compilation context, between ARM Cortex-M targets, like
the `thumbv*` targets, and other ARM targets, like the ARM Cortex-A Linux targets.

r? @alexcrichton
cc @gnzlbg
cc rust-lang/stdarch#437
@gnzlbg gnzlbg reopened this Jul 30, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 30, 2018

I think we should clarify whether just ACLE, or ACLE+CMSIS, or just CMSIS should be implemented in stdsimd before continuing to add more intrinsics. Some discussion about this happened in the PR that closed this issue: #518

While clarifying this might generate extra work, this work won't be lost because it can be reused for the RFC that stabilizes these intrinsics, which is something that should happen before Rust 2018 anyways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants