Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CMSIS / Cortex-M instrinsics #518

Merged
merged 5 commits into from
Jul 19, 2018
Merged

add CMSIS / Cortex-M instrinsics #518

merged 5 commits into from
Jul 19, 2018

Conversation

japaric
Copy link
Member

@japaric japaric commented Jul 6, 2018

The implementation uses inline assembly and it's pretty much a copy paste of the CMSIS
implementation. Most of these intrinsics are only available on Cortex-M; this initial
implementation doesn't attempt to make the more general intrinsics available on Cortex-A or
Cortex-R.

Should we attempt to make some of these intrinsics available on Cortex-A and Cortex-R before
landing this PR? CMSIS provides information about Cortex-A; these intrinsics are available on both
Cortex-M and Cortex-A:

  • {enable,disable}_irq
  • NOP
  • WFI
  • WFE
  • SEV
  • ISB
  • DSB
  • DMB

But CMSIS doesn't provide any information about Cortex-R.

This PR depends on the "mclass" target feature which PR rust-lang/rust#52120 adds.

r? @gnzlbg
maybe @andre-richter, @Lakier15, or @paoloteti can tell us which of these intrinsics should also be
available on Cortex-A / Cortex-R?

closes #437

P.S. CMSIS also lists a bunch of DSP instructions but those can be added in a follow up PR. Adding
those intrinsics would require exposing the "dsp" target feature which is enabled only in 1 of the 4
thumb* targets and a few of the Linux targets.

@Amanieu
Copy link
Member

Amanieu commented Jul 7, 2018

Should we attempt to make some of these intrinsics available on Cortex-A and Cortex-R before
landing this PR? CMSIS provides information about Cortex-A; these intrinsics are available on both
Cortex-M and Cortex-A:

These instrinsics are also provided by ACLE, so I don't think that it is necessary to also expose them on Cortex-A/R

@paoloteti
Copy link
Contributor

paoloteti commented Jul 7, 2018

All listed intrinsics works on Cortex-Rx too, but formally are not available on CMSIS-5 yet.
ARM-software/CMSIS_5#314

Instead the most notable Cortex-R CMSIS-DSP implementation is Hercules-DSPLIB

/// Enables IRQ interrupts by clearing the I-bit in the CPSR. Can only be
/// executed in Privileged modes.
#[inline(always)]
pub unsafe fn __enable_irq() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should use the assert_instr macro.

Copy link
Contributor

@gnzlbg gnzlbg Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton these are all in a module guarded by cfg(target_feature = "mclass"), so they don't technically need #[target_feature(enable = "mclass")], but I think we still do always use target_feature(enable) everywhere else in the library, including these cases.

I think I would prefer to have it here as well just in case we end up moving functions around internally in the future.


@japaric is it possible to detect cmsis at run-time? Or is it something that must always be configured statically at compile-time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's probably prudent to go ahead and add the attribute to manually enable it just in case.

}

#[cfg(target_feature = "v7")]
mod v7 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make it more clear to move this to its own file. We have a different v7 module in the arm submodule already, maybe one needs to refactor things a bit.

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@japaric this looks good. There are only two nitpicks:

  • all functions should use the #[target_feature(enable = ...)] attribute even if it should be implied anyways
  • all functions should use the assert_instr attribute to test that the code Rust emits contains the appropriate assembly instruction

I suppose that it is not easy to write run-time tests for this right now, but it would be nice if we could open an issue to explore how we could test these.

@@ -0,0 +1,318 @@
#![allow(non_snake_case)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a top-level comment here linking to the document / documents that specifies these intrinsics?

@japaric
Copy link
Member Author

japaric commented Jul 12, 2018

@gnzlbg done, but as it is the assert_instr macro will never be executed because it has a #[cfg(test)] and the thumb targets don't support cargo test (the test crate doesn't exist for these targets). I could add and use a Cargo feature instead of #[cfg(test)] and it might work. Procedural macros work fine with no_std targets but I don't know what assert_instr does under the hood (if it expands to something that depends on std or test then it won't work).

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this LGTM, but I have never used these, so it would be nice if we could add a link to the specification, and if @Amanieu or someone else familiar with these could review that these follow that properly, since none of these will really be tested for the time being beyond "compiles".


/// Extracted from [CMSIS 5]'s CMSIS/Core/Include/cmsis_gcc.h
///
/// [CMSIS 5]: https://github.com/ARM-software/CMSIS_5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be //! instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer this to be a top-module documentation string (using //!).

/// Enables IRQ interrupts by clearing the I-bit in the CPSR. Can only be
/// executed in Privileged modes.
#[inline]
#[target_feature(enable = "mclass")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 99% sure that #[target_feature(enable = "mclass")] is the wrong thing to do here. Instead the whole module should be guarded by #[cfg(target_feature = "mclass")].

M-class ARM is very different from A/R-class ARM and these intrinsics only make sense when targeting a M-class architecture. Keep in mind that code compiled for M-class ARM is not binary compatible with A/R-class ARM, so "enabling" the mclass feature does not make any sense.

Copy link
Contributor

@gnzlbg gnzlbg Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead the whole module should be guarded by #[cfg(target_feature = "mclass")].

That is already the case, and you are correct, #[target_feature(enable = "mclass")] isn't technically necessary.

We had issues in the past with the whole binary being compiled with a feature enabled not being 100% equivalent with all the functions in the binary having the feature enabled. These annotations should be redundant if everything works fine, view them as just being extra explicit here (we do the same for all other intrinsics / targets).

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 12, 2018

the assert_instr macro will never be executed because it has a #[cfg(test)] and the thumb targets don't support cargo test (the test crate doesn't exist for these targets).

I don't know exactly which targets have the target-feature mclass enabled: do the other arm / aarch64 targets have these intrinsics? If the answer is yes, then assert_instr should run fine on these. If only the thumb no std targets have the mclass feature, then assert_instr won't do indeed anything.

I don't know what assert_instr does under the hood (if it expands to something that depends on std or test then it won't work).

It adds a new test to the crate that disassembles the exported intrinsic function, and just searches the disassembly for the instruction. So it requires test, backtrace, objdump and a bunch of other std stuff that is probably not available in the thumb targets.

If qemu is able to run these targets, what we could do is just add a couple of crates to the examples directory that exercise these intrinsics, and for those targets, compile and run the examples under qemu, but I would be fine with just filling an issue about this and doing it in a future PR.

@japaric
Copy link
Member Author

japaric commented Jul 14, 2018

Updated the module level doc.

I don't know exactly which targets have the target-feature mclass enabled

Only the thumbv*m targets, which don't have access to std nor test.

do the other arm / aarch64 targets have these intrinsics?

Yes, some intrinsics but this PR only makes these intrinsics available on the thumbv*m targets.

So it requires test, backtrace, objdump and a bunch of other std stuff that is probably not available in the thumb targets.

Neither test or backtrace work nor compile for the thumb targets.

If qemu is able to run these targets

QEMU may not implement the full semantics of these intrinsics; some of these instructions may be interpreted as "nop" in the emulator -- I have seen this behavior before with register access (i.e. invalid memory access doesn't crash / segfault the emulator).

But, @jamesmunns and rust-reach participants are looking into adding test support for the thumb targets (which may include testing on QEMU) to rust-lang/rust. Some of that CI work could be ported to this repo.

@gnzlbg gnzlbg merged commit 0fa99a1 into rust-lang:master Jul 19, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 19, 2018

Thanks! I restarted the travis build and everything was green.

@japaric japaric deleted the cmsis branch July 23, 2018 03:12
@japaric
Copy link
Member Author

japaric commented Jul 23, 2018

@gnzlbg @alexcrichton two follow up questions:

  • why don't these intrinsics show up in the nigthly API docs? These intrinsics already made their way into the latest nightly; are the docs just outdated?

  • what needs to happen to stabilize these, or a subset of these, intrinsics?

@alexcrichton
Copy link
Member

@japaric hm they probably don't show up due to the extra #[cfg] gating they're behind. I think though that we could tweak the definition to use #[cfg(any(target_feature = "...", dox))] and get them to show up.

For stabilization it's basically the same as the Intel intrinsics for now, which is to say that it's required to have "vendor specification" for what these intrinsics are (or the equivalent thereof). For Intel intrinsics this was Intel's intrinsics guide, so for these ARM intrinsics we're gonna want the same.

The second optional, but highly desired, piece would be automatic verification of the ARM intrinsics in this repository. Currently there's verification of Intel intrinsics whenever we check them in, but that's not currently implemented for ARM.

That's mostly it! I would personally also like to understand why these intrinsics are feature gated rather than runtime detected (aka defined with #[cfg(...)] instead of runtime detected to see if the instruction is available. Having everything always be available for usage is more in line with the Intel intrinsics, but ARM is quite different so the same story may not be so applicable!

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 23, 2018

@alexcrichton

I would personally also like to understand why these intrinsics are feature gated rather than runtime detected

For arm + v7 and the neon intrinsics, if neon is not enabled at compile-time for the whole binary, compilation fails.

Here, because we are using inline assembly, we might not have that limitation (I am not sure), but that doesn't mean that it is useless to add it. It depends on whether the intrinsics must be enabled at compile-time, or can be detected and enabled at run-time.

This has a lot of consequences. The biggest one is that people can only use these if either their target has them enabled (e.g. the Rust target, or they are using xargo to recompile std themselves), or if they avoid std and use coresimd directly (which can be re-compiled with whatever features the user wants, but because its a git dependency, prevents you from publishing on crates.io).

Or in other words, i don't think it is a big issue to have features that must be enabled statically (we already have pretty big ones, like arm + v7 + neon where v7 and neon must be enabled statically). The problem is, however, that users cannot easily recompile std with these features enabled, and therefore, cannot easily use them without moving to xargo.

@alexcrichton
Copy link
Member

@gnzlbg but do we know why compilation fails? Is it just a bug in LLVM or something fundamental to ARM/the target itself?

@Amanieu
Copy link
Member

Amanieu commented Jul 23, 2018

@alexcrichton The LLVM assembler will restrict itself only to the instructions supported by the currently enabled features. You can enable more features with the .arch asm directive. For example:

.arch armv8+crc
crc32w w0, w1, w2

@alexcrichton
Copy link
Member

Hm ok, so is the style of SIMD on x86 where you cpuid at runtime and compile in multiple functions just entirely not used on ARM at all? Or is it supported in some C compilers and not others?

@japaric
Copy link
Member Author

japaric commented Jul 25, 2018

@alexcrichton

For stabilization it's basically the same as the Intel intrinsics for now, which is to say that it's required to have "vendor specification" for what these intrinsics are (or the equivalent thereof). For Intel intrinsics this was Intel's intrinsics guide, so for these ARM intrinsics we're gonna want the same.

For the CMSIS API added here and in the follow up DSP PRs we have these specifications:

These are referenced in the source code but don't show up in the generated documentation. Should we modify the public doc comments to point to these specifications?

The second optional, but highly desired, piece would be automatic verification of the ARM intrinsics in this repository.

By this do you mean: checking that the signature of the functions we check in matches the CMSIS specification, or checking that the functions generate the instructions they are suppose to generate, or adding both kind of tests?

I'd love to help with such effort but don't have the spare cycles until after the Preview 2 is out.

Hm ok, so is the style of SIMD on x86 where you cpuid at runtime and compile in multiple functions just entirely not used on ARM at all?

Cortex-M and Cortex-A do support CPUID, and I assume Cortex-R does too but couldn't find docs on the topic. I'm not sure if the older ARMv4 and ARMv5 support cpuid; didn't investigate.

But, I expect that most Cortex-M applications would prefer to statically configure everything instead of using a runtime mechanism like CPUID (for code size and, maybe, performance reasons). Cortex-A applications that do use CPUID can't access the Cortex-M specific instructions that have been added in this PR. So, having the Cortex-M instructions available only in a static manner (i.e. depending on the compilation target) seems fine to me.

@paoloteti
Copy link
Contributor

For your information CPUID is available on Cortex-R too. Some extra information is available using the 'feature register 0'. BTW runtime detection is not a good idea on embedded devices.

Please note that DSP intrinsics are available on Cortex-A, but new 'ACLE Q2 2018' specification report: "In general, these intrinsics are aimed at DSP algorithm optimization on M-profile and R-profile. Use on A-profile is deprecated."
Mainly are deprecated in favor of NEON. Even more have been removed (as hw engine) on brand new Cortex-R52 in favor of NEON.

@alexcrichton
Copy link
Member

For the CMSIS API added here and in the follow up DSP PRs we have these specifications:

Perfect!

Should we modify the public doc comments to point to these specifications?

That'd be great yeah! Not required, but it was a last-minute thing we did for all the Intel intrinsics as well, making them a bit easier to browse in the Rust docs.

By this do you mean: checking that the signature of the functions we check in matches the CMSIS specification, or checking that the functions generate the instructions they are suppose to generate, or adding both kind of tests?

Both! The #[assert_instr] macro asserts that the right instruction is generated, and the verification tests ensure that the signature of the function matches the vendor definition.

So, having the Cortex-M instructions available only in a static manner (i.e. depending on the compilation target) seems fine to me.

This may be pretty difficult for the standard library though. We don't support recompiling libstd on the fly so if you're using a precompiled version we are deciding at compile time what set of intrinsics you have access to (which may not be the subset you want).

That's one of the major things that unblocked x86 SIMD where we're shipping a "dumb" libstd that doesn't use AVX but still gives you access to all the AVX goodies. This does sort of depend on LLVM/bugs/etc but it may be a blocker if it's that different from x86

Now if everyone's using xargo anyway, this perhaps isn't a problem!


Overall the next step for stabilization may be an RFC like we had for the x86 intrinsics where the layouts are defined and the way we use the vendor definitions is also defined.

@Amanieu
Copy link
Member

Amanieu commented Jul 26, 2018

This may be pretty difficult for the standard library though. We don't support recompiling libstd on the fly so if you're using a precompiled version we are deciding at compile time what set of intrinsics you have access to (which may not be the subset you want).

The mclass feature is closer to a separate target_arch than a real feature since code compiled for the microcontroller ARMv6-M and ARMv7-M targets is not binary-compatible with code compiled for normal ARM targets. So you will always know at build-time whether you are targeting an mclass CPU or not based on the target: thumbv6 and thumbv7 targets have the mclass feature enabled, all other ARM targets have it disabled.

@japaric
Copy link
Member Author

japaric commented Jul 27, 2018

@alexcrichton

Overall the next step for stabilization may be an RFC like we had for the x86 intrinsics where the layouts are defined and the way we use the vendor definitions is also defined.

The intrinsics that would bring the most benefits by being stabilized first are the non-SIMD ones. These intrinsics would let us drop the build dependency on arm-none-eabi-gcc and external assembly files. Not having to go through FFI also improves performance and code size, as well as the debugging ergonomics.

This subset of intrinsics doesn't require adding any new stable type to arch::arm. And as other have mentioned above it makes sense to make them statically / always available to "mclass" targets so nothing needs to be done on the runtime probing side. We also don't need #[cfg(target_feature = "mclass")] to be stabilized right now; Cortex-M applications don't need it, and code that aims to be portable across Cortex-M and Cortex-A and that uses these intrinsics (which I expect to be rare) can make do with a custom (build.rs) #[cfg] based on the target name.

I'm not quite sure what you meant by "where the layouts are defined", but should I make an RFC that lists these intrinisics and lays out the rationale for stabilizing them ahead of the SIMD ones?

@parched
Copy link

parched commented Jul 27, 2018

I'm not sure it makes sense to have the CMSIS intrinsics in stdsimd. ACLE are the instrinsics Arm recommend compilers implement. The CMSIS instrinsics are more of a wrapper around the various different compilers.

@alexcrichton
Copy link
Member

@japaric sounds great to me! And yeah sorry by "where the layouts are defined" I basically just mean an RFC which covers topics like:

  • What intrinsics are being stabilized (aka where's the vendor's definitions)
  • How intrinsics will be used. For example if they require target features to be enabled and such.
  • Rationale for why the intrinsics are being stablized in the way they're defined (like lack of runtime detection)

Basically just a standard RFC for these intrinsics :)

@parched "stdsimd" is probably a misnomer for this repository in that it's largely intended to do SIMD on x86/x86_64, but it's in general a good location for vendor intrinsics, so in that sense if there's published documentation for what compilers to do that seems reasonable to me to follow!

@alexcrichton
Copy link
Member

It's also worth pointing out that std::arch::x86_64 doesn't necessarily match what C does on x86_64, sometimes exposing lower level primitives like what LLVM has support for directly. We've chosen though to stick to what standards/vendors define as that'll give us compatibility with future LLVM versions (hopefully) as well as alternate backends (again, maybe)

@japaric
Copy link
Member Author

japaric commented Jul 27, 2018

@parched from a cursory look at the documentation you have linked I see that there is a lot of overlap in terms of exposed functionality between CMSIS and ACLE. The main differences I see are:

  1. The naming convention. CMSIS uses __NOP. ACLE uses __nop.
  2. API for register access. CMSIS uses __get_BASEPRI(). ACLE uses __arm_rsr("BASEPRI") where the string is supposed to be a constant. The latter seems harder to replicate in today Rust (cf. cmsis: add __BKPT #540 for discussion about compile-time / constant arguments).
  3. There are few intrinsics that one spec has and the other doesn't. e.g. CMSIS has BKPT; ACLE doesn't. ACLE seems to have more intrinsics that work on older subarchitectures.

Personally, I don't see a reason to not expose the union of ACLE and CMSIS. We just have to sort out points (1) and (2).

@paoloteti
Copy link
Contributor

@japaric

I see that there is a lot of overlap in terms of exposed functionality between CMSIS and ACLE

From my point of view is not a problem of overlap or (minimal) differences on functions signatures.
The big difference is that ACLE dictate what is (or will be) deprecated.
DSP intrinsics on Cortex-A are deprecated by ACLE, for example.

e.g. CMSIS has BKPT; ACLE doesn't

ACLE call it __breakpoint() and ARMC v6 support it for compatibility with ARMC v5 (arm_compat.h). This indicate that will be deprecated in the future (my opition) and there is a rationale on that: with professional debuggers there is no need to inject a breakpoint calling __breakpoint().

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 27, 2018 via email

@paoloteti
Copy link
Contributor

I mean, even if ACLE deprecates stuff, won’t C compilers have to support

Yes, but because there is a pre-existing code-base. That's not true for Rust.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 27, 2018 via email

@japaric
Copy link
Member Author

japaric commented Jul 28, 2018

This might be a dumb question, but can’t we just implement both and let users choose?

The problem I see with literally implementing both is that in some cases there would be two ways (functions) available to do the exact same thing. It could lead to these situtations:

A: I want to use the WFI instruction in my code. Should I use __wfi or __WFI? What's the difference between the two?
B: You can use either. Both functions do the same.
A: Why are there two similarly named functions that do the same?
(...)

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 28, 2018 via email

@paoloteti
Copy link
Contributor

If ARM has two specs that do the same thing, and C compilers implement both, then that’s just how working with ARM at this low level is :/

CMSIS is an HAL spec for Cortex-M (include RTOS api etc), ACLE is a C compiler spec.
A compiler implement ACLE (armcc and gcc for example) and you can build CMSIS on top of ACLE.
A trivial example is cmsis_armcc.h

@parched
Copy link

parched commented Jul 28, 2018

@japaric

The naming convention. CMSIS uses __NOP. ACLE uses __nop.

I guess that is because in the architecture reference manual all the instructions are in capitals, but lowercase is normal C function name style. Personally I prefer just nop which matches the actual instruction but with rust naming style.

API for register access. CMSIS uses __get_BASEPRI(). ACLE uses __arm_rsr("BASEPRI") where the string is supposed to be a constant. The latter seems harder to replicate in today Rust (cf. #540 for discussion about compile-time / constant arguments).

I think that can probably be handled with a small match for the limited set of registers exposed now, and then later when compile time arguments are supported everything can be supported.

@gnzlbg

AFAIK this is an ARM problem, and C compilers and stdsimd implementing both
would just be reflecting the reality of working with the architecture.

The C compilers, in general, don't implement the CMSIS instrinsics (unless by conicidence) they implement ACLE. (e.g., you won't find any reference to CMSIS in clang, ACLE is here)

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 29, 2018

Then I do not follow why is CMSIS being added to stdsimd ? Shouldn't we just be adding ACLE so that CMSIS can be implemented in a normal crate on top of it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ARM intrinsics for thumbv6 / thumbv7
6 participants