Test embedded targets in rust-lang/rust CI #52

japaric · 2018-02-23T19:09:31Z

Triage

2018-07-02

The Increasing Rust's Reach (IRR) participants mentored by @jamesmunns are working on this. Their
goals include:

- Adding "it compiles" tests. libcore, or some other crate, compiles for $TARGET.
- Adding "it links" tests. A minimal embedded program links with different compiler options
  (opt-level, LTO, incremental, etc.) and the presence of some symbols is checked (e.g. main).
- Adding "it runs" tests. An embedded program is executed in QEMU and completes successfully.
- Adding binary size regression tests. The binary size of a program, compiled with opt-level=s, is
  tracked over time (per PR) and the CI fails if the binary size of the program regresses.
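
To make the "it links" goal a bit more concrete, here is a minimal sketch of how the option matrix could be enumerated. The `LinkConfig` type, the `link_matrix` helper and the `incr_dir` scratch directory are hypothetical names for illustration only; the flags themselves (`-C opt-level`, `-C lto`, `-C incremental`) are regular rustc codegen options. A given rustc version may reject some combinations, so a real harness would probably need an allow-list.

```rust
/// One combination of compiler options for an "it links" test.
/// (Hypothetical type, not part of any existing test harness.)
#[derive(Debug, Clone, PartialEq)]
pub struct LinkConfig {
    pub opt_level: &'static str,
    pub lto: bool,
    pub incremental: bool,
}

impl LinkConfig {
    /// Render this combination as rustc codegen flags.
    /// `incr_dir` is an assumed scratch directory for incremental data.
    pub fn rustc_flags(&self, incr_dir: &str) -> Vec<String> {
        let mut flags = vec!["-C".to_string(), format!("opt-level={}", self.opt_level)];
        if self.lto {
            flags.push("-C".to_string());
            flags.push("lto".to_string());
        }
        if self.incremental {
            flags.push("-C".to_string());
            flags.push(format!("incremental={}", incr_dir));
        }
        flags
    }
}

/// Build the full cross product: every opt-level, with and without LTO,
/// with and without incremental compilation.
pub fn link_matrix(opt_levels: &[&'static str]) -> Vec<LinkConfig> {
    let mut out = Vec::new();
    for &opt_level in opt_levels {
        for &lto in &[false, true] {
            for &incremental in &[false, true] {
                out.push(LinkConfig { opt_level, lto, incremental });
            }
        }
    }
    out
}
```

Each entry of `link_matrix(&["0", "s"])` would then drive one build of the minimal program, followed by the symbol check.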

In their next meeting they'll discuss which of these goals are must-haves for the edition and which
ones are stretch goals, as well as a potential timeline.

Last year, compilation of the core crate broke for MSP430 / thumbv6m-none-eabi twice due to changes to the libcore source code. That problem could have been avoided if the core crate had been compiled for thumbv6m-none-eabi / msp430 as part of the rust-lang/rust test suite. Last year there was also quite a bit of breakage in the embedded ecosystem due to the compiler work on incremental compilation and parallel codegen.

We should try to get the rust-lang/rust test suite to include tests for embedded targets. At the very least, core should be compiled for embedded targets.

TODO

~~@japaric will ask the Rust team what kind of embedded targets tests would be possible to add to the rust-lang/rust test suite.~~ We got an "OK, this is possible" from the infra team during the Rust All Hands meeting.


jamesmunns · 2018-02-27T19:42:56Z

I think this relates to #47.

@jcsoo and I discussed this in the last meeting. I think it comes down to a balance between the cost to develop and maintain it and the value it could bring.

Using the terms from my blog post, I would make the following general assessment:

Note: For all of this discussion, I am only considering the Rust Language, and not necessarily
considering the crates and frameworks (embedded-hal, rtfm) unless otherwise specified. I think
discussing those topics is fit for a post of their own in the future.

CI Build

Can a set of basic Embedded Rust applications still compile on every stable/beta/nightly version of the Rust Compiler?

Effort to develop

I think the effort of development here is decently low. We need to:

- Collect samples of Rust applications for various targets that exercise various features relevant to embedded
- Define certain criteria for "pass" or "fail". This could include:
  - Does it still build and link?
  - Are there any new errors or warnings?
  - What size are the components (text, data, bss, etc.)?
  - Some other static analysis?
- Define an environment that can build and analyze our samples against our pass/fail criteria
  - This probably means setting up a docker container, or some kind of script that can be executed in a CI environment
- Integrate this form of testing into the rest of the Rust CI infrastructure
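
As a rough illustration of the size criterion above, here is a minimal sketch that reads the Berkeley-style table printed by the binutils `size` tool (a header line followed by `text data bss dec hex filename` columns) and flags sections that grew past a budget. `SectionSizes`, `parse_size_output`, `grown_sections` and the `allowed_growth` parameter are made-up names for this example, not an existing tool.

```rust
/// Sizes, in bytes, of the three sections named in the criteria above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SectionSizes {
    pub text: u64,
    pub data: u64,
    pub bss: u64,
}

/// Parse the first data row of Berkeley-style `size` output.
/// Returns `None` if the row is missing or a column is not a number.
pub fn parse_size_output(output: &str) -> Option<SectionSizes> {
    // Line 0 is the column header; line 1 holds the numbers.
    let row = output.lines().nth(1)?;
    let mut cols = row.split_whitespace();
    let text = cols.next()?.parse::<u64>().ok()?;
    let data = cols.next()?.parse::<u64>().ok()?;
    let bss = cols.next()?.parse::<u64>().ok()?;
    Some(SectionSizes { text, data, bss })
}

/// Names of the sections that grew by more than `allowed_growth` bytes
/// relative to the baseline. An empty result means "pass".
pub fn grown_sections(
    baseline: SectionSizes,
    current: SectionSizes,
    allowed_growth: u64,
) -> Vec<&'static str> {
    let pairs = [
        ("text", baseline.text, current.text),
        ("data", baseline.data, current.data),
        ("bss", baseline.bss, current.bss),
    ];
    let mut grown = Vec::new();
    for &(name, old, new) in &pairs {
        if new > old.saturating_add(allowed_growth) {
            grown.push(name);
        }
    }
    grown
}
```

Where the baseline comes from (a checked-in file, the previous successful build, etc.) is left open here; that is probably the harder design question.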

Effort to Maintain

Hopefully very low. I would imagine as certain regressions are identified, test cases could be added as necessary.

Value

This has value to help illustrate Rust's commitment to the embedded sphere of development. This would help catch problems that would be experienced immediately by developers when upgrading versions of rustc, and would catch some problems we have experienced in the past with regards to regressions and changing unstable features.

Verdict

In my opinion, we should almost definitely do this.

Non-Host, Host, and Simulated Host Testing

I don't think this has immediate value. Most *-host testing is verifying that application or driver code is sane, independent of the target. This might be different when discussing the embedded-hal and driver crate ecosystem.

Hardware In the Loop Testing

I imagine this would be used to continuously build sample binaries for selected embedded targets, and verify that they still function correctly when flashed.

Effort to Develop

It would be necessary to do the following:

- Select certain reference hardware from different manufacturers
- Write Rust applications for each piece of reference hardware that exercise some functionality
  - Initially, it would be good to support just basic operations, such as math on the target
  - Additionally, it would be good to exercise peripherals on each of the targets
- Write or adapt a framework for building the code, flashing it to the hardware (via SWD/JTAG), observing behavior (via SWD/JTAG, or via peripheral output like SPI, UART, etc.), and evaluating pass/fail criteria
- Report this to some central infrastructure (if decentralized)
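
The flash / observe / evaluate steps above could be shaped roughly like the sketch below. Everything in it is hypothetical: `TestRig` stands in for whatever would wrap a real SWD/JTAG probe and serial port, and `MockRig` only exists so the flow can be exercised without hardware.

```rust
/// Abstraction over one board attached to a test host (hypothetical).
pub trait TestRig {
    /// Write a firmware image to the target.
    fn flash(&mut self, image: &[u8]) -> Result<(), String>;
    /// Collect whatever the target printed (e.g. over UART) after reset.
    fn read_output(&mut self) -> Result<String, String>;
}

/// Stand-in rig that "flashes" by counting bytes and replays canned output.
pub struct MockRig {
    pub canned_output: String,
    pub flashed_bytes: usize,
}

impl TestRig for MockRig {
    fn flash(&mut self, image: &[u8]) -> Result<(), String> {
        if image.is_empty() {
            return Err("empty image".to_string());
        }
        self.flashed_bytes = image.len();
        Ok(())
    }

    fn read_output(&mut self) -> Result<String, String> {
        Ok(self.canned_output.clone())
    }
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Fail(String),
}

/// Flash the image, observe the output, and evaluate it against the
/// expected marker string.
pub fn run_case<R: TestRig>(rig: &mut R, image: &[u8], expected: &str) -> Verdict {
    if let Err(e) = rig.flash(image) {
        return Verdict::Fail(format!("flash failed: {}", e));
    }
    match rig.read_output() {
        Ok(out) if out.contains(expected) => Verdict::Pass,
        Ok(out) => Verdict::Fail(format!("expected {:?} in output, got {:?}", expected, out)),
        Err(e) => Verdict::Fail(format!("no output: {}", e)),
    }
}
```

Keeping the rig behind a trait means the pass/fail logic can be tested in ordinary cloud CI, with only the trait implementation depending on physical hardware.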

The items listed above are decently sized, but there are many examples of how to do this out there. I know projects like https://github.com/RIOT-OS/ maintain these kinds of systems already.

A big "cost" here is that this approach requires physical hardware (both the embedded systems, and some kind of test rack to drive and monitor the tests). This is an addition to the Rust Language's testing infrastructure, which is entirely cloud-based at the moment as far as I know.

Effort to Maintain

I expect the effort to maintain to perhaps be prohibitive here. A person will need to be physically located where the hardware is to replace any broken components (boards, power supplies, test rack failure), and troubleshoot any intermittent or persistent failures.

The more advanced the testing is (e.g. with logic analyzers, mocked hardware, etc), the more likely the chance of hardware failure becomes, and the harder this testing is to maintain.

This infrastructure could be decentralized and provided by donors/volunteers willing to host hardware that is relevant to them, however this would increase the development effort to support a decentralized testing system, and would rely on timely responses by specific volunteers for consistent results.

Value

The value here is also very high. We could say with strong confidence that Rust will not break for your platform if it is one of the tested ones. This could tie in to an LTS style guarantee. I have already heard talks about third party companies willing to sell "maintenance guarantees" to companies that use Rust and need a "guaranteed LTS" experience.

ryankurte · 2018-02-27T23:19:56Z

This infrastructure would be super neat outside of the rust ecosystem too, I'd love to have shared / distributed hardware resources for driver and hardware testing (hardware ci is something I've been dreaming of for a while now)

One concept we've been discussing recently is dev kits or peripherals paired with an RPi that manages tests and provides standard peripherals to test against / use in testing.
Then there'd be a runner on the device which advertises capabilities to, and queries for jobs from, a queue service, spins up containers with the appropriate peripherals mapped into the container space, runs the jobs, then fires the outputs back to the queue service.
Jobs would be a travis-ci like config paired with pre-compiled assets (to be language independent / reduce needless load on hwci components), and we could have a repo for runner and hardware configurations to make it easy to repeat and understand physical setups.
GitHub/GitLab/whatever integrations would then interact with the queue service for scheduling etc.
It's another thing I'd be interested in working on, but there'd need to be some kind of team / commitment to supporting it (and tbqh I wouldn't choose to do it in rust).
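
The capability / job matching part of that idea could look something like this minimal sketch. `Job`, `Runner`, `claim` and the capability strings are invented for illustration; a real queue service would sit behind a network API rather than an in-memory `VecDeque`.

```rust
use std::collections::{BTreeSet, VecDeque};

/// A queued test job and the capabilities it needs (hypothetical shape).
pub struct Job {
    pub id: u32,
    pub requires: BTreeSet<String>,
}

/// A runner attached to some hardware, advertising what it can provide.
pub struct Runner {
    pub capabilities: BTreeSet<String>,
}

/// Convenience constructor for capability sets.
pub fn caps(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|n| n.to_string()).collect()
}

impl Runner {
    /// Take the oldest job whose requirements are all covered by this
    /// runner's capabilities, leaving the other jobs queued.
    pub fn claim(&self, queue: &mut VecDeque<Job>) -> Option<Job> {
        let idx = queue
            .iter()
            .position(|job| job.requires.is_subset(&self.capabilities))?;
        queue.remove(idx)
    }
}
```

A runner with `caps(&["thumbv7m", "uart", "spi"])` would skip a job that needs `"can"` and pick up the next one that only needs `"thumbv7m"` and `"uart"`.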

For now, gitlab-ci is probably worth a look, the runner does everything above already and it can be configured to do something like this. It's probably not exactly what we're looking for to work outside just one project, but could be enough for now to mirror things to a gitlab organisation and run tests with tagged workers via that.

jamesmunns · 2018-02-28T00:34:02Z

Hey @ryankurte, I'm actually familiar with GitLab CI, as I use it at my current work.

I know groups like RIOT-OS have tools that do this (See https://github.com/RIOT-OS/murdock-scripts), and I have personally developed non-distributed systems in the past that do what you describe: Integrate some kind of host with some kind of embedded client (either a dev board, unit under test, etc), usually with some additional hardware (sometimes another dev board, or logic analyzer, FT2232H, etc) to read state. Unfortunately none of the tools I have developed in the past have been open source, so I don't have much to share other than my experiences.

It's a solvable problem, but for Rust specifically, the biggest issue is "who stores and maintains the hardware?", especially if the CI tests are important enough to prevent releases when regressions occur. The second biggest issue would be "how do we integrate any testing we develop into the main RustLang CI process?"

Please do stay in touch! Just because Hardware in the Loop testing isn't something we necessarily want to tackle today doesn't mean we won't in the future! If we close this issue with just CI (no hardware) testing, I'll make sure we open up a follow-on issue to keep this on the record.

jamesmunns · 2018-02-28T00:37:48Z

Also paging @jcsoo since he has expertise in this area.

dvc94ch · 2018-02-28T07:14:39Z

What about CI systems developed specifically for Linux distributions? Fedora, NixOS and Guix have all built their own CI systems specifically for testing the distribution on multiple hardware platforms. Maybe they are too distribution / Linux specific to be adapted to generic / embedded targets, I don't know.

jcsoo · 2018-02-28T23:20:39Z

If you just want to plug in dev boards and load stuff onto them as-is, you need to have someone willing to set up and host all of the host systems, CI infrastructure, and physical hardware, not to mention setting up an isolated network and VPN for others to access it. This is all running in someone's office or lab, so it's not a case of running scripts in AWS. Someone will also have to debug these systems as well as the hardware if things go wrong, so they will need a fairly broad base of experience.

Once you get past basic testing of the handful of peripherals on these dev boards (some of which may have no on-board peripherals at all), you need to build actual systems on top of these boards, and these systems need to be reliable and reproducible which means that you don't want breadboarded prototypes. You also need external systems to generate inputs and measure outputs, especially if you are working on a network stack (USB, CAN, Ethernet, Serial, Bluetooth).

So, running an embedded CI lab is enough work and expense that I don't think anyone will do it on a pure volunteer basis. Certainly individuals or groups that have strong incentives to get things tested will put in the effort, but I'm not sure that this WG or even the Rust language organization is quite that motivated.

We might be better off with a more distributed approach. Dev boards are generally not that expensive, and setting up a Raspberry Pi is not too difficult for individuals. I think it would be very useful to have a set of common dev boards from a variety of vendors that we can use as "reference" boards, as well as a set of peripherals of various types that can easily be connected to these boards for testing.

It's not important that every developer has every board, and this isn't meant to restrict the environments where Rust embedded development will happen; the intent is to make it a bit more likely that developers overlap in what they build and test on. Bug reports that can be reproduced by at least one other person are a lot more likely to be useful.

ryankurte · 2018-03-01T01:29:38Z

So for the CI-only option mentioned in your post @jamesmunns, are we really just talking about the architecture level, like msp430 or cortex-m4?

Because in that case could we create a generic project with some basic internal tests (can we add stuff, do we have working atomics etc., whatever else is important) and compile and run them for each architecture in qemu without a whole lot of work?
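
For what it's worth, the "basic internal tests" could be as small as the sketch below. It only touches `core`, so the same functions could in principle be built into a `no_std` binary and run under QEMU; here they are plain functions so they also run on a host. The function names and the "count of failures" convention are assumptions for the example. One caveat: read-modify-write atomics such as `fetch_add` need compare-and-swap support, which as far as I know thumbv6m-none-eabi and msp430 lack, so that check is kept separate from the load/store one.

```rust
use core::hint::black_box;
use core::sync::atomic::{AtomicUsize, Ordering};

/// "Can we add stuff": a few integer operations, with `black_box` used so
/// the compiler is less likely to fold everything away at compile time.
pub fn check_arithmetic() -> bool {
    let a = black_box(40u32);
    let b = black_box(2u32);
    a + b == 42
        && a.wrapping_mul(b) == 80
        && u32::MAX.wrapping_add(b) == 1
        && (a as u64) << 32 == 171_798_691_840
}

/// Atomic load/store only; should be usable on targets without CAS.
pub fn check_atomic_load_store() -> bool {
    let cell = AtomicUsize::new(0);
    cell.store(5, Ordering::SeqCst);
    cell.load(Ordering::SeqCst) == 5
}

/// Read-modify-write atomics; only meaningful on targets with CAS support.
pub fn check_atomic_rmw() -> bool {
    let counter = AtomicUsize::new(5);
    let previous = counter.fetch_add(1, Ordering::SeqCst);
    previous == 5 && counter.load(Ordering::SeqCst) == 6
}

/// Run every check and return the number of failures, which a target
/// binary could report as its exit status.
pub fn run_all() -> u32 {
    let results = [
        check_arithmetic(),
        check_atomic_load_store(),
        check_atomic_rmw(),
    ];
    results.iter().filter(|ok| !**ok).count() as u32
}
```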

luser · 2018-04-24T15:01:08Z

Several years ago we embarked on a project at Mozilla to use PandaBoards (a TI ARM development board) for Android and Firefox OS testing. The end result of that was mozpool.

We had at one point several hundred PandaBoards installed in custom-built rackmount chassis in a datacenter and were running tests on them in Firefox CI. There's a bunch of extra complexity in mozpool because we didn't want to have a 1:1 host machine to PandaBoard ratio, so we figured out how to get them to PXE boot into a minimal Linux environment from which they could flash an Android or Firefox OS image. For most microcontrollers I suspect it'd be simpler to just have the device connected via USB to the host machine.

I don't know that anything in mozpool is directly usable for this effort, but the code there was used successfully in production CI, so if nothing else there may be some useful design lessons to be learned.

That effort was abandoned partially because we abandoned Firefox OS and partially because testing Android against a dev board doesn't provide much value in reality (because actual Android users are using vastly different hardware). We now mostly run Android tests in emulators, but also we run a subset of tests on real phone hardware using a project called autophone. I'm pretty sure the mozpool phones mostly live at a remote employee's house, and there's definitely some care and feeding required to keep things running smoothly.

Callek · 2018-04-24T15:18:58Z

I should note that mozpool was overdesigned and overcomplex for what we ended up needing for Android CI. It proved to be pretty cumbersome due to the fault levels of the boards in question, and the implemented solutions frequently required humans to touch and reflash them by hand (even though not all flashing needed to be done by humans).

For some further reading:

japaric · 2018-07-03T05:42:20Z

Triage:

The Increasing Rust's Reach (IRR) participants mentored by @jamesmunns are working on this. Their
goals include:

- Adding "it compiles" tests. libcore, or some other crate, compiles for $TARGET.
- Adding "it links" tests. A minimal embedded program links with different compiler options
  (opt-level, LTO, incremental, etc.) and the presence of some symbols is checked (e.g. main).
- Adding "it runs" tests. An embedded program is executed in QEMU and completes successfully.
- Adding binary size regression tests. The binary size of a program, compiled with opt-level=s, is
  tracked over time (per PR) and the CI fails if the binary size of the program regresses.

In their next meeting they'll discuss which of these goals are must-haves for the edition and which
ones are stretch goals, as well as a potential timeline.

luser · 2018-07-03T12:00:11Z

Adding binary size regression tests. The binary size of a program, compiled with opt-level=s, is
tracked over time (per PR) and the CI fails if the binary size of the program regresses.

FYI, we wrote a little Rust tool for use in Firefox CI for tracking binary size more precisely by section size, you might find it useful: https://github.com/luser/rust-size .

therealprof · 2018-07-03T12:07:43Z

@luser I find cargo bloat a bit more useful for that because it points to the function in which the change happened.

pftbest · 2018-07-03T12:12:26Z

@therealprof I think these tools have different use cases. cargo-bloat output is for human consumption, but this tool is for automated scripts (I don't think scripts care about which exact function changed).

therealprof · 2018-07-03T12:20:13Z

@pftbest If you want to flag regressions, it's typically very useful to point out where they happened. I'm using cargo-bloat manually to track regressions in the code generation/libcore over different rustc versions for my MCU crates, e.g. https://github.com/therealprof/microbit/blob/master/tools/capture_example_bloat.sh

Now with a little bit of hacking and the storage of the previous successful build result this could be automated and even produce the assembly output of the previous vs. regressed build for manual inspection... That's what I would expect to see but of course YMMV.
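
A minimal sketch of the comparison step, assuming the per-function sizes of the previous and the current build have already been collected into maps (how they get extracted from the tool output is deliberately left out, and `size_regressions` is a made-up name):

```rust
use std::collections::BTreeMap;

/// Functions that grew (or are new) in `current` relative to `previous`,
/// as (name, growth in bytes), largest growth first.
pub fn size_regressions(
    previous: &BTreeMap<String, u64>,
    current: &BTreeMap<String, u64>,
) -> Vec<(String, u64)> {
    let mut grown: Vec<(String, u64)> = current
        .iter()
        .filter_map(|(name, &size)| {
            // A function missing from the previous build counts as size 0.
            let old = previous.get(name).copied().unwrap_or(0);
            if size > old {
                Some((name.clone(), size - old))
            } else {
                None
            }
        })
        .collect();
    // Largest growth first; ties broken by name for stable output.
    grown.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    grown
}
```

An automated check could fail the build when the list is non-empty and print it, which gives the "where did it happen" pointer without a human running the tool by hand.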

japaric · 2018-07-27T22:09:04Z

Answering #129 (comment) here: cc @jamesmunns

We should definitely have a link test of a cortex-m-rt program. The linker script used by cortex-m-rt has assertions that check the validity of the memory layout of the program. This reduces the need for inspecting the produced binary.

We should also test a few variations of the cortex-m-rt program. For example, linking the program to libm.a should not produce "duplicate symbol" errors. We should test that linking to a #[panic_implementation] provider crate and defining the #[panic_implementation] in the top / leaf crate itself both work. And ... I can't think of any other variation at the moment :-).

Lately, I've been exploring running Cortex-M programs in QEMU using two different approaches -- I have shared my findings in #47 (comment). The IRR folks may be interested in continuing to explore QEMU for testing Cortex-M programs.

jamesmunns · 2018-07-27T23:26:45Z

CC @nerdyvaishali and @sekineh. The comment from @japaric, as well as the discussion from the last month or so may be interesting for you.

These would correlate with our tracking issue jamesmunns/irr-embedded-2018#3.

japaric · 2018-08-07T14:32:08Z

@jamesmunns we haven't had time to check on this during the meetings. Any news on this front?

Also, @pftbest mentioned that one can use semihosting from within QEMU to interact with the host (use stdout, open / read / write files) in #47 (comment). Haven't tried myself but sounds like it could be used to write some tests.

jamesmunns · 2018-08-12T08:06:13Z

@japaric at the moment @sekineh has rust-lang/rust#53190 open, which adds compilation of the cortex-m crate on the four major thumb targets. We are hoping to get that merged this week, and will begin looking at the linking suggestions you made in #52 (comment).

japaric · 2018-11-20T15:21:51Z

The last piece landed in rust-lang/rust#53996

japaric added the upstream label Feb 23, 2018

japaric self-assigned this Feb 23, 2018

japaric added this to the 2018 edition milestone Apr 3, 2018

japaric mentioned this issue Apr 3, 2018

Embedded development on stable #42

Closed

6 tasks

japaric added the Blocks Rust 2018 label Jul 16, 2018

japaric modified the milestones: 2018 edition, RC Jul 17, 2018

v-thakkar mentioned this issue Aug 12, 2018

Add 'it links' tests jamesmunns/irr-embedded-2018#3

Open

japaric mentioned this issue Nov 8, 2018

This year in embedded Rust rust-embedded/blog#25

Merged

7 tasks

japaric closed this as completed Nov 20, 2018
