Machine-readable output for cargo nextest run #20

sunshowers · 2022-02-10T01:46:51Z

The output should be similar to what cargo itself provides, complete with --message-format json and --message-format json-render-status options.

The text was updated successfully, but these errors were encountered:

sunshowers · 2023-07-31T21:25:46Z

This is something that we'd still like to do but it's just the work that needs to be done.

Help wanted: If you're interested in tackling this project, here's the general set of things we need to do:

Define a format with data structures. The format should carefully be designed to allow future extensibility (e.g. enums should be open universes for forward compatibility).
Add that format to nextest-metadata.
Make nextest-runner produce output in that format.
Carefully work through backwards compatibility concerns: if we need to make a change in the future, how would it be done?
Add tests for the format: probably can migrate the nextest integration tests to the new format.
Add tests for forward compatibility as well -- need to make sure that future additions don't break the format.
Write documentation for the format.
Stretch goal: Make nextest-metadata able to execute cargo nextest run with that format.

I'd estimate this to be around 80 hours of work for a newcomer to the project -- it's a chunky, nontrivial project but totally doable for someone new and interested.

Jake-Shadle · 2023-10-25T13:17:42Z

We're interested in doing this work so that we can feed structured test results to this. (Note that while it supports JUnit which nextest can already produce that option has limitations that make it less appealing).

Before I started work on this, I wanted to run the idea I have for the initial implementation by you first before I do work that then gets rejected.

Basically the idea would be to have a --message-format libtest-json --format-version <n> where n is a serial incrementing number (the same --format-version <n> option already exists for cargo metadata) that follows the current structured output of nightly so that users can specify a particular revision of the format for stability, even in the face of breaking changes. This would allow tools that understand the (current) nightly output to be integrated with nextest, similar to the rationale for supporting JUnit.

This adds a maintenance burden as breaking changes to the nightly output would need to be reflected in a new format-version in nextest, but I haven't actually looked at how often the format has changed historically to know if this is a major concern or not. This is mitigated by the ability to pin to a specific version so IMO it's not a huge deal, especially if the code is done in the right way, and thus hopefully fairly easy for contributors who need the latest and greatest format output to add support for them without too much trouble.

This would also allow nextest to define its own format separately that would be unaffected by libtest decisions.

sunshowers · 2023-10-26T00:15:55Z

We're interested in doing this work so that we can feed structured test results to this. (Note that while it supports JUnit which nextest can already produce that option has limitations that make it less appealing).

Yay, super excited about this!

Could you document the limitations of JUnit? It wasn't entirely clear to me what they were from reading the documentation.

For reference, buildkite uses this test collector.

Basically the idea would be to have a --message-format libtest-json --format-version where n is a serial incrementing number (the same --format-version option already exists for cargo metadata).

The idea of supporting libtest JSON is definitely interesting. Is this an accurate read of the format?

My immediate concern is that it doesn't look like there's any space for the name of the test binary (what gets reflected in the testsuite in JUnit), which I think is important. Looking at this definition, buildkite doesn't appear to be able to take in the name of the binary either.
There doesn't appear to be support for retries, timeouts, slow tests or other similar events that are a part of nextest but not of cargo test.

So what this suggests is that there have to be changes made to the buildkite collector, and honestly to anything else that consumes libtest JSON events, to support nextest.

But at that point why don't we just define our own format?

We can draw inspiration from the libtest format, but there are so many additional features that nextest supports that aren't representable in the libtest format.
It also seems like writing a buildkite test collector isn't very difficult -- maybe there should just be a separate buildkite test collector for nextest.

Some other thoughts if we go down the libtest-json emulation path:

Users may want to mix Cargo JSON output with nextest JSON output, so we should have a reason field with the value nextest-libtest-json attached to each message.
We need to also be able to say "output messages to stdout while rendering human-readable messages to stderr". With cargo that's done with --message-format json-render-diagnostics, and we'll want to support something similar.

--format-version sounds good, though I'd prefer something like 0.1 to indicate that this is not yet a stable format.

sunshowers · 2023-10-26T00:19:08Z

I think before we decide on the right path here, I'd like to see a more fleshed out proposal with how to handle retries and timeouts. I'd like to understand the limits of what can be supported within the libtest format.

Jake-Shadle · 2023-10-26T12:25:37Z

From here:

While most test frameworks have a built-in JUnit XML export feature, these JUnit reports do not provide detailed span information. Therefore, features in Test Analytics that depend on span information aren't available when using JUnit as a data source. If you need span information, consider using the JSON import API instead.

The idea of supporting libtest JSON is definitely interesting. Is rust-lang/rust#49359 (comment) an accurate read of the format?

Yes, this is the output from rustc 1.75.0-nightly (cd674d617 2023-10-24) for a portion of our tests, which, at least according to the linked comment is unchanged since 2019.

     Running unittests src/lib.rs (target/debug/deps/ark_module_publish-3a31a00a8f43e692)
{ "type": "suite", "event": "started", "test_count": 0 }
{ "type": "suite", "event": "ok", "passed": 0, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000055475 }
     Running tests/publish.rs (target/debug/deps/publish-f5e75470c143959e)
{ "type": "suite", "event": "started", "test_count": 1 }
{ "type": "test", "event": "started", "name": "test_publish" }
{ "type": "test", "name": "test_publish", "event": "ok" }
{ "type": "suite", "event": "ok", "passed": 1, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000870104 }
     Running tests/transforms.rs (target/debug/deps/transforms-4566ecd1e222c87c)
{ "type": "suite", "event": "started", "test_count": 3 }
{ "type": "test", "event": "started", "name": "adds_sections" }
{ "type": "test", "event": "started", "name": "reads_rustc" }
{ "type": "test", "event": "started", "name": "strips" }
{ "type": "test", "name": "strips", "event": "failed", "stdout": "thread 'strips' panicked at components/module-publish/tests/transforms.rs:70:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n   2: core::result::unwrap_failed\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n   3: core::result::Result<T,E>::expect\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n   4: transforms::strips\n             at ./tests/transforms.rs:70:20\n   5: transforms::strips::{{closure}}\n             at ./tests/transforms.rs:64:12\n   6: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n   7: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "test", "name": "adds_sections", "event": "failed", "stdout": "thread 'adds_sections' panicked at components/module-publish/tests/transforms.rs:23:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n   2: core::result::unwrap_failed\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n   3: core::result::Result<T,E>::expect\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n   4: transforms::adds_sections\n             at ./tests/transforms.rs:23:20\n   5: transforms::adds_sections::{{closure}}\n             at ./tests/transforms.rs:22:19\n   6: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n   7: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "test", "name": "reads_rustc", "event": "failed", "stdout": "thread 'reads_rustc' panicked at components/module-publish/tests/transforms.rs:124:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n   2: core::result::unwrap_failed\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n   3: core::result::Result<T,E>::expect\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n   4: transforms::reads_rustc\n             at ./tests/transforms.rs:124:20\n   5: transforms::reads_rustc::{{closure}}\n             at ./tests/transforms.rs:118:17\n   6: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n   7: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "suite", "event": "failed", "passed": 0, "failed": 3, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 2.413460761 }

My immediate concern is that it doesn't look like there's any space for the name of the test binary (what gets reflected in the testsuite in JUnit), which I think is important. Looking at this definition, buildkite doesn't appear to be able to take in the name of the binary either.

You are correct, my guess is the libtest output is more focused on a single crate use case but obviously loses relevant information inside a large workspace with hundreds or thousands of tests spread across multiple crates with n test binaries. That being said, I think the libtest output can still be replicated, just that the name field of the event could include the crate + test binary name as a prefix to the name, that output consumers can then parse out. But I definitely think this is one place where nextest could definitely improve on for its own output format separately (and maybe even inform libtest itself).

There doesn't appear to be support for retries, timeouts, slow tests or other similar events that are a part of nextest but not of cargo test.

Also correct, I think this could be fine by adding a "nextest" subobject in the event for this kind of additional information outside of the scope of libtest so that consumers can either ignore it if they don't understand it, or take advantage of it they do. That being said, in the retries case that might need to be handled explicitly by consumers if they have logic that will fail if it encounters the same test multiple times (though I suppose if nextest already does mutations on the test name by adding the crate name/test binary as stated above then having something like #<n> as a suffix or whatnot to ensure uniqueness would be "fine").

But at that point why don't we just define our own format?

We can draw inspiration from the libtest format, but there are so many additional features that nextest supports that aren't representable in the libtest format.

It also seems like writing a buildkite test collector isn't very difficult -- maybe there should just be a separate buildkite test collector for nextest.

My reason for suggesting having libtest compatible output was more so that projects that use other tooling that used to (or still does) depend on that format, as it was usable in stable rust for years before 1.70.0 made it unstable, can seamlessly (hopefully) drop in cargo-nextest without too much disruption, but I don't know how many people would actually find that useful.

We will write our own buildkite test collector regardless, its support for the libtest output was just what made me think of other tooling in the wild that might be able to get that seamless interop with cargo-nextest if it supported the same output.

I think before we decide on the right path here, I'd like to see a more fleshed out proposal with how to handle retries and timeouts. I'd like to understand the limits of what can be supported within the libtest format.

I think the easiest thing for me to understand all of this would be just to do a quick and dirty implementation (+ tests) in both nextest as well as a consumer to actually have a concrete thing to talk about, as I find that much easier, if that's ok?

sunshowers · 2023-10-26T20:20:20Z

From here:

Hmm, I might be misunderstanding but how would you get the span information out of a Rust test? It seems like to do that you'd have to have tests produce the span information in a way that nextest can understand. This is certainly possible to do in principle but the protocol for it doesn't exist yet.

That being said, I think the libtest output can still be replicated, just that the name field of the event could include the crate + test binary name as a prefix to the name

I think this is fine, we could use a separator like $ to split up the binary and test names.

Also correct, I think this could be fine by adding a "nextest" subobject in the event for this kind of additional information outside of the scope of libtest so that consumers can either ignore it if they don't understand it, or take advantage of it they do.

I like having the nextest subobject with more information attached to each object. Retries are a little tricky but if we're doing libtest emulation, using #n is fine.

We will write our own buildkite test collector regardless, its support for the libtest output was just what made me think of other tooling in the wild that might be able to get that seamless interop with cargo-nextest if it supported the same output.

I'm curious how buildkite supports retries. They have a section on flaky tests but I didn't see anything in the JSON output which indicated support for that.

I think the easiest thing for me to understand all of this would be just to do a quick and dirty implementation (+ tests) in both nextest as well as a consumer to actually have a concrete thing to talk about, as I find that much easier, if that's ok?

Sure, that sounds good.

Jake-Shadle · 2023-10-31T12:54:11Z

So I've got this mostly done, but I've hit a rather fundamental issue with how nextest-runner captures output, namely that it is split into 2 separate buffers for stdout and stderr. This makes it impossible to properly write the captured output to the json object in the same way that libtest does.

Concretely, lets say we have this test.

#[test]
fn fails() {
    println!("this is stdout");
    eprintln!("this is stderr");

    panic!("fiddlesticks");
}

The libtest output with cargo +nightly test --no-fail-fast -- -Z unstable-options --format json --report-time --ensure-time for the test is

{ "type": "test", "name": "fails", "event": "failed", "exec_time": 0.009125365, "stdout": "this is stdout\nthis is stderr\nthread 'fails' panicked at src/lib.rs:34:5:\nfiddlesticks\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/panicking.rs:72:14\n   2: myplayground::fails\n             at ./src/lib.rs:34:5\n   3: myplayground::fails::{{closure}}\n             at ./src/lib.rs:30:11\n   4: core::ops::function::FnOnce::call_once\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/ops/function.rs:250:5\n   5: core::ops::function::FnOnce::call_once\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }

The stdout field is a misnomer, as really it is all captured output, from both stdout and stderr, and crucially, it is the output as if it was sent to a terminal, ie, in the order it was written.

I think the proper way to fix this would be to have a capture structure in nextest-runner that keeps a single buffer containing the combination of both stdout and stderr, and then an array that contains the ranges of bytes for each output block, the stream it came from, and the timestamp (or duration from start of capture), so that the output can be captured faithfully for this case (and others?), and then simple helpers that can reconstruct the individual streams as they are currently used in nextest, which shouldn't be much of a problem since the streams are currently just byte buffers, and the cases where they are interpreted as utf-8 strings already take the hit of (potentially) allocating by using the _lossy forms of byte buffer -> utf-8 strings. Having the timing information for each individual block (usually but not always a single line) would also make implementing a future message-format that includes such timing information as structured data quite a bit easier.

I can submit what I have now in a PR first, or I could do the described work in a separate PR first? Let me know how you want me to proceed.

NobodyXu · 2023-10-31T21:23:05Z

Perhaps you can use a pipe and then set stdout and stderr to both write into that pipe instead, when json output is required?

That will be much simpler.

Jake-Shadle · 2023-11-01T14:15:03Z

I've opened #1086 as a draft, will create a helper for doing combined output in the same branch but can split it out into a separate PR if needed later.

Jake-Shadle · 2023-11-02T16:21:00Z

I've opened #1088 which adds support for interleaved test output that can have each stream recovered separately to satisfy all of the current code.

sunshowers · 2023-12-10T01:36:40Z

Thanks Jake! Experimental support for this will be part of nextest 0.9.65, which I'll release tomorrow (in around 18 hours). https://nexte.st/book/run-machine-readable.html has some basic documentation, and the remaining issues are tracked in #1152 (help getting them done would be greatly appreciated).

sunshowers · 2023-12-11T00:00:38Z

nextest 0.9.66 is now out (0.9.65 had a build issue) and it includes basic support. However, it is missing a few important bits like #1088.

torokati44 · 2024-04-18T15:49:40Z

Apparently the last mentioned PR has landed as #1212. So, is this still incomplete? 🤔

sunshowers · 2024-05-27T02:15:45Z

@torokati44 Yes, because the libtest output isn't stable yet (see #1152). Besides, the libtest JSON format has some limitations, and we'll probably want to support at least one nextest-native format as well.

Nextest is mostly a spare-time project for me these days, and I only add new features to it based on my employer's needs. If you'd like to see this happen, please get involved! Contributions would be really welcome. #1152 has a few tasks that are good first issues.

sunshowers added the enhancement New feature or request label Feb 10, 2022

sunshowers added this to the nextest-m2 milestone Feb 10, 2022

TriplEight mentioned this issue Feb 17, 2022

Use cached cargo-install paritytech/smoldot#2057

Merged

nipunn1313 mentioned this issue Dec 29, 2022

Support for showing the slowest N tests #742

Open

kazk mentioned this issue Jun 6, 2023

JSON output for Rust libtest require nightly since 1.70 codewars/runner#250

Open

senekor mentioned this issue Jul 18, 2023

Consider fixing versions in Dockerfile exercism/rust-test-runner#43

Closed

sunshowers added the help wanted Extra attention is needed label Jul 31, 2023

sunshowers removed this from the nextest-m2 milestone Jul 31, 2023

sunshowers mentioned this issue Oct 26, 2023

Run fast/slow tests first #905

Open

nuke-web3 mentioned this issue Apr 9, 2024

Status of XML Deserialization in quick-junit nextest-rs/quick-junit#9

Open

Jake-Shadle mentioned this issue Nov 8, 2023

Add support for combined test output #1088

Closed

huitseeker mentioned this issue Jan 1, 2024

feat request: Markdown format #1200

Closed

torokati44 mentioned this issue Jul 27, 2024

tests: Mark some tests as slow and schedule them at the beginning ruffle-rs/ruffle#15950

Draft

zjp-CN mentioned this issue Nov 6, 2024

记录测试信息 os-checker/os-checker#148

Open

sunshowers mentioned this issue Dec 21, 2024

No output when upgrading nextest to 0.9.86 Canop/bacon#280

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Machine-readable output for cargo nextest run #20

Machine-readable output for cargo nextest run #20

sunshowers commented Feb 10, 2022

sunshowers commented Jul 31, 2023 •

edited

Loading

Jake-Shadle commented Oct 25, 2023

sunshowers commented Oct 26, 2023

sunshowers commented Oct 26, 2023

Jake-Shadle commented Oct 26, 2023

sunshowers commented Oct 26, 2023

Jake-Shadle commented Oct 31, 2023

NobodyXu commented Oct 31, 2023

Jake-Shadle commented Nov 1, 2023

Jake-Shadle commented Nov 2, 2023

sunshowers commented Dec 10, 2023 •

edited

Loading

sunshowers commented Dec 11, 2023

torokati44 commented Apr 18, 2024

sunshowers commented May 27, 2024 •

edited

Loading

Machine-readable output for cargo nextest run #20

Machine-readable output for cargo nextest run #20

Comments

sunshowers commented Feb 10, 2022

sunshowers commented Jul 31, 2023 • edited Loading

Jake-Shadle commented Oct 25, 2023

sunshowers commented Oct 26, 2023

sunshowers commented Oct 26, 2023

Jake-Shadle commented Oct 26, 2023

sunshowers commented Oct 26, 2023

Jake-Shadle commented Oct 31, 2023

NobodyXu commented Oct 31, 2023

Jake-Shadle commented Nov 1, 2023

Jake-Shadle commented Nov 2, 2023

sunshowers commented Dec 10, 2023 • edited Loading

sunshowers commented Dec 11, 2023

torokati44 commented Apr 18, 2024

sunshowers commented May 27, 2024 • edited Loading

sunshowers commented Jul 31, 2023 •

edited

Loading

sunshowers commented Dec 10, 2023 •

edited

Loading

sunshowers commented May 27, 2024 •

edited

Loading