Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Machine-readable output for cargo nextest run #20

Open
sunshowers opened this issue Feb 10, 2022 · 14 comments
Open

Machine-readable output for cargo nextest run #20

sunshowers opened this issue Feb 10, 2022 · 14 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@sunshowers
Copy link
Member

The output should be similar to what cargo itself provides, complete with --message-format json and --message-format json-render-status options.

@sunshowers
Copy link
Member Author

sunshowers commented Jul 31, 2023

This is something that we'd still like to do but it's just the work that needs to be done.

Help wanted: If you're interested in tackling this project, here's the general set of things we need to do:

  1. Define a format with data structures. The format should carefully be designed to allow future extensibility (e.g. enums should be open universes for forward compatibility).
  2. Add that format to nextest-metadata.
  3. Make nextest-runner produce output in that format.
  4. Carefully work through backwards compatibility concerns: if we need to make a change in the future, how would it be done?
  5. Add tests for the format: probably can migrate the nextest integration tests to the new format.
  6. Add tests for forward compatibility as well -- need to make sure that future additions don't break the format.
  7. Write documentation for the format.
  8. Stretch goal: Make nextest-metadata able to execute cargo nextest run with that format.

I'd estimate this to be around 80 hours of work for a newcomer to the project -- it's a chunky, nontrivial project but totally doable for someone new and interested.

@Jake-Shadle
Copy link
Contributor

We're interested in doing this work so that we can feed structured test results to this. (Note that while it supports JUnit which nextest can already produce that option has limitations that make it less appealing).

Before I started work on this, I wanted to run the idea I have for the initial implementation by you first before I do work that then gets rejected.

Basically the idea would be to have a --message-format libtest-json --format-version <n> where n is a serial incrementing number (the same --format-version <n> option already exists for cargo metadata) that follows the current structured output of nightly so that users can specify a particular revision of the format for stability, even in the face of breaking changes. This would allow tools that understand the (current) nightly output to be integrated with nextest, similar to the rationale for supporting JUnit.

This adds a maintenance burden as breaking changes to the nightly output would need to be reflected in a new format-version in nextest, but I haven't actually looked at how often the format has changed historically to know if this is a major concern or not. This is mitigated by the ability to pin to a specific version so IMO it's not a huge deal, especially if the code is done in the right way, and thus hopefully fairly easy for contributors who need the latest and greatest format output to add support for them without too much trouble.

This would also allow nextest to define its own format separately that would be unaffected by libtest decisions.

@sunshowers
Copy link
Member Author

We're interested in doing this work so that we can feed structured test results to this. (Note that while it supports JUnit which nextest can already produce that option has limitations that make it less appealing).

Yay, super excited about this!

Could you document the limitations of JUnit? It wasn't entirely clear to me what they were from reading the documentation.

For reference, buildkite uses this test collector.

Basically the idea would be to have a --message-format libtest-json --format-version where n is a serial incrementing number (the same --format-version option already exists for cargo metadata).

The idea of supporting libtest JSON is definitely interesting. Is this an accurate read of the format?

  • My immediate concern is that it doesn't look like there's any space for the name of the test binary (what gets reflected in the testsuite in JUnit), which I think is important. Looking at this definition, buildkite doesn't appear to be able to take in the name of the binary either.
  • There doesn't appear to be support for retries, timeouts, slow tests or other similar events that are a part of nextest but not of cargo test.

So what this suggests is that there have to be changes made to the buildkite collector, and honestly to anything else that consumes libtest JSON events, to support nextest.

But at that point why don't we just define our own format?

  • We can draw inspiration from the libtest format, but there are so many additional features that nextest supports that aren't representable in the libtest format.
  • It also seems like writing a buildkite test collector isn't very difficult -- maybe there should just be a separate buildkite test collector for nextest.

Some other thoughts if we go down the libtest-json emulation path:

  • Users may want to mix Cargo JSON output with nextest JSON output, so we should have a reason field with the value nextest-libtest-json attached to each message.
  • We need to also be able to say "output messages to stdout while rendering human-readable messages to stderr". With cargo that's done with --message-format json-render-diagnostics, and we'll want to support something similar.

--format-version sounds good, though I'd prefer something like 0.1 to indicate that this is not yet a stable format.

@sunshowers
Copy link
Member Author

I think before we decide on the right path here, I'd like to see a more fleshed out proposal with how to handle retries and timeouts. I'd like to understand the limits of what can be supported within the libtest format.

@Jake-Shadle
Copy link
Contributor

From here:

While most test frameworks have a built-in JUnit XML export feature, these JUnit reports do not provide detailed span information. Therefore, features in Test Analytics that depend on span information aren't available when using JUnit as a data source. If you need span information, consider using the JSON import API instead.

The idea of supporting libtest JSON is definitely interesting. Is rust-lang/rust#49359 (comment) an accurate read of the format?

Yes, this is the output from rustc 1.75.0-nightly (cd674d617 2023-10-24) for a portion of our tests, which, at least according to the linked comment is unchanged since 2019.

     Running unittests src/lib.rs (target/debug/deps/ark_module_publish-3a31a00a8f43e692)
{ "type": "suite", "event": "started", "test_count": 0 }
{ "type": "suite", "event": "ok", "passed": 0, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000055475 }
     Running tests/publish.rs (target/debug/deps/publish-f5e75470c143959e)
{ "type": "suite", "event": "started", "test_count": 1 }
{ "type": "test", "event": "started", "name": "test_publish" }
{ "type": "test", "name": "test_publish", "event": "ok" }
{ "type": "suite", "event": "ok", "passed": 1, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000870104 }
     Running tests/transforms.rs (target/debug/deps/transforms-4566ecd1e222c87c)
{ "type": "suite", "event": "started", "test_count": 3 }
{ "type": "test", "event": "started", "name": "adds_sections" }
{ "type": "test", "event": "started", "name": "reads_rustc" }
{ "type": "test", "event": "started", "name": "strips" }
{ "type": "test", "name": "strips", "event": "failed", "stdout": "thread 'strips' panicked at components/module-publish/tests/transforms.rs:70:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n   2: core::result::unwrap_failed\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n   3: core::result::Result<T,E>::expect\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n   4: transforms::strips\n             at ./tests/transforms.rs:70:20\n   5: transforms::strips::{{closure}}\n             at ./tests/transforms.rs:64:12\n   6: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n   7: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "test", "name": "adds_sections", "event": "failed", "stdout": "thread 'adds_sections' panicked at components/module-publish/tests/transforms.rs:23:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n   2: core::result::unwrap_failed\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n   3: core::result::Result<T,E>::expect\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n   4: transforms::adds_sections\n             at ./tests/transforms.rs:23:20\n   5: transforms::adds_sections::{{closure}}\n             at ./tests/transforms.rs:22:19\n   6: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n   7: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "test", "name": "reads_rustc", "event": "failed", "stdout": "thread 'reads_rustc' panicked at components/module-publish/tests/transforms.rs:124:67:\nfailed to build module: \"the wasm32-unknown-unknown target is not installed, unable to build wasm module 'test-panic-applet'\"\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/panicking.rs:72:14\n   2: core::result::unwrap_failed\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1653:5\n   3: core::result::Result<T,E>::expect\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/result.rs:1034:23\n   4: transforms::reads_rustc\n             at ./tests/transforms.rs:124:20\n   5: transforms::reads_rustc::{{closure}}\n             at ./tests/transforms.rs:118:17\n   6: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\n   7: core::ops::function::FnOnce::call_once\n             at /rustc/cd674d61790607dfb6faa9d754bd3adfa13aea7c/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }
{ "type": "suite", "event": "failed", "passed": 0, "failed": 3, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 2.413460761 }

My immediate concern is that it doesn't look like there's any space for the name of the test binary (what gets reflected in the testsuite in JUnit), which I think is important. Looking at this definition, buildkite doesn't appear to be able to take in the name of the binary either.

You are correct, my guess is the libtest output is more focused on a single crate use case but obviously loses relevant information inside a large workspace with hundreds or thousands of tests spread across multiple crates with n test binaries. That being said, I think the libtest output can still be replicated, just that the name field of the event could include the crate + test binary name as a prefix to the name, that output consumers can then parse out. But I definitely think this is one place where nextest could definitely improve on for its own output format separately (and maybe even inform libtest itself).

There doesn't appear to be support for retries, timeouts, slow tests or other similar events that are a part of nextest but not of cargo test.

Also correct, I think this could be fine by adding a "nextest" subobject in the event for this kind of additional information outside of the scope of libtest so that consumers can either ignore it if they don't understand it, or take advantage of it they do. That being said, in the retries case that might need to be handled explicitly by consumers if they have logic that will fail if it encounters the same test multiple times (though I suppose if nextest already does mutations on the test name by adding the crate name/test binary as stated above then having something like #<n> as a suffix or whatnot to ensure uniqueness would be "fine").

But at that point why don't we just define our own format?

  • We can draw inspiration from the libtest format, but there are so many additional features that nextest supports that aren't representable in the libtest format.
  • It also seems like writing a buildkite test collector isn't very difficult -- maybe there should just be a separate buildkite test collector for nextest.

My reason for suggesting having libtest compatible output was more so that projects that use other tooling that used to (or still does) depend on that format, as it was usable in stable rust for years before 1.70.0 made it unstable, can seamlessly (hopefully) drop in cargo-nextest without too much disruption, but I don't know how many people would actually find that useful.

We will write our own buildkite test collector regardless, its support for the libtest output was just what made me think of other tooling in the wild that might be able to get that seamless interop with cargo-nextest if it supported the same output.

I think before we decide on the right path here, I'd like to see a more fleshed out proposal with how to handle retries and timeouts. I'd like to understand the limits of what can be supported within the libtest format.

I think the easiest thing for me to understand all of this would be just to do a quick and dirty implementation (+ tests) in both nextest as well as a consumer to actually have a concrete thing to talk about, as I find that much easier, if that's ok?

@sunshowers
Copy link
Member Author

From here:

Hmm, I might be misunderstanding but how would you get the span information out of a Rust test? It seems like to do that you'd have to have tests produce the span information in a way that nextest can understand. This is certainly possible to do in principle but the protocol for it doesn't exist yet.

That being said, I think the libtest output can still be replicated, just that the name field of the event could include the crate + test binary name as a prefix to the name

I think this is fine, we could use a separator like $ to split up the binary and test names.

Also correct, I think this could be fine by adding a "nextest" subobject in the event for this kind of additional information outside of the scope of libtest so that consumers can either ignore it if they don't understand it, or take advantage of it they do.

I like having the nextest subobject with more information attached to each object. Retries are a little tricky but if we're doing libtest emulation, using #n is fine.

We will write our own buildkite test collector regardless, its support for the libtest output was just what made me think of other tooling in the wild that might be able to get that seamless interop with cargo-nextest if it supported the same output.

I'm curious how buildkite supports retries. They have a section on flaky tests but I didn't see anything in the JSON output which indicated support for that.

I think the easiest thing for me to understand all of this would be just to do a quick and dirty implementation (+ tests) in both nextest as well as a consumer to actually have a concrete thing to talk about, as I find that much easier, if that's ok?

Sure, that sounds good.

@Jake-Shadle
Copy link
Contributor

So I've got this mostly done, but I've hit a rather fundamental issue with how nextest-runner captures output, namely that it is split into 2 separate buffers for stdout and stderr. This makes it impossible to properly write the captured output to the json object in the same way that libtest does.

Concretely, lets say we have this test.

#[test]
fn fails() {
    println!("this is stdout");
    eprintln!("this is stderr");

    panic!("fiddlesticks");
}

The libtest output with cargo +nightly test --no-fail-fast -- -Z unstable-options --format json --report-time --ensure-time for the test is

{ "type": "test", "name": "fails", "event": "failed", "exec_time": 0.009125365, "stdout": "this is stdout\nthis is stderr\nthread 'fails' panicked at src/lib.rs:34:5:\nfiddlesticks\nstack backtrace:\n   0: rust_begin_unwind\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/std/src/panicking.rs:597:5\n   1: core::panicking::panic_fmt\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/panicking.rs:72:14\n   2: myplayground::fails\n             at ./src/lib.rs:34:5\n   3: myplayground::fails::{{closure}}\n             at ./src/lib.rs:30:11\n   4: core::ops::function::FnOnce::call_once\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/ops/function.rs:250:5\n   5: core::ops::function::FnOnce::call_once\n             at /rustc/aa1a71e9e90f6eb3aed8cf79fc80bea304c17ecb/library/core/src/ops/function.rs:250:5\nnote: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.\n" }

The stdout field is a misnomer, as really it is all captured output, from both stdout and stderr, and crucially, it is the output as if it was sent to a terminal, ie, in the order it was written.

I think the proper way to fix this would be to have a capture structure in nextest-runner that keeps a single buffer containing the combination of both stdout and stderr, and then an array that contains the ranges of bytes for each output block, the stream it came from, and the timestamp (or duration from start of capture), so that the output can be captured faithfully for this case (and others?), and then simple helpers that can reconstruct the individual streams as they are currently used in nextest, which shouldn't be much of a problem since the streams are currently just byte buffers, and the cases where they are interpreted as utf-8 strings already take the hit of (potentially) allocating by using the _lossy forms of byte buffer -> utf-8 strings. Having the timing information for each individual block (usually but not always a single line) would also make implementing a future message-format that includes such timing information as structured data quite a bit easier.

I can submit what I have now in a PR first, or I could do the described work in a separate PR first? Let me know how you want me to proceed.

@NobodyXu
Copy link
Contributor

Perhaps you can use a pipe and then set stdout and stderr to both write into that pipe instead, when json output is required?

That will be much simpler.

@Jake-Shadle
Copy link
Contributor

I've opened #1086 as a draft, will create a helper for doing combined output in the same branch but can split it out into a separate PR if needed later.

@Jake-Shadle
Copy link
Contributor

I've opened #1088 which adds support for interleaved test output that can have each stream recovered separately to satisfy all of the current code.

@sunshowers
Copy link
Member Author

sunshowers commented Dec 10, 2023

Thanks Jake! Experimental support for this will be part of nextest 0.9.65, which I'll release tomorrow (in around 18 hours). https://nexte.st/book/run-machine-readable.html has some basic documentation, and the remaining issues are tracked in #1152 (help getting them done would be greatly appreciated).

@sunshowers
Copy link
Member Author

nextest 0.9.66 is now out (0.9.65 had a build issue) and it includes basic support. However, it is missing a few important bits like #1088.

@torokati44
Copy link

Apparently the last mentioned PR has landed as #1212. So, is this still incomplete? 🤔

@sunshowers
Copy link
Member Author

sunshowers commented May 27, 2024

@torokati44 Yes, because the libtest output isn't stable yet (see #1152). Besides, the libtest JSON format has some limitations, and we'll probably want to support at least one nextest-native format as well.

Nextest is mostly a spare-time project for me these days, and I only add new features to it based on my employer's needs. If you'd like to see this happen, please get involved! Contributions would be really welcome. #1152 has a few tasks that are good first issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants