
Allow bailing early when test fails #52717

Open
nicholaswmin opened this issue Apr 27, 2024 · 16 comments
Labels
feature request Issues that request new features to be added to Node.js. test_runner Issues and PRs related to the test runner subsystem.

Comments

@nicholaswmin

nicholaswmin commented Apr 27, 2024

Needs a Bail Out/Fail Fast strategy

What is the problem this feature will solve?

Not having to sift through a jumbled-up mess of stack traces while I'm cleaning up tests after a significant refactor of the unit I'm testing.

Spit out the failing test and its stack trace, then exit 1 as soon as a test fails.

Proposal

A bail flag, i.e. --bail, that tells the runner to report the failure and exit.

Alternatives

Cooking up AbortSignal wiring, but that's really me writing machinery code to test... the code... which is just bad taste, at least in most cases.
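(For reference, that AbortSignal wiring could look roughly like the sketch below, using the public run() API from node:test, run as an ES module. The test file name and its contents are invented for illustration; this is a sketch, not a drop-in solution.)

```javascript
// Sketch only: abort a node:test run on the first failure via run() + AbortSignal.
// 'example.test.js' and its contents are made up for illustration.
import { run } from 'node:test';
import { writeFileSync } from 'node:fs';

writeFileSync('example.test.js', `
  const test = require('node:test');
  const assert = require('node:assert');
  test('fails on purpose', () => assert.strictEqual(1, 2));
  test('would pass', () => assert.ok(true));
`);

const ac = new AbortController();
const events = run({ files: ['example.test.js'], signal: ac.signal });

let bailed = false;
try {
  // run() yields the same { type, data } events a custom reporter receives.
  for await (const { type, data } of events) {
    if (type === 'test:fail' && !bailed) {
      bailed = true;
      console.error(`✖ bailing on failed test: ${data.name}`);
      ac.abort();            // stop the rest of the run
      process.exitCode = 1;
    }
  }
} catch {
  // aborting can surface an AbortError on the stream; ignore it here
}
```
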

@nicholaswmin nicholaswmin added the feature request Issues that request new features to be added to Node.js. label Apr 27, 2024
@avivkeller avivkeller added the test_runner Issues and PRs related to the test runner subsystem. label Apr 27, 2024
@MoLow
Member

MoLow commented Apr 28, 2024

This will only work as expected when there is no parallelism/concurrency. Otherwise there is only a best-effort option: as soon as the orchestrator learns of a failed test, it tries its best to stop the other processes from executing tests.
A naive approach can be found here: https://www.npmjs.com/package/@reporters/bail

@nicholaswmin
Author

nicholaswmin commented Apr 28, 2024

That is exactly right. Parallelisation interferes with a lot of other potential options; sorted tests are another case it doesn't play well with.

I do not see parallelism as a first-class concern. I'm much more inclined to get fed up with running tests if I have to mindlessly scroll through pages of stack traces over and over.

I like parallelisation because it forces me to rethink potential state leaks between tests, but that's pretty much it. The test runner is fast enough as it is.

@marco-ippolito
Member

My previous (failed) attempt: #48919

@nicholaswmin
Author

nicholaswmin commented Apr 29, 2024

Sorry, it's not quite clear to me what's going on there.

The spawned process phantoms out? Can I take a stab at it?

@marco-ippolito
Member

Sorry, it's not quite clear to me what's going on there.

The spawned process phantoms out? Can I take a stab at it?

Go ahead 🙂

@nicholaswmin
Author

nicholaswmin commented Apr 29, 2024

@MoLow Why do you consider your approach naive?

So, here:

module.exports = async function* bail(source) {
  for await (const event of source) {
    if (event.type === 'test:fail') {
      /* c8 ignore start */
      yield `\n\u001b[31m✖ Bailing on failed test: ${event.data.name}\u001b[0m\n`;
      throw new Error('Bail');
    }
    /* c8 ignore stop */
  }
};

This line:

throw new Error('Bail');

This won't report the failure, will it? Is this why you consider this naive?
I also have concerns about phantom child_process spawns;

I remember I had some issues back in 2017 with the child_process module.
The spawned children could become phantoms on their own in a lot of cases.

Does it reliably detect exceptions in the spawned processes, or does it still need special handling in case the IPC signalling gets disconnected?

@marco-ippolito
Member

This won't report the failure, will it? Is this why you consider this naive? [...]

The issue I see is that the tests will continue to run; this only hides the result from the reporter.
Imagine a big test suite failing on the first test: it will continue running for a while.

@nicholaswmin
Author

nicholaswmin commented Apr 29, 2024

Thanks 🎯 ,

I suppose the main machinery lives in the runner/harness.

Just off the top of my head, my plan is this:

  • forward a --bail flag to the spawned children.

  • main

    • Listens on subprocess:test-failed and prints the results
    • Listens on child:error
    • Signals the rest of the bunch via subprocess.kill('SIGTERM') to kill themselves.
  • child

    • Listens on the 'test:fail' event:
    • signals the main via IPC, sending 'test-failed' to report its failure, plus its test results as payload.
    • throws an Error and dies as a result of its exception
  • This ping-pong style of wiring sounds brittle, plus it might interfere with generic error handling on child process errors. A child:error event now has 2 different meanings.

  • Solution A: An exception in the child without a preceding test-failed event should be handled like a regular error, not a test failure. This is starting to sound really dodgy: it depends on the order of events and introduces additional state in the main.

  • Solution B: The child does not throw an exception on test failure. Instead it signals the main that its test has failed, and the main is responsible for issuing a SIGTERM back to it, which exits 0. This misaligns with usual runner behavior; AFAIK mocha exits with 1 on test failures. However, this is a child process, not the main. I don't like this either.

I also have big concerns about errant children. I remember working with child_process back in 2017(?) and having to add timeouts to IPC events in case a child stops responding. Is this still a concern, and if so, how does the current parallel orchestrator/main handle it? I'd rather not cook up my own homegrown Rube Goldbergs.

Last:

  • What about shards? Any special considerations?
  • Do we need to report the success results of the rest of the ongoing runners?
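A minimal sketch of the parent/child wiring described above, with `node -e` one-liners standing in for spawned test files. Note the 'test-failed' message shape is hypothetical, not an actual node:test IPC message; this only illustrates the SIGTERM fan-out on first failure.

```javascript
// Sketch of the parent-side bail wiring. The children here are inlined
// stand-ins for test files; one "fails" quickly, the other would idle.
import { spawn } from 'node:child_process';
import { once } from 'node:events';

const childSource = (fails) => `
  setTimeout(() => {
    if (${fails}) process.send({ type: 'test-failed', name: 'broken test' });
    else process.exit(0);
  }, ${fails ? 50 : 10000});
`;

const children = [
  spawn(process.execPath, ['-e', childSource(true)],
        { stdio: ['ignore', 'ignore', 'ignore', 'ipc'] }),
  spawn(process.execPath, ['-e', childSource(false)],
        { stdio: ['ignore', 'ignore', 'ignore', 'ipc'] }),
];

let bailed = false;
for (const child of children) {
  child.on('message', (msg) => {
    if (msg.type === 'test-failed' && !bailed) {
      bailed = true;
      console.error(`✖ ${msg.name} failed; signalling the rest to stop`);
      // Signal every child (including the failed one) to terminate.
      for (const other of children) other.kill('SIGTERM');
      process.exitCode = 1;
    }
  });
}

// Wait for every child to exit before the parent finishes reporting.
await Promise.all(children.map((c) => once(c, 'exit')));
```
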

@imme-emosol

For reference, seems related: #42990

@imme-emosol

For a quick temporary solution (using an ESM data: URL), this adaptation of MoLow's https://github.com/MoLow/reporters/blob/main/packages/bail/index.js might be helpful.

node \
--test-reporter='data:text/javascript,
export default async function* bail(source) {
  for await (const event of source) {
    if (event.type === "test:fail") {
      yield `\n\u001b[31m✖ Bailing on failed test: ${event.data.name}\u001b[0m\n`
      ;throw new Error("Bail")
    }
  }
}' \
--test-reporter=spec \
--test-reporter-destination=stdout \
--test-reporter-destination=stdout \
--test

A base64-encoded version is not shorter (but perhaps more usable for your situation).

node \
--test-reporter='data:text/javascript;base64,ZXhwb3J0IGRlZmF1bHQgYXN5bmMgZnVuY3Rpb24qIGJhaWwoc291cmNlKSB7Zm9yIGF3YWl0IChjb25zdCBldmVudCBvZiBzb3VyY2UpIHtpZiAoZXZlbnQudHlwZSA9PT0gInRlc3Q6ZmFpbCIpIHt5aWVsZCBgChtbMzFt4pyWIEJhaWxpbmcgb24gZmFpbGVkIHRlc3Q6ICR7ZXZlbnQuZGF0YS5uYW1lfRtbMG0KYDt0aHJvdyBuZXcgRXJyb3IoIkJhaWwiKX19fQ==' \
--test-reporter=spec \
--test-reporter-destination=stdout \
--test-reporter-destination=stdout \
--test

For completeness, the base64 was generated using the following command.

printf 'export default async function* bail(source) {for await (const event of source) {if (event.type === "test:fail") {yield `\n\u001b[31m✖ Bailing on failed test: ${event.data.name}\u001b[0m\n`;throw new Error("Bail")}}}' |
base64 --wrap 0

@avivkeller
Member

Yes, this may be a duplicate of #42990

@nicholaswmin
Author

nicholaswmin commented Dec 12, 2024


@imme-emosol

Didn't see this, but this is surprisingly... not bad.

I reluctantly say so because I still believe proper fail-fast is needed; it's a big sticking point I have with the test runner, but this could work in the meantime...

For the record, my POV is that it's just really hard to navigate a failed test suite. The parallelisation effect makes for very messy logs. I say this even though I overwhelmingly agree with the minimalistic approach the test runner has to take.

Do you think it's unnecessary? I didn't quite get your position on this: yes or no?

@imme-emosol

Up until now, I had not thought much about my position regarding an additional command-line argument, not least because I have only recently been spending more time in (the run-context of) Node.js.

However, as it seems to be possible to stop testing as soon as possible (ASAP) by providing custom reporters, it seems to me that allocating Node.js development effort elsewhere will probably have a more beneficial effect.


In short, the issue seems to be twofold: one part is about the interpretation of output, the other about stopping all running tests when a certain problem occurs.

For the interpretation of test output, the main concern is probably for the output to be standardized so other tooling can format it in a nicholaswmin-/human-readable format, which seems to be possible by providing a custom test reporter.

Another element of this issue, I guess, is the insight that the process that runs the test sets/suite(s) does not need to be the same process as the one that starts them.


On a longer note, this and other threads suggest that there are hugely varying demands/requirements/desires/contexts around running tests, for which Node.js currently seems to already provide a bare-enough minimum to build implementations that meet those contexts.

Being accustomed to a situation in which extra software needed to be installed for testing, I positively appreciate that it is even possible to do testing without such additions.

In broad strokes, requirements seem to depend on the system under test, in combination with the run-context.
(Examples of a system-under-test (SUT): a website, or a single-file shell program.)
(Examples of a run-context: local development, or "remote" continuous integration.)

One SUT might be sufficiently covered by something like a read-evaluate-print loop (REPL), while another might require screenshots of rendered output such as hypertext or a scalable vector graphic, and yet another might require displaying specifics of multiple parallel processes to give insight into a certain problematic state.

The run-context can cause intertwined line-based console output from separate contexts, for instance when using Node.js parallelism, or when using another program such as pnpm to run several projects simultaneously, directing all output to a single stdout and/or stderr.

Regarding navigating a failed test (suite), that is, the human interpretation of generated output: other line-based output parsers seem to depend on the format of the start of the line (a 'header') to distinguish between process/test contexts when the output of parallel processes/tests is intertwined. With command-line interface (CLI) programs, piping such output and displaying it grouped by that 'header' seems useful for human interpretation. For me, such grouping of the output does not necessarily need to be an integral part of Node.js.

An attempt that I know of to create a standard for command-line test output is the Test Anything Protocol (TAP):
https://github.com/TestAnything/testanything.github.io/blob/master/tap-version-14-specification.md .
There seems to be a Node.js implementation (which I have not (yet) tried) over at https://github.com/tapjs/tapjs .

Creating an easier interpretation of (multiple intertwined) stack-trace outputs seems to be a task for an integrated development environment (IDE). Sticking to a simpler setup of only CLI programs, providing (custom-built) test reporters to Node.js seems to allow for filtering and grouping of the line-based CLI output.

Although "this ping-pong style of wiring sounds brittle", giving a child process knowledge of its parent process seems less desirable.

Say a parallel test suite is run from a shell script that spawns a Node.js test but also a test written in another language (I guess this is what MoLow named an orchestrator in an earlier message). Given that those two tests are started by the shell script, the decision to stop the other test when one of them generates a (specific) error should remain the responsibility of that shell script. If the other test needs to be stopped, how it receives a signal to do so is a decision made by the creator of that (orchestrating) main shell script. Perhaps sending a termination signal is the proper way to stop the other program, as it may have a routine to clean up its activities when that signal is received.

Well, that derailed here and there. Anyway, although it seems to me that you are better informed about how Node.js works, I still hope that some of my response might provide you with inspiration on what to do with this issue.
And perhaps my attempt to make sense of my inner ramblings can also be helpful to other people who might read it.


TL;DR: yes, I think adding an option to exit early is unnecessary.

@nicholaswmin
Author

I see; these are all good points, and I appreciate you taking the time to write them down.

I am with you on all points. In fact, I want more assertion matchers, but I'm holding off on creating a PR for them because I'm on board with the minimalist take; I see no other way.

That being said, there should be a minimum feature set.

What's the goal of implementing a test runner in Node.js?

  • Make testing more accessible?
  • Negate the need for big dependencies?
  • Be able to say "hey we got a test runner too" even if it's clunky? (it's not just making a point)

So my point here is:
a) a mission statement for the test runner is required.
b) if the answer to the above is the first bullet, then the issues should be grouped like so: zed-industries/zed#6952

and a standard "library" would be formed by implementing the cross-section of the most reasonable and the most requested.

For example:

  • End-to-end tests are notoriously clunky, require a headless driver, and sit at the very bottom of the testing pyramid. Should Node.js support them natively, without external dependencies?

No, that sounds unreasonable.

What about unit tests? They are the easiest to set up and the bare-minimum stepping stone to testing in general.

These should be supported natively, and the whole process should be ergonomic as well. No additional reading required.

From my perspective, when I see "hey, we got a test runner", I assume there's at least the bare minimum of an ergonomic test runner.

@imme-emosol

Figuring out some context here, I see the beginnings over at 432d1b5, which reads:

test: add initial test module

This commit adds a new 'test' module that exposes an API
for creating JavaScript tests. As the tests execute, TAP
output is written to standard output. This commit only supports
executing individual test files, and does not implement
command line functionality for a full test runner.

pull request: #42325
Refs: #40954

Oh! Additionally, in #42990 (comment) MoLow refers to an article by cjihrig explaining why the test runner was added. You seem to have read it, though, as you mention that you "overwhelmingly agree with the minimalistic approach the test runner has to take" and are holding back on creating PRs for assertion matchers. That article also seems to give some insight into the balancing act the core Node.js team has to do.

Aside from that, there seems to be (too) little information documenting why node:test exists.


But the issue mentioned in 'Refs' does provide more information; for instance, a message from 28 Feb 2022 reads:
[...]
Additional execution modes (dry run, fail fast, etc.)
[...]
"Fail fast" is called --bail or -b or --bailout in tap parlance.
[...]

And #43525 suggests that TAP is supported, but that is (of course) only about the output.


Supporting --bail seems to have gotten stuck in the pull request mentioned by marco-ippolito, after #48919 (comment).

As far as I can tell, the thing missing from the output is the "Bail out!" part, but there also seems to be talk of emitting a test:bail event, especially in relation to clearing the test-files queue when combining --bail with --watch.
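As an illustration of that missing piece, a custom reporter can at least emit the TAP "Bail out!" line itself. The reporter protocol (an async generator over { type, data } events) is the documented one; the bail semantics here are improvised, and, as noted earlier in the thread, throwing only stops the reporter's output, not in-flight tests.

```javascript
// Hypothetical sketch: approximate TAP 14's "Bail out!" line from a custom
// node:test reporter. Throwing ends this reporter's output stream; it does
// not terminate test processes that are already running.
export default async function* tapBail(source) {
  for await (const { type, data } of source) {
    if (type === 'test:fail') {
      yield `Bail out! ${data.name} failed\n`; // TAP 14 bail-out line
      throw new Error('bail');
    }
  }
}
```
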


So, it seems to me that

  • unit-tests are currently supported.
  • supporting --bail is awaiting a contribution
    (after a first contribution that delivered a more complete specification of what can be expected of --bail).

Whether out-of-the-box support for early exit/bail is part of a 'bare minimum of an ergonomic test runner' can probably be argued for or against. Your comment of 29 Apr 2024 seems to align with cjihrig's comment of 1 Sep 2023. A further complication might be temporarily changing the state of "bail", as mentioned in https://github.com/TestAnything/testanything.github.io/blob/master/tap-version-14-specification.md#pragmas (which, in turn, seems related to "WTH a bail() and timeout() function to migrate almost seamlessly from mocha would have been so great!"). In any case, it seems there is not much to do but wait for such a contribution, or make the contribution happen. A code search for 'test:fail' seems like a rough indication of the amount of work involved in contributing a 'test:bail'.

Regarding your expectation of a list like Zed uses: this project seems to do issue triaging and/or prioritization another way, but perhaps I am not understanding well enough why you would like to see such a list.

As the initial 'Spit out the failing test and its stack trace, then exit 1 as soon as a test fails' seems to be covered by the workaround/--test-reporter option, the remaining issue seems to be about implementing the emission of a 'test:bail' event across the board. That seems to be currently in a state of "on hold"/"won't do" (not "rejected"), because the main Node.js contributors are investing their time elsewhere.

@nicholaswmin
Author

nicholaswmin commented Dec 15, 2024

Thanks for taking the time to summarize the above so comprehensively; this is very much appreciated and very valuable. There's not much to add from my side. I'll just reiterate my current position a bit more in context, just in case, and that's all.

#42990 (comment)

BTW, that is true for me: I come from a mocha background as well, therefore globalHooks and this particular issue are the only missing lego pieces.
At this point it might sound as if I just want the tools and features I am personally missing.

That's true to an extent, obviously, but I try to be cognizant of this bias.
I stand by my ergonomics argument as a generally sensible feature; I do not feel the same way about globalHooks, which I also often need; and I'm certainly against adding more assertion methods, even though I always need them.

My reasoning is that testing is a "meta-task" and one of the few areas where extra attention to ergonomics is justified. Test suites are unique in that they run an ever-present risk of being abandoned if they become too clunky.

If this sounds like I'm contradicting my own claim about agreeing with a minimalist test runner, I'll just clarify that I mean "minimalist" in terms of use-cases to cover.

In essence: don't attempt to please everyone (maybe don't even attempt to please a majority), but the bare-minimum case you do cover should be "the right thing", and the right thing is a 100% frictionless experience start to finish, provided the implementation isn't overly complex. I appreciate this is very opinionated.

Projects
Status: Awaiting Triage
Development

No branches or pull requests

5 participants