Better handling of stray jobs that need terminating in CI #20194
Labels
build
Issues and PRs related to build files or the CI.
flaky-test
Issues and PRs related to the tests with unstable failures on the CI.
This has recently started happening a lot on the macOS hosts in CI: some test times out. For example, in https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1010/17983/console, `sequential/test-benchmark-http` times out. As a result, a stray subprocess is left behind that ends up causing subsequent jobs to fail. See, for example, https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1010/17988/console.
To fix this, someone from the Build WG (in this specific case, me) logs in and does a `kill -9` on the PID. In theory, that process should already have been terminated by one of the instances of `xargs kill` that appear in the `Makefile`. My guess (which I keep forgetting to test when this comes up) is that `xargs kill` needs to be `xargs kill -9` to be effective in these cases on the macOS hosts.
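If the guess above is right (the stray process ignores or never acts on the default `SIGTERM`), one possible middle ground between plain `kill` and going straight to `kill -9` is to escalate: send `SIGTERM`, wait briefly, then send `SIGKILL` to whatever is still alive. The sketch below is only an illustration of that idea. The function name `terminate_stray` and its grace-period argument are hypothetical and do not come from the `Makefile`, and I have not tested this on the macOS hosts.

```shell
# Hypothetical helper, not taken from the Node.js Makefile.
# Usage: terminate_stray GRACE_SECONDS PID...
# Sends SIGTERM to each PID, polls for up to GRACE_SECONDS, then sends
# SIGKILL (the same signal as `kill -9`) to anything that is still alive.
terminate_stray() {
  grace=${1:-5}
  [ "$#" -gt 0 ] && shift

  # Ask politely first; ignore errors for PIDs that are already gone.
  for pid in "$@"; do
    kill -s TERM "$pid" 2>/dev/null || true
  done

  # Poll once per second until every PID has exited or the grace period ends.
  while [ "$grace" -gt 0 ]; do
    alive=0
    for pid in "$@"; do
      kill -0 "$pid" 2>/dev/null && alive=1
    done
    [ "$alive" -eq 0 ] && return 0
    sleep 1
    grace=$((grace - 1))
  done

  # Force-kill whatever did not exit after SIGTERM.
  for pid in "$@"; do
    kill -s KILL "$pid" 2>/dev/null || true
  done
}
```

The PIDs would come from whatever currently feeds `xargs kill`. Simply switching to `xargs kill -9` is the smaller change, and the escalation only matters if some of those processes should get a chance to clean up first.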