Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_runner: bootstrap reporters before running tests #46737

Merged
merged 1 commit into from
Feb 21, 2023

Conversation

MoLow
Copy link
Member

@MoLow MoLow commented Feb 19, 2023

Fixes: #46747

I believe this will address these test failures:
https://ci.nodejs.org/job/node-test-binary-windows-js-suites/19156/

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/test_runner

@MoLow MoLow added flaky-test Issues and PRs related to the tests with unstable failures on the CI. fast-track PRs that do not need to wait for 48 hours to land. test_runner Issues and PRs related to the test runner subsystem. labels Feb 19, 2023
@github-actions

This comment was marked as outdated.

@nodejs-github-bot nodejs-github-bot added dont-land-on-v14.x needs-ci PRs that need a full CI run. labels Feb 19, 2023
@MoLow MoLow added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 19, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 19, 2023
@nodejs-github-bot
Copy link
Collaborator

@MoLow
Copy link
Member Author

MoLow commented Feb 19, 2023

@nodejs/test_runner this only solves one issue, there seems to be another issue with the process not being alive long enough to pipe the entire stream, see
https://ci.nodejs.org/job/node-test-commit-plinux/nodes=rhel8-ppc64le/47011/
and #22088 (comment)

not ok 9 - should support a 'js' file as a custom reporter
21:45:52           ---
21:45:52           duration_ms: 1125.36918
21:45:52           failureType: 'testCodeFailure'
21:45:52           error: |-
21:45:52             The input did not match the regular expression /{"test:start":4,"test:pass":2,"test:fail":2,"test:plan":2,"test:diagnostic":\d+}$/. Input:
21:45:52             
21:45:52             'custom.js {"test:start":1}'
21:45:52             
21:45:52           code: 'ERR_ASSERTION'
21:45:52           actual: 'custom.js {"test:start":1}'
21:45:52           operator: 'match'
21:45:52           stack: |-
21:45:52             Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-plinux/nodes/rhel8-ppc64le/test/parallel/test-runner-reporters.js:93:14)
21:45:52             ItTest.runInAsyncScope (node:async_hooks:206:9)
21:45:52             ItTest.run (node:internal/test_runner/test:552:25)
21:45:52             ItTest.start (node:internal/test_runner/test:468:17)
21:45:52             node:internal/test_runner/test:811:71
21:45:52             node:internal/per_context/primordials:470:82
21:45:52             new Promise (<anonymous>)
21:45:52             new SafePromise (node:internal/per_context/primordials:438:29)
21:45:52             node:internal/per_context/primordials:470:9
21:45:52             Array.map (<anonymous>)

I am trying to work on the additional issue on a separate PR

@nodejs-github-bot
Copy link
Collaborator

@cjihrig
Copy link
Contributor

cjihrig commented Feb 19, 2023

Should we wait for the child process's 'close' event instead of 'exit'?

@MoLow
Copy link
Member Author

MoLow commented Feb 19, 2023

Should we wait for the child process's 'close' event instead of 'exit'?

once the child process is not alive, transforming the output asynchronously is not enough for keeping the main process alive

I have solved that like this:

@@ -123,13 +125,16 @@ function setup(root) {
   const rejectionHandler =
     createProcessEventHandler('unhandledRejection', root);
   const coverage = configureCoverage(root);
-  const exitHandler = () => {
+  const exitHandler = async () => {
     root.coverage = collectCoverage(root, coverage);
     root.postRun(new ERR_TEST_FAILURE(
       'Promise resolution is still pending but the event loop has already resolved',
       kCancelledByParent));
 
     hook.disable();
+    const handle = setInterval(() => {}, 1000);
+    await finished(root.reporter);
+    clearInterval(handle);
     process.removeListener('unhandledRejection', rejectionHandler);
     process.removeListener('uncaughtException', exceptionHandler);
   };

but that is kind of a hack

@MoLow
Copy link
Member Author

MoLow commented Feb 20, 2023

if anyone wants to tackle this it reproduces running tools/test.py -J test/parallel/test-runner-reporters.js --repeat 1000

@cjihrig
Copy link
Contributor

cjihrig commented Feb 20, 2023

it reproduces running

Interestingly, I haven't been able to get it to reproduce on my machine (M1 mac).

but that is kind of a hack

Definitely a hack, but if there is no way to keep the process alive, I'd be OK with doing something like that, but I think it should exist in run() and the interval period should be large.

@MoLow
Copy link
Member Author

MoLow commented Feb 21, 2023

Interestingly, I haven't been able to get it to reproduce on my machine (M1 mac).

Mine is a M1 mac as well 😕

@MoLow
Copy link
Member Author

MoLow commented Feb 21, 2023

I think it should exist in run()

@cjihrig I have tried that but it creates a deadlock - adding a timer prevents the process from reaching beforeExit since the event loop hasn't ended,
but the reporter/s are not ended/drained yet since root.postRun is only called within beforeExit
WDYT?

@MoLow MoLow force-pushed the wait-for-parser-to-finish branch from afc52f4 to 6a24768 Compare February 21, 2023 06:42
@MoLow MoLow requested review from benjamingr and cjihrig February 21, 2023 06:43
Comment on lines 653 to 656
// In case the event loop has ended and reporter has not drained,
// we use a timer to keep the process alive until the reporter is done.
const handle = setInterval(() => {}, TIMEOUT_MAX);
this.reporter.once('close', () => clearInterval(handle));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the stream still working but not keeping the process alive? Is there a resource somewhere that we unref()d ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #22088 (comment)
the reporters transform the output after the child process are done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do we have any idea which change introduced the current flakiness? After the CI was unlocked from the security release, a flurry of changes landed and then this test became flaky. Are we able to bisect it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am bisecting now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjihrig according to git bisect flakiness started after b4a962d landed, so it would have probably become flaky even without the latest test runner changes.
CC @debadree25 @nodejs/streams

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, investigating then, could you guide me if at any place in the test_runner code where pipeline maybe being used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the timing condition we might be facing is within setupTestReporters
where getReportersMap is fulfilled after all the tests completed running

async function setupTestReporters(testsStream) {
const { reporters, destinations } = parseCommandLine();
const reportersMap = await getReportersMap(reporters, destinations);
for (let i = 0; i < reportersMap.length; i++) {
const { reporter, destination } = reportersMap[i];
compose(testsStream, reporter).pipe(destination);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah understood, thank you checking this

Copy link
Member Author

@MoLow MoLow Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: I think the issue is in the test runner, not in change performed to streams

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far no success in thinking with reference to b4a962d only think that comes to mind is if anyway testsStream ended before somehow? but that also doesn't hold up much 😕 nonetheless will try explore a little more just incase

@MoLow MoLow removed the fast-track PRs that do not need to wait for 48 hours to land. label Feb 21, 2023
@MoLow MoLow force-pushed the wait-for-parser-to-finish branch from 6a24768 to 6ea5d9c Compare February 21, 2023 18:00
@MoLow MoLow removed the request for review from benjamingr February 21, 2023 18:00
Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the CI passes - definitely nicer than adding a keep alive timer.

doc/api/test.md Outdated Show resolved Hide resolved
@MoLow MoLow force-pushed the wait-for-parser-to-finish branch from d92b981 to d326348 Compare February 21, 2023 18:10
@MoLow MoLow added request-ci Add this label to start a Jenkins CI on a PR. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Feb 21, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 21, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@MoLow MoLow added the commit-queue Add this label to land a pull request using GitHub Actions. label Feb 21, 2023
@nodejs-github-bot nodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Feb 21, 2023
@nodejs-github-bot
Copy link
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/46737
✔  Done loading data for nodejs/node/pull/46737
----------------------------------- PR info ------------------------------------
Title      test_runner: bootstrap reporters before running tests (#46737)
Author     Moshe Atlow  (@MoLow)
Branch     MoLow:wait-for-parser-to-finish -> nodejs:main
Labels     flaky-test, author ready, needs-ci, dont-land-on-v14.x, test_runner
Commits    1
 - test_runner: bootstrap reporters before running tests
Committers 1
 - Moshe Atlow 
PR-URL: https://github.com/nodejs/node/pull/46737
Fixes: https://github.com/nodejs/node/issues/46747
Reviewed-By: Benjamin Gruenbaum 
Reviewed-By: Colin Ihrig 
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/46737
Fixes: https://github.com/nodejs/node/issues/46747
Reviewed-By: Benjamin Gruenbaum 
Reviewed-By: Colin Ihrig 
--------------------------------------------------------------------------------
   ℹ  This PR was created on Sun, 19 Feb 2023 19:16:17 GMT
   ✔  Approvals: 2
   ✔  - Benjamin Gruenbaum (@benjamingr): https://github.com/nodejs/node/pull/46737#pullrequestreview-1308076791
   ✔  - Colin Ihrig (@cjihrig) (TSC): https://github.com/nodejs/node/pull/46737#pullrequestreview-1307932394
   ✖  GitHub CI is still running
   ℹ  Last Full PR CI on 2023-02-21T19:30:15Z: https://ci.nodejs.org/job/node-test-pull-request/49848/
- Querying data for job/node-test-pull-request/49848/
   ✔  Last Jenkins CI successful
--------------------------------------------------------------------------------
   ✔  Aborted `git node land` session in /home/runner/work/node/node/.ncu
https://github.com/nodejs/node/actions/runs/4236769504

MoLow added a commit that referenced this pull request Feb 21, 2023
PR-URL: #46737
Fixes: #46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@MoLow MoLow merged commit b6f5a67 into nodejs:main Feb 21, 2023
@MoLow
Copy link
Member Author

MoLow commented Feb 21, 2023

Landed in ce49f79

@MoLow MoLow deleted the wait-for-parser-to-finish branch February 21, 2023 21:14
MoLow added a commit to MoLow/node that referenced this pull request Feb 25, 2023
PR-URL: nodejs#46737
Fixes: nodejs#46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MoLow added a commit to MoLow/node that referenced this pull request Feb 25, 2023
PR-URL: nodejs#46737
Fixes: nodejs#46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@MoLow MoLow added backport-open-v18.x Indicate that the PR has an open backport. and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. labels Feb 25, 2023
MoLow added a commit to MoLow/node that referenced this pull request Feb 25, 2023
PR-URL: nodejs#46737
Fixes: nodejs#46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
juanarbol pushed a commit that referenced this pull request Mar 3, 2023
PR-URL: #46737
Backport-PR-URL: #46839
Fixes: #46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@juanarbol juanarbol mentioned this pull request Mar 3, 2023
juanarbol pushed a commit that referenced this pull request Mar 5, 2023
PR-URL: #46737
Backport-PR-URL: #46839
Fixes: #46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Mar 13, 2023
PR-URL: #46737
Fixes: #46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Mar 14, 2023
PR-URL: #46737
Fixes: #46747
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backport-open-v18.x Indicate that the PR has an open backport. flaky-test Issues and PRs related to the tests with unstable failures on the CI. needs-ci PRs that need a full CI run. test_runner Issues and PRs related to the test runner subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parallel/test-runner-reporters is flaking
6 participants