Flaky "Build from tarball" workflow on GitHub Actions #33947

Closed
mmarchini opened this issue Jun 18, 2020 · 10 comments
Labels
build Issues and PRs related to build files or the CI. flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@mmarchini
Contributor

This has been going on for at least a month, and seems to be the major source of flakiness on our Actions runs. Sometimes the build-tarball job will fail with no logs after 40+ minutes:

(https://github.com/nodejs/node/runs/784411094?check_suite_focus=true)

Sometimes it will fail with an error like below:

rm 664a568854abba2b997909662a528594c7526386.intermediate e79088d28d3209bde653372a7712225ba4f65b2e.intermediate 495122784be95331de6389fc8588b7b3a0594912.intermediate ca02573a617ca00970b9a2e674797e70460dac89.intermediate
if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
##[error]Process completed with exit code 2.

(https://github.com/nodejs/node/runs/784761789?check_suite_focus=true)

As a consequence, the other jobs in the tarball workflow won't run.

@mmarchini mmarchini added build Issues and PRs related to build files or the CI. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Jun 18, 2020
@richardlau
Member

> Sometimes it will fail with an error like below:
>
> rm 664a568854abba2b997909662a528594c7526386.intermediate e79088d28d3209bde653372a7712225ba4f65b2e.intermediate 495122784be95331de6389fc8588b7b3a0594912.intermediate ca02573a617ca00970b9a2e674797e70460dac89.intermediate
> if [ ! -r node -o ! -L node ]; then ln -fs out/Release/node node; fi
> ##[error]Process completed with exit code 2.
>
> (https://github.com/nodejs/node/runs/784761789?check_suite_focus=true)

It’s lost among all the output (the UI for browsing logs isn’t great) but this run was an actual failure:

/Users/runner/runners/2.263.0/work/node/node/tools/doc/allhtml.js:87
2020-06-18T15:08:02.2260160Z   if (!ids.has(match[1])) throw new Error(`link not found: ${match[1]}`);
2020-06-18T15:08:02.2339310Z                           ^
2020-06-18T15:08:02.2339780Z 
2020-06-18T15:08:02.2340130Z Error: link not found: tls_server_addcontext_hostname_context
2020-06-18T15:08:02.2340480Z     at Object.<anonymous> (/Users/runner/runners/2.263.0/work/node/node/tools/doc/allhtml.js:87:33)
2020-06-18T15:08:02.2340840Z     at Module._compile (internal/modules/cjs/loader.js:1138:30)
2020-06-18T15:08:02.2341190Z     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1158:10)
2020-06-18T15:08:02.2341570Z     at Module.load (internal/modules/cjs/loader.js:986:32)
2020-06-18T15:08:02.2352730Z     at Function.Module._load (internal/modules/cjs/loader.js:879:14)
2020-06-18T15:08:02.2379940Z     at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
2020-06-18T15:08:02.2417890Z     at internal/main/run_main_module.js:17:47
2020-06-18T15:08:02.2419920Z make[1]: *** [out/doc/api/all.html] Error 1
2020-06-18T15:08:02.2436820Z make[1]: *** Waiting for unfinished jobs....
2020-06-18T15:08:02.2437900Z make: *** [doc-only] Error 2
2020-06-18T15:08:02.2444480Z make: *** Waiting for unfinished jobs....
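
For readers unfamiliar with the check that failed here, the following is a minimal Python sketch of the same idea: collect every `id` in the generated HTML and report in-page links (`href="#..."`) that point at an `id` that does not exist. It is an illustration only, not the code in `tools/doc/allhtml.js`; the function name `find_missing_links` and the sample HTML are hypothetical.

```python
import re
from typing import List


def find_missing_links(html: str) -> List[str]:
    """Return in-page link targets (href="#...") with no matching id."""
    # Every anchor id defined in the document.
    ids = set(re.findall(r'\bid="([^"]+)"', html))
    # Every fragment an in-page link points at.
    targets = re.findall(r'\bhref="#([^"]+)"', html)

    missing: List[str] = []
    for target in targets:
        # Report each broken target once, in order of first appearance.
        if target not in ids and target not in missing:
            missing.append(target)
    return missing


# Hypothetical sample: one valid link and one broken link.
sample = (
    '<h2 id="tls_class_tls_server">Class: tls.Server</h2>'
    '<a href="#tls_class_tls_server">ok</a>'
    '<a href="#tls_server_addcontext_hostname_context">broken</a>'
)
print(find_missing_links(sample))
```

Unlike the script in the log above, which throws on the first missing link, this sketch collects all of them so that a single run can report every broken anchor.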

@mmarchini
Contributor Author

Good, only one flaky failure left :)

@richardlau
Member

If there are no logs (the raw log doesn't even contain the output from the steps that completed successfully) it sounds more like a failure on GitHub's side.

@mmarchini
Contributor Author

It does, but it only happens on that specific job and it happens quite frequently. There might be something we can do to fix or mitigate it.

@codebytere
Member

Have we tried running with ACTIONS_STEP_DEBUG set to true? The logs will be a bit hairier, but it might give us a hint.

@richardlau
Member

> Have we tried running with ACTIONS_STEP_DEBUG set to true? The logs will be a bit hairier, but it might give us a hint.

No, we haven't. I believe it will require a repository admin (i.e. TSC member) to add the secret to this repository: https://help.github.com/en/actions/configuring-and-managing-workflows/managing-a-workflow-run#enabling-step-debug-logging

@mmarchini
Contributor Author

Doesn't seem like we're able to set it for a specific workflow though. Would there be any unwanted side effects with it (performance impact, leaked secret variables, etc.)?

@codebytere
Member

The DX for Actions debugging is still not great. Secrets that are set properly will all show up starred out (********), but given that it's not scopable per-action, I'd say our best bet, should we do it, is to turn it on, trigger a run or two to see the issue, pull the raw logs to dig through, and then turn it back off 🤔
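
Since real failures tend to get lost in the raw output, a small script can help with the "dig through the raw logs" step. The sketch below is a rough illustration, assuming the raw log has already been downloaded as text; the function name `find_failures` and the patterns are hypothetical and based only on the lines quoted in this thread (`##[error]`, `make: *** [...] Error N`, and `Error: ...`), so they would likely need extending for other kinds of failures.

```python
import re
from typing import List, Tuple

# Optional leading timestamp, as seen in the raw log lines quoted above.
TIMESTAMP = re.compile(r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z ')

# Assumed failure markers, taken from the excerpts in this thread.
FAILURE = re.compile(
    r'^(?:##\[error\]|make(?:\[\d+\])?: \*\*\* \[.+\] Error \d+|Error: )'
)


def find_failures(log_text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) for each line that looks like a failure."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        # Strip the timestamp so the patterns can anchor at the line start.
        message = TIMESTAMP.sub('', line)
        if FAILURE.match(message):
            hits.append((number, message))
    return hits


# Sample built from log lines quoted earlier in this thread.
sample = "\n".join([
    "2020-06-18T15:08:02.2340130Z Error: link not found: tls_server_addcontext_hostname_context",
    "2020-06-18T15:08:02.2341570Z     at Module.load (internal/modules/cjs/loader.js:986:32)",
    "2020-06-18T15:08:02.2419920Z make[1]: *** [out/doc/api/all.html] Error 1",
    "2020-06-18T15:08:02.2436820Z make[1]: *** Waiting for unfinished jobs....",
    "2020-06-18T15:08:02.2437900Z make: *** [doc-only] Error 2",
    "##[error]Process completed with exit code 2.",
])

for number, message in find_failures(sample):
    print(f"{number}: {message}")
```

Stack-frame lines and the "Waiting for unfinished jobs" lines are skipped on purpose, so the output stays short enough to spot the first real error.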

@MylesBorins
Contributor

TBH this test is fairly wasteful. Should we maybe move it to be cron based?

@mmarchini
Contributor Author

I think this was fixed? If not, feel free to reopen.
