flaky tests in GitHub actions #265

Closed
willmurphyscode opened this issue Dec 22, 2023 · 8 comments

@willmurphyscode
Contributor

The unit tests for this repo sometimes fail with an error like this:

spawn ETXTBSY

      at ToolRunner.<anonymous> (node_modules/@actions/exec/src/toolrunner.ts:443:24)
      at node_modules/@actions/exec/lib/toolrunner.js:27:71
      at Object.<anonymous>.__awaiter (node_modules/@actions/exec/lib/toolrunner.js:23:12)
      at node_modules/@actions/exec/src/toolrunner.ts:419:58
      at ToolRunner.<anonymous> (node_modules/@actions/exec/src/toolrunner.ts:419:12)
      at fulfilled (node_modules/@actions/exec/lib/toolrunner.js:24:58)

(link)

I believe this is because the tests run simultaneously, but runGrype is not threadsafe.

scan-action/index.js, lines 31 to 35 in 52d017b:

    let grypePath = cache.find(grypeBinary, version);
    if (!grypePath) {
      // Not found, install it
      grypePath = await downloadGrype(version);
    }

This has a race condition, since whatever is present in the cache may be changed by one test while another test is checking it. I've also seen ENOENT in test runs.
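
One possible way to make the check-then-install step safe across processes is to guard it with a lock. The sketch below is only an illustration, not what scan-action does: `withLock` and the lock directory path are hypothetical names, and it relies on `fs.mkdir` failing with `EEXIST` when another process already holds the lock. If ETXTBSY here means one test tried to execute the binary while another was still writing it, serializing the install this way should avoid that, assuming every caller goes through the same lock.

```javascript
const fs = require("fs/promises");

// Hypothetical helper: run `task` while holding a lock directory.
// mkdir is atomic, so only one caller can create the directory at a time.
async function withLock(lockDir, task, retryMs = 50) {
  for (;;) {
    try {
      await fs.mkdir(lockDir);
      break; // lock acquired
    } catch (err) {
      if (err.code !== "EEXIST") throw err;
      // Someone else holds the lock; wait a little and try again.
      await new Promise((resolve) => setTimeout(resolve, retryMs));
    }
  }
  try {
    return await task();
  } finally {
    // Release the lock even if the task failed.
    await fs.rmdir(lockDir);
  }
}
```

The existing lookup and download would then run inside the task, for example `await withLock(lockDir, async () => cache.find(grypeBinary, version) || (await downloadGrype(version)))`. A crashed process would leave the lock directory behind, so a real implementation would probably want a timeout or stale-lock check.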

@willmurphyscode
Contributor Author

We might be able to get away with just running one test at a time as a cheap way to fix this:

With maxConcurrency = 1, timing npm run test gives: 11.77s user 6.00s system 33% cpu 53.132 total. Nearly 50 seconds of that is spent downloading the db, which happens only once regardless of test parallelism.

I'll see if maxConcurrency = 1 fixes this.

@popey
Contributor

popey commented Jul 8, 2024

Forgive the possibly ill-informed comment here. Would it be possible to pre-cache the grype-db in an early step before we kick off the further steps that may depend on it?

@willmurphyscode
Contributor Author

I think maybe we want to use this jest option to keep the tests from racing to install grype: https://jestjs.io/docs/cli#--runinband

@popey I believe the race condition occurs when installing grype itself, not when downloading the grype-db. Your suggestion still stands, and that might be the way forward, but given how quick the tests are, running them serially is probably the quicker fix, so we'll try that first.

@kzantow
Contributor

kzantow commented Jul 8, 2024

I think it should already be set to run serially: https://github.com/anchore/scan-action/blob/main/jest.config.js#L4

@willmurphyscode
Contributor Author

https://github.com/anchore/scan-action/actions/runs/9864003440/job/27238006566#step:8:131 is another example of the spawn: ETXTBSY flake.

@willmurphyscode
Contributor Author

Both scan-action and sbom-action have their tests running in series now. We can re-open this if the issue returns.

@github-project-automation github-project-automation bot moved this from In Review to Done in OSS Jul 10, 2024