Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate lint steps in GitHub Actions #231

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

brandur
Copy link
Contributor

@brandur brandur commented Feb 25, 2024

Like with running Go tests, the lints for each separate Go module must
be running separately. This is currently being accomplished in CI by
giving each Go module its own lint job using golangci-lint's GitHub
Action.

Unfortunately, this is really not working out well. Although the lints
themselves are all very fast to run (just a couple seconds), the post
run step where it saves it cache is quite slow, about two minutes, and
that's happening for each of the lint steps, making the entire lint job
take almost ten minutes to run.

I think what might've been happening is that each lint job was
overwriting the last job's cache, which is why each post run step seemed
to be doing so much work. I didn't validate this hypothesis, but it
seems like a strong possibility.

Here, try to hack around the problem by having the main lint job fetch
golangci-lint, but then only run --help, and then do the real linting
as a separate step (one that doesn't use the GitHub Action) that calls
into our make lint target to run lints for each Go module.

A downside is that it may not annotate lint problems on the GitHub PR,
which is something that theoretically happened before, although it never
seemed to work for me. This might not be that bad though because it
could as a side effect improve the log output, which is terrible for the
GitHub Action step because it doesn't include files or line numbers.

Also, I notice that the lints/tests in Makefile had been commented out
for ./cmd/river for some reason (maybe something I did during the
driver refactor and forgot to fix), so I uncommented them.

[1] golangci/golangci-lint-action#271

@brandur brandur force-pushed the brandur-consolidate-lint-steps branch from 416604d to 8cb492c Compare February 25, 2024 06:32
@brandur brandur requested a review from bgentry February 25, 2024 06:35
@brandur
Copy link
Contributor Author

brandur commented Feb 25, 2024

@bgentry Was seeing lint jobs take almost ten minutes to run — totally nuts. This change fixes that as far as I can tell.

@brandur brandur force-pushed the brandur-consolidate-lint-steps branch 3 times, most recently from 469fa22 to 27ee4dd Compare February 25, 2024 06:55
@bgentry
Copy link
Contributor

bgentry commented Feb 25, 2024

Was seeing lint jobs take almost ten minutes to run — totally nuts.

Where are you seeing this? this latest master run shows ~40 seconds.

@bgentry
Copy link
Contributor

bgentry commented Feb 25, 2024

A downside is that it may not annotate lint problems on the GitHub PR,
which is something that theoretically happened before, although it never
seemed to work for me.

I would actually like to get this working, and based on the docs it seems we merely need to add a checks: write permission to make it work?

I think what might've been happening is that each lint job was
overwriting the last job's cache, which is why each post run step seemed
to be doing so much work. I didn't validate this hypothesis, but it
seems like a strong possibility.

I'm not seeing anything like this from the latest run on master. It looks like each lint step is trying to save the same cache key, and sometimes failing to do so because it already exists.

If you can point me to wherever you saw evidence of this being super slow, I'm happy to debug—I've gotten pretty skilled with Actions from work stuff.

Like with running Go tests, the lints for each separate Go module must
be running separately. This is currently being accomplished in CI by
giving each Go module its own lint job using golangci-lint's GitHub
Action.

Unfortunately, this is really not working out well. Although the lints
themselves are all very fast to run (just a couple seconds), the post
run step where it saves it cache is quite slow, about two minutes, and
that's happening for each of the lint steps, making the entire lint job
take almost _ten minutes_ to run.

I think what might've been happening is that each lint job was
overwriting the last job's cache, which is why each post run step seemed
to be doing so much work. I didn't validate this hypothesis, but it
seems like a strong possibility.

Here, try to hack around the problem by having the main lint job fetch
golangci-lint, but then only run `--help`, and then do the real linting
as a separate step (one that doesn't use the GitHub Action) that calls
into our `make lint` target to run lints for each Go module.

A downside is that it may not annotate lint problems on the GitHub PR,
which is something that theoretically happened before, although it never
seemed to work for me. This might not be that bad though because it
could as a side effect improve the log output, which is terrible for the
GitHub Action step because it doesn't include files or line numbers.

Also, I notice that the lints/tests in `Makefile` had been commented out
for `./cmd/river` for some reason (maybe something I did during the
driver refactor and forgot to fix), so I uncommented them.

[1] golangci/golangci-lint-action#271
@brandur brandur force-pushed the brandur-consolidate-lint-steps branch from 27ee4dd to e486e60 Compare February 25, 2024 23:25
@brandur
Copy link
Contributor Author

brandur commented Feb 25, 2024

I would actually like to get this working, and based on the docs it seems we merely need to add a checks: write permission to make it work?

Ah yeah, that could be. I have to say though that I have it enabled it in another repo, and IMO it's not that great — you get failures in there, but you still have to scroll through pages of diff trying to zero in on them. Not hard against it, but the CLI output is easier to track down in my experience.

I'm not seeing anything like this from the latest run on master. It looks like each lint step is trying to save the same cache key, and sometimes failing to do so because it already exists.

If you can point me to wherever you saw evidence of this being super slow, I'm happy to debug—I've gotten pretty skilled with Actions from work stuff.

Cool. It may not be happening in master, but here's the runs from the four PRs that I submitted most recently (you should be able to repro easily, although currently you'd have to open a PR for CI to run since it only starts a run on master pushes that are not in a PR):

I think some of the step timing on those might be retrospectively wrong. Each one is showing two of the lint post actions as 2+ minutes, but I think all of them basically where. Take a look at the total run time of each, and they're all over > 9 minutes, which is what I was experiencing yesterday.

@bgentry
Copy link
Contributor

bgentry commented Feb 26, 2024

Ah, I've seen that timing issue before and I think it's the result of something within GitHub Actions or the runner malfunctioning. Check out the timestamps on the logs from these steps:

Screenshot 2024-02-26 at 8 15 53 AM

The logs indicate a duration of about 3s, which is what the cache save step says it took. And yet, the overall step takes >2 min.

That being said, there is definitely something wrong with that golangci-lint action and the way we're using it for many packages in one job—it doesn't appear to be designed to handle this. Note all the tar errors on the lint steps: it's restoring a Go + module cache on each one of those. May not add up to much time in normal usage, but it's still wasteful and unnecessary. Let me take a crack at an alternate way of fixing this.

@brandur
Copy link
Contributor Author

brandur commented Feb 26, 2024

Ah, I've seen that timing issue before and I think it's the result of something within GitHub Actions or the runner malfunctioning. Check out the timestamps on the logs from these steps:

Ah yeah, good point — it does seem like the golangci-lint step was doing roughly the right thing despite the duration. (Although I was watching them at the time and they really did take 2+ min each, so who knows what's going on. A GitHub Actions bug wouldn't surprise me.)

That being said, there is definitely something wrong with that golangci-lint action and the way we're using it for many packages in one job—it doesn't appear to be designed to handle this. Note all the tar errors on the lint steps: it's restoring a Go + module cache on each one of those. May not add up to much time in normal usage, but it's still wasteful and unnecessary. Let me take a crack at an alternate way of fixing this.

Yeah, I looked through a lot of their conf, and it really does seem like there's only one of these actions meant to be used per job. golangci/golangci-lint-action#271 has a few other ideas on how to handle multiple directories, but this is the one approach out of those that seemed to work well for me. SG if you want to take a look though.

@bgentry
Copy link
Contributor

bgentry commented Feb 29, 2024

I had a bad run of this here. I even cancelled the workflow while it was sitting inexplicably on one of these steps, and it kept going for many minutes longer (through multiple post-cache steps with the same issue).

Copy link
Contributor

@bgentry bgentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine as a temporary measure until I have time to look into how to fix this more cleanly

@brandur
Copy link
Contributor Author

brandur commented Feb 29, 2024

Awesome. Spoke with @bgentry about this on Slack, and we may end up reverting most of this, but it gets us faster build in the interim while we're chasing down a few other high priority items.

@brandur brandur merged commit 0ace41d into master Feb 29, 2024
10 checks passed
@brandur brandur deleted the brandur-consolidate-lint-steps branch February 29, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants