Builds on buildkite are hanging on remote-cache after 0.23 #7555
Comments
It seems like all the hangs are on the C++ actions, both compiling and linking. But that could be a red herring; maybe C++ actions are just the first to be executed.
In an attempt to resume at least some operation with bazelbuild/bazel#7555, I'm disabling remote caching on all workers.
I'm testing whether disabling remote cache (bazelbuild/continuous-integration@c6131a7) will help. I will now cancel some hanging builds to make workers available.
It seems that downloading from the remote cache never times out for the affected actions. Is 285c03e#diff-b1b932f73f8ce938510d06b752945cb1 related? What's special about the C++ actions? Do they produce larger artifacts?
I disabled remote cache on the CI; queued builds are now running, and presubmits should be working. I'm not sure mac presubmits will finish in under 1 hour without remote cache, though.
They don't produce large artifacts. They are not spawn actions, so maybe they behave in a non-standard way. Or it's a red herring.
Hmm, I couldn't reproduce the failure on Ubuntu 14.04 in a Docker container running on a CI Linux slave.
It is not always failing with 0.23.0, for example in one presubmit:
Hmm, that could mean it was a temporary GCS outage and all is fine now...
Well, it should still not hang forever but run into a timeout.
Absolutely.
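To illustrate the point about timeouts: the following is a minimal Python sketch, not Bazel's implementation, of bounding a blocking cache download with a deadline so that the caller can fall back to local execution instead of hanging. The name `fetch_with_deadline` and its parameters are hypothetical and exist only for this example.

```python
import concurrent.futures


def fetch_with_deadline(fetch, timeout_s):
    """Run the blocking callable `fetch`, giving up after `timeout_s` seconds.

    Returns (True, value) on success, or (False, None) on timeout so the
    caller can treat the result as a cache miss. Hypothetical helper.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # Do not wait for a stuck worker thread; only the caller is unblocked.
        pool.shutdown(wait=False)
```

Note that this only unblocks the caller. The stuck download keeps running in its worker thread, so a real client would also need to cancel the underlying network request.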
But this bug might also exist in previous versions? So maybe not a regression. @buchgr
It seems to work fine again on testing pipelines. I'll re-enable remote caching for presubmits.
The reason it can't really be a GCS outage is that mac presubmits don't use GCS.
As of now, I see no reason for a patch release because I don't know what exactly the bug is. I was not able to reproduce it.
I found a way to reproduce the hanging symptom (probably it's the same bug)
Then the actions will still try to use remote caching but end up hanging.
@laurentlb this will need a patch release. I am sending out a fix and will update this bug once submitted.
…d#7555 When using --remote_(http)_cache, we wouldn't properly reset the state on the Bazel server, so on subsequent command invocations the server would still think it was using remote caching. This would cause Bazel to hang indefinitely.
This commit unblocks CI by avoiding the bad release 0.23.0: bazelbuild/bazel#7555
When using --remote_(http)_cache, we wouldn't properly reset the state on the Bazel server, so on subsequent command invocations the server would still think it was using remote caching. This would cause Bazel to hang indefinitely. Closes #7562. PiperOrigin-RevId: 235914044
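The class of bug described in the commit message above can be sketched in a few lines of Python. This is an illustrative model only, not Bazel's code: `Options`, `BuggyServer`, and `FixedServer` are hypothetical names, and `remote_http_cache` merely stands in for the flag value. The point is that a long-lived server process must reset per-invocation state on every command, not only when the flag is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Options:
    # Hypothetical stand-in for the value of the remote cache flag.
    remote_http_cache: Optional[str] = None


class BuggyServer:
    """Long-lived server that only updates its state when the flag is set."""

    def __init__(self) -> None:
        self.cache_url: Optional[str] = None

    def run(self, options: Options) -> bool:
        if options.remote_http_cache:
            self.cache_url = options.remote_http_cache
        # Bug: when the flag is absent, the previous URL is still in place.
        return self.cache_url is not None  # True means "use remote caching"


class FixedServer:
    """Long-lived server that resets its state on every invocation."""

    def __init__(self) -> None:
        self.cache_url: Optional[str] = None

    def run(self, options: Options) -> bool:
        # Overwrite unconditionally so no state leaks between commands.
        self.cache_url = options.remote_http_cache
        return self.cache_url is not None
```

In this model, running a command with the flag and then one without it leaves `BuggyServer` still reporting that remote caching is in use, while `FixedServer` correctly reports that it is not.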
Baseline: 441fd75
Cherry picks:
+ 6ca7763: Fix a typo
+ 2310b1c: Ignore SIGCHLD in test setup script
+ f9eb1b5: Complete channel initialization in the event loop
+ f0a1597: remote: properly reset state when using remote cache. Fixes #7555
+ 56366ee: Set non-empty values for msvc_env_* when VC not installed
Release 0.23.2
All the builds (e.g. presubmits) on Buildkite are hanging with 'remote-cache' status with Bazel 0.23 (example). I see hangs on mac and Windows as well.
Builds with 0.22 are still passing (example).
Testing 0.23rc3 with downstream (which includes Bazel and its tests) was green.