Uploading artifacts from GH actions consistently fails with 503 error #4185
GH builds regularly fail with the 2 cpp targets using gcc.
The error occurs not during the build/test itself, but when uploading the artifacts to GH.
The error is 503 (Service Unavailable).
Could it be that the artifact is too large (the log says 150994943 bytes)?
See https://github.com/antlr/antlr4/actions/runs/4423203675/jobs/7755704404
@hs-apotell would you be able to look into this?

Comments
Interestingly, re-running the failed jobs succeeds, and the last artifact size in a successful build is 'only' 104579386 bytes.
Also, it seems no tests are run for the cpp builds... very weird.
Notably, these builds failing wasn't always the case. It seems to have started happening more consistently in recent times. Has anything substantial changed in the past few weeks that could correlate with the failures? Digging into a few failed builds, the error is not always consistent either - sometimes 400, sometimes 503. But the errors are always network-related, so rebuilds succeeding isn't surprising.
Size wouldn't matter here. We have other builds producing and uploading artifacts that are over 3GB, and antlr doesn't generate anywhere close to that size. Also, the size of the uploaded artifact will differ from the files on disk because the upload action zips them.
Every build runs on a pristine VM. There is no pollution. If the sizes are different across builds, then the generated file sizes on disk are different. How, why, which - those are questions we can follow up on. But VM pollution is not an issue.
Running no tests for the cpp builds is intentional. The cpp natives are built twice - once using cmake directly (i.e., not using the java wrappers) so the warnings/errors can be captured. Tests are not a concern with these builds; they are run as part of the other builds. I will investigate further to narrow down the root cause of the failure.
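For context, a direct cmake build along the lines described above might look like the following workflow step. This is a minimal sketch only; the step name, source path, and cmake flags are assumptions, not taken from the actual antlr4 workflow:

```yaml
# Hypothetical step: build the C++ runtime with cmake directly,
# bypassing the java wrappers, so compiler warnings/errors surface in the log.
- name: Build cpp runtime (cmake direct)
  run: |
    cd runtime/Cpp                      # assumed source location
    cmake . -DCMAKE_BUILD_TYPE=Release  # configure in-tree
    cmake --build . --parallel          # compile; warnings appear in build output
```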
I hope this explains it - actions/upload-artifact#270. The failures started happening when I upgraded the specific Github action from v2 to v3 on 11/27/2022. I will create a new PR with the recommended fix for the issue.
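For illustration, the kind of one-line revert being discussed would look roughly like this in the workflow. A sketch only; the step name, artifact name, and path are hypothetical:

```yaml
# Hypothetical upload step: pinning the action back to the v2 tag
# instead of the v3 release that is failing with 503s.
- name: Upload build artifacts
  uses: actions/upload-artifact@v2   # was: actions/upload-artifact@v3
  with:
    name: antlr-cpp-build            # hypothetical artifact name
    path: '*.tgz'                    # hypothetical path
```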
Thanks for this.
Ah, I get it now - the cpp job is for building the lib, and then the regular job uses it for testing. And the segregation is for building using different 'flavors'... thanks.
Maybe the jobs could use some renaming to drive the intent home. Any suggestions?
build-cpp-library?
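In workflow terms, that rename would just be a change to the job's key and/or display name; a tiny sketch with hypothetical keys (the surrounding steps are elided and not taken from the real workflow):

```yaml
jobs:
  build-cpp-library:          # hypothetical new key for the library-only cpp job
    name: build-cpp-library   # display name shown in the Actions UI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ... existing cmake build steps unchanged ...
```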
Github action for upload was upgraded to v3 recently and the release is unstable, causing too many uploads to fail. Reverting that change to go back to using v2. Unfortunately, this change also downgrades use of Node.js to 12, which is deprecated, generating too many warnings in build output. Favoring warnings over failed builds.

Signed-off-by: HS <hs@apotell.com>
If this is truly a network issue, should we not report this to GitHub?
I've had a ton of network errors with Github Actions in grammars-v4. It was particularly bad for the Mac servers, which I believe are sub-par hardware (but there's no /proc/cpuinfo, and arch and uname -a don't give squat). To get around all the network mess, I had to write code to do builds with retries. I also try to avoid certain times of the day with some big PRs. (Eventually, the only thing that really, really fixed the problem was to make the builds only work on the changed grammars, so the network wasn't being pounded to death by all the simultaneous builds. I can only guess that Github probably virtualizes multiple machines on one piece of hardware, which still has only one shared network link. Your workflow spawns 33 builds!)

I looked at the code for upload-artifact. The error is raised here. Perhaps you could fork a copy, create your own "antlr-upload-archive", and employ a retry of the crappy retry. Maybe if you retry a good number of times, things might eventually work. Unfortunately, the toolkit hardwires the retry count to 5, and does not offer an API to modify the value.

There was some issue somewhere in github actions that mentioned that the last "chunk" was having problems. Maybe this is it? But you don't do an "ls -l *.tgz" in the "Prepare artifacts" step to know how big the file really is, and whether the last chunk is being sent.
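Short of forking the action, a workflow-level retry can be approximated by letting the first upload attempt fail softly and re-running the step conditionally. A minimal sketch, assuming hypothetical step names, artifact name, and path:

```yaml
# First attempt: don't fail the job if the upload 503s.
- name: Upload artifacts (attempt 1)
  id: upload1
  continue-on-error: true
  uses: actions/upload-artifact@v3
  with:
    name: antlr-cpp-build   # hypothetical
    path: '*.tgz'           # hypothetical

# Second attempt: runs only if the first one actually failed
# (steps.<id>.outcome reports the result before continue-on-error is applied).
- name: Upload artifacts (attempt 2)
  if: steps.upload1.outcome == 'failure'
  uses: actions/upload-artifact@v3
  with:
    name: antlr-cpp-build
    path: '*.tgz'
```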
Yes, it is a network issue but not a Github issue. This seems to be somehow related to the implementation of the upload-artifact action itself. This worked in the previous version but fails with the latest version. You can follow the bug report I pointed out on the upload-artifact repository. Unfortunately, this is not the only reported issue about this problem. It has been reported numerous times with no resolution.

I am unsure if I want to fork/clone the repository and take ownership of it. I neither have the time to maintain it nor see an immediate need for it. If this continues to be a problem, there are other actions similar to this one that we can use.

I introduced a PR with the version rollback; however, that also failed with a similar problem. Will try other options to see if I can swap the action for something more reliable.
Github action for upload was upgraded to v3 recently and the release is unstable, causing too many uploads to fail. Downgrading back to the previous version hasn't made a significant improvement either. Since the artifacts aren't used by any chained job, failures uploading the artifact can be ignored. The artifacts are used mostly for debugging, so if needed the user can trigger a specific build again to get the artifact.

Signed-off-by: HS <hs@apotell.com>
Since the artifacts are not necessary, how about disabling that step altogether?
The continue-on-error option has the same effect - ignoring the result if the upload fails.
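Concretely, that setting sits on the upload step itself; a sketch (the step name, artifact name, and path are hypothetical):

```yaml
- name: Upload artifacts
  continue-on-error: true     # a failed upload no longer fails the job
  uses: actions/upload-artifact@v3
  with:
    name: antlr-cpp-build     # hypothetical
    path: '*.tgz'             # hypothetical
```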
Github action for upload was upgraded to v3 recently and the release is unstable, causing too many uploads to fail. Downgrading back to the previous version hasn't made a significant improvement either. Since the artifacts aren't used by any chained job, failures uploading the artifact can be ignored. The artifacts are used mostly for debugging, so if needed the user can trigger a specific build again to get the artifact.

Signed-off-by: HS <hs@apotell.com>
Signed-off-by: Jim.Idle <jimi@idle.ws>