curl failures all over the place #128
Comments
These machines are geographically distributed, so there is no obvious common networking factor. Testing
I'm gonna close this for now, since it seems to have been due to a transient networking issue. Let's reopen it if the problem recurs.
It happens to opam-repo-ci quite a lot. I'm curious, since you also run the Since opam 2.1.5 this is respected and will then use the opam.ocaml.org host for requesting archives (instead of going to github or some other overloaded host)... Now I'm not sure anymore which opam versions your images have and why.
Ah, I didn't realize this was chronic. That sounds like a good idea to me. I would reopen the issue, but I don't have the permissions needed to do so. Are you able to, @mtelvers?
Well, I'm not sure about "chronic". What I see, e.g. at https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/c2ffca9c419985ae9191e0c272f33654d46a8eac, is various failures, including 57 curl failures.
I've rerun these failing jobs a few times (separated by several days, in the hope that the network problem would get fixed), and they're still failing: https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/b69de340a94bb5bd475e25d32c407d590e87d37d I took one of the tests and reproduced it locally (cut and paste the docker script, run it), and it worked fine: https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/b69de340a94bb5bd475e25d32c407d590e87d37d/variant/compilers,4.13,pa_ppx.0.15,tests So it does appear to be a problem with the CI infra. I suspect that one could craft a custom opam package that would elicit the bug, and I can do that if it would be useful. But I'd only do it if it were useful, because, given the long waits for jobs to get scheduled, I suspect it would take a good number of weeks to narrow down to a minimal test that elicits the problem.
I suspect that this problem is caused by a rate limit from the source websites.
The logs show we are recompiling OCaml. This is very curious to me.
The failed Looking back in the old logs using
This issue occurs when the It has stopped and started several times as PRs are merged and base images have been rebuilt.
I have created a PR ocaml/opam#6032 to work around ocaml/ocaml#13237 and, while I wait for it to be merged, I have hacked up a commit on ocurrent/ocaml-dockerfile and used that to rebuild the base images using my own instance. I am pleased to report that this is having a beneficial effect. Between midnight and 6am, opam-repo-ci rebuilt the compiler 32,000 times. In the following 6 hours, that figure dropped to 147.
I'm not sure if this is supposed to have solved the problem, but I figured I'd report back that the problem persists. I reran this test just now, and it still fails: https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/b69de340a94bb5bd475e25d32c407d590e87d37d/variant/compilers,4.13,pa_ppx.0.15,tests
Thanks for your report. Unfortunately, I must have missed some of the base images and/or docker peek jobs, as a small percentage of jobs, such as the one you linked, are still rebuilding the compiler and generating the
Thanks for replying so quickly! Can you update this thread when they're all finished? (I clicked through to that link, but don't know how to interpret what's shown.) I'll then rerun the CI jobs and report back on what happens.
@chetmurthy I've rebuilt the failed jobs, so we now only have a single
Kudos to @mtelvers for his work on recovering from this, and on driving forward fixes for the root cause. Followups (including ways to catch this kind of thing earlier) are being tracked in other issues. Thanks for the report @mseri and @chetmurthy 🙏
The number of curl failures is going up again in large PRs.
FWIW curl exit code 6 is "Could not resolve host. The given remote host could not be resolved." I don't know how the CI infrastructure is set up, but maybe it's worthwhile to look into setting up a caching resolver. Maybe a resolver out there is not too happy about a thundering herd asking for A & AAAA for github.com.
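To make the caching-resolver suggestion concrete, here is a minimal Python sketch of the idea: remember successful lookups for a short TTL so that many jobs asking for the same host do not each hit the upstream resolver. It is only an illustration. The class name `CachingResolver`, the 60-second TTL, and the injectable `lookup` and `clock` parameters are all assumptions made for this example, not anything the CI infrastructure uses. In practice a caching resolver daemon on the workers would probably be the more natural fit.

```python
import socket
import time
from typing import Callable, Dict, List, Optional, Tuple


class CachingResolver:
    """Hypothetical in-process DNS cache (illustration only).

    Successful lookups are remembered for `ttl_seconds`; failed lookups
    are not cached, so a transient resolver error is retried next time.
    """

    def __init__(
        self,
        lookup: Optional[Callable[[str], List[str]]] = None,
        ttl_seconds: float = 60.0,  # assumed value, not taken from the thread
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup or self._system_lookup
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (expiry time, addresses)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    @staticmethod
    def _system_lookup(host: str) -> List[str]:
        # Ask the system resolver for both IPv4 (A) and IPv6 (AAAA) addresses.
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    def resolve(self, host: str) -> List[str]:
        now = self._clock()
        hit = self._cache.get(host)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no upstream query
        addresses = self._lookup(host)  # exceptions propagate, nothing cached
        self._cache[host] = (now + self._ttl, addresses)
        return addresses
```

With something like this in front of the downloads, a burst of requests for `github.com` inside one TTL window would translate into a single upstream query per process rather than one per request.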
They look like
See e.g. ocaml/opam-repository#26044