This repository has been archived by the owner on Feb 4, 2021. It is now read-only.

OSX nightly debug failed cloning from GitHub on mini1 and mini2 #14

Closed
clalancette opened this issue Apr 25, 2017 · 10 comments

Comments

@clalancette

This job: http://ci.ros2.org/view/nightly/job/nightly_osx_debug/435/

failed to clone from github with an error:

07:39:33 error: RPC failed; curl 56 SSLRead() return error -9806

We've seen this from time to time, and it usually goes away. Still, we've seen it often enough that I think we should look into it. I think @dhood did some research on this before, but I thought I would re-report it here.

@clalancette
Author

And this job failed with a different but similar error:

http://ci.ros2.org/job/ci_osx/1991/

@mikaelarguedas
Member

Note that this problem only shows up on mini2, not on the other machines.

@clalancette
Author

Ah, thanks. I thought that might be the case, but I couldn't quite remember :).

@nuclearsandwich
Member

I also noticed this is specific to the macOS jobs; I didn't think to check whether it's specific to one particular host. It may be a certificate bundle or OpenSSL version issue that could be resolved by grabbing a fresh version of either from Homebrew.

If the issue persists after bumping those packages we should find a way to troubleshoot the connection issues by setting GIT_CURL_VERBOSE=1 on that host.
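
One way to capture that verbose output from a script, as a minimal sketch: the helper names `verbose_git_env` and `run_git_verbose` are hypothetical and not part of the CI scripts. `GIT_CURL_VERBOSE` is the environment variable mentioned above; everything else is an assumption for illustration.

```python
import os
import subprocess


def verbose_git_env(base_env=None):
    """Return a copy of the environment with curl-level tracing enabled for git."""
    env = dict(os.environ if base_env is None else base_env)
    # GIT_CURL_VERBOSE=1 makes git print curl's HTTP/TLS diagnostics to stderr.
    env["GIT_CURL_VERBOSE"] = "1"
    return env


def run_git_verbose(args):
    """Run `git <args>` with verbose curl output (hypothetical helper).

    Returns (exit code, stderr text); stderr carries the curl trace.
    """
    result = subprocess.run(
        ["git"] + list(args),
        env=verbose_git_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stderr
```

For example, calling `run_git_verbose(["ls-remote", url])` against the remote that fails should exercise the same HTTPS path as a clone without writing a checkout, and the returned stderr text could be saved alongside the job log.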

@mikaelarguedas
Member

The trend view is pretty useful for gathering this kind of information:
http://ci.ros2.org/view/nightly/job/nightly_osx_debug/buildTimeTrend
Every failed job related to this issue was on mini2.

> If the issue persists after bumping those packages we should find a way to troubleshoot the connection issues by setting GIT_CURL_VERBOSE=1 on that host.

👍

@dirk-thomas changed the title from "OSX nightly debug failed cloning from github" to "OSX nightly debug failed cloning from GitHub on mini2" on May 2, 2017

@mikaelarguedas
Member

We recently had an occurrence of this on mini1 as well: http://ci.ros2.org/view/nightly/job/nightly_osx_release/526/

@mikaelarguedas changed the title from "OSX nightly debug failed cloning from GitHub on mini2" to "OSX nightly debug failed cloning from GitHub on mini1 and mini2" on Jul 21, 2017

@nuclearsandwich
Member

Searching for the error output suggests that this is a frequent issue on macOS hosts running CI. It seems like it may just be a persistent error somewhere in the network stack. It occurs not only during Git clones but also during other curl-based HTTPS operations. Folks not using Git have reported that switching to wget, which has more (which is to say: some) retry behavior, has given them more stability. As far as I know, there is no way to swap Git's HTTP backend, nor to instruct the HTTP backend to auto-retry.

A hacky fix would be to check the vcs exit code and try the vcs import again in 3-5 seconds.
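
A rough sketch of that retry idea, not the change that eventually landed in the CI scripts: `run_with_retry` is a hypothetical helper, and the attempt count and delay defaults are assumptions picked to match the 3-5 second suggestion above.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_seconds=4.0,
                   run=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with 0 or the attempts are used up.

    `run` and `sleep` are injectable so the loop can be exercised
    without touching the network. Returns the last exit code.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = run(cmd)
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Pause briefly before retrying a transient network failure.
            print("attempt %d of %d failed with exit code %d, retrying in %s s"
                  % (attempt, attempts, returncode, delay_seconds))
            sleep(delay_seconds)
    return returncode
```

A call such as `run_with_retry(["vcs", "import", "--input", "ros2.repos", "src"])` would wrap the import step; the repos file name and target directory there are placeholders. A blanket retry also repeats non-transient failures, so keeping the attempt count small limits how long a genuinely broken job takes to fail.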

@mikaelarguedas
Member

Thanks for the info!

I think we can live with it until we switch to the new buildfarm. ros_buildfarm has retry behavior for most network-related operations (git, apt, etc.); ideally we could leverage the same type of retry behavior on non-Docker-based builds.

@nuclearsandwich
Member

> I think we can live with it until we switch to the new buildfarm

We've brought this new buildfarm up a couple of times in offline conversation, and we'll almost certainly have more offline discussion about it, but I think it would be really helpful to have an issue or document somewhere about what we're missing with the current CI: what advantages it has over a vanilla ROS buildfarm and how we can grow either in the right direction. @mikaelarguedas, I've talked with you about it the most and you seem to have the clearest vision at the moment. Could you make a pitch or prompt-style issue for us to iterate on and discuss?

@dirk-thomas
Member

I will close this since, as of ros2/ci#103, we retry if the cloning fails.
