Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: make travis retry the build #7043

Merged
merged 1 commit into from
Nov 7, 2018
Merged

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented Nov 7, 2018

Checking if travis is more stable this way

@bcardiff bcardiff added this to the 0.27.1 milestone Nov 7, 2018
@straight-shoota
Copy link
Member

This might make it more stable but also potentially hide less-frequently occuring errors.

@bcardiff bcardiff merged commit efe0f11 into crystal-lang:master Nov 7, 2018
@bcardiff
Copy link
Member Author

bcardiff commented Nov 7, 2018

They were mostly false positives. If it's a stable error the build will fail. But temporal memory allocations had affected build process a lot.

Let's see how it plays for a while.

@straight-shoota
Copy link
Member

What about such failures: https://circleci.com/gh/crystal-lang/crystal/14215

@bcardiff
Copy link
Member Author

bcardiff commented Nov 8, 2018

Eventually they will keep appearing, less times but still. But when a PR fails at seems unrelated the author pushed a commit to trigger again the build o some of the team restart the build. We are trying to improve that.

It there is a better process to collect sporadic failures without affecting the contribution workflows I'm probably happy to go in that direction.

@RX14
Copy link
Contributor

RX14 commented Nov 8, 2018

@straight-shoota they'll still appear on circle as often

@bcardiff bcardiff deleted the travis-retry branch November 9, 2018 03:30
@Sija
Copy link
Contributor

Sija commented Nov 22, 2018

@bcardiff Retrying failed jobs does it unconditionally, whether it's a legitimate fail or a bug/OOM error - see https://travis-ci.org/crystal-lang/crystal/jobs/458483950 for instance.

It should be either scoped (lets say to master branch) or reverted altogether.

@bcardiff
Copy link
Member Author

I don't think there is a way to retry only on unexpected errors but accept a failure on the spec run as final.

At the end of the day consistency among the runs is what matters, at the cost of spending some travis resources, but not delaying CI until a manual trigger is done.

@Sija
Copy link
Contributor

Sija commented Nov 22, 2018

@bcardiff So why not scope it to master branch only? That would be IMO some middle-way solution.

@bcardiff
Copy link
Member Author

bcardiff commented Nov 22, 2018

But the false broken builds occurs in PRs also. And actually those are where it bothers the most IMO. Authors needs to push to rebuild, or wait for someone to restart the process. It's better to that the retry work unattended if the failure is not real.

  • If the CI success we are almost as confident as before.
    • On success we reduce the chance to notice a failure due to uncontrolled state.
  • If the CI fail we are more confident is not a sporadic memory error.

I don't see why that should differ on branch or why doing it on master is a middle-way.

@Sija
Copy link
Contributor

Sija commented Nov 22, 2018

I don't see why that should differ on branch or why doing it on master is a middle-way.

Because it ain't burning Travis' resources on failures, which within PRs are quite a common occurrence.

@bcardiff
Copy link
Member Author

And actually those are where it bothers the most IMO.

I think it's more valuable human time/attention than machine time.

@Sija
Copy link
Contributor

Sija commented Nov 22, 2018

I think it's more valuable human time/attention than machine time.

Good argument, I agree.

I don't think there is a way to retry only on unexpected errors but accept a failure on the spec run as final.

Perhaps exiting from specs with some chosen return code other than 1:

  • 0: success
  • 1: crystal failure
  • 3: specs failure

@straight-shoota
Copy link
Member

Specs failure could still be a false positive.

@Sija
Copy link
Contributor

Sija commented Nov 22, 2018

@straight-shoota how come?

@straight-shoota
Copy link
Member

Because a spec could fail because it is permanently broken or because of random chance circumstances.

@straight-shoota
Copy link
Member

Even a build failure could either be permanently broken or just caused by a strange coincidence.

@RX14
Copy link
Contributor

RX14 commented Nov 23, 2018

why don't we just do as I suggested and remove travis?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants