Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry logic to recreate deployment strategy #846

Merged
merged 1 commit into from
Feb 3, 2015

Conversation

ironcladlou
Copy link
Contributor

Support retries in the Recreate deployment strategy. This is a simple
fix for update race conditions as described in #836.

@ironcladlou
Copy link
Contributor Author

@pmorie @smarterclayton @pweil- PTAL. I did some more refactoring and test expansion while I was at it.

client: &realReplicationController{client},
codec: codec,
retryTimeout: 10 * time.Second,
retryPeriod: 1 * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you get a conflict you should read and retry immediately. You don't need timeouts and waits for that type of error (for the others, retry is appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now made the retry logic continue immediately when IsConflict. Timeout is still effective, though. Seems like if there's enough thrashing on an RC that you can't get an update in with tight-loop attempts over 10 seconds, something somewhere's probably very wrong and we should fail anyway. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that for now. We just don't need a retry period on a conflict.

----- Original Message -----

+type RecreateDeploymentStrategy struct {

  • // client is used to interact with ReplicatonControllers.
  • client replicationControllerClient
  • // codec is used to decode DeploymentConfigs contained in deployments.
  • codec runtime.Codec
  • retryTimeout time.Duration
  • retryPeriod time.Duration
    +}

+func NewRecreateDeploymentStrategy(client kclient.Interface, codec
runtime.Codec) *RecreateDeploymentStrategy {

  • return &RecreateDeploymentStrategy{
  •   client:       &realReplicationController{client},
    
  •   codec:        codec,
    
  •   retryTimeout: 10 \* time.Second,
    
  •   retryPeriod:  1 \* time.Second,
    

I made the retry logic continue immediately when IsConflict. Timeout is
still effective, though. Seems like if there's enough thrashing on an RC
that you can't get an update in with tight-loop attempts over 10 seconds,
something somewhere's probably very wrong and we should fail anyway. What do
you think?


Reply to this email directly or view it on GitHub:
https://github.com/openshift/origin/pull/846/files#r24012945

Support retries in the Recreate deployment strategy. This is a simple
fix for update race conditions as described in openshift#836.
@ironcladlou
Copy link
Contributor Author

[test]

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_openshift3/939/)

@pmorie
Copy link
Contributor

pmorie commented Feb 3, 2015

LGTM 👍

@pweil-
Copy link

pweil- commented Feb 3, 2015

LGTM

@ironcladlou
Copy link
Contributor Author

How about a tag? 🚤

@smarterclayton smarterclayton added the kind/bug Categorizes issue or PR as related to a bug. label Feb 3, 2015
@smarterclayton smarterclayton added this to the 0.3.0 (beta1) milestone Feb 3, 2015
@smarterclayton
Copy link
Contributor

[merge]

@smarterclayton smarterclayton modified the milestone: 0.3.0 (beta1) Feb 3, 2015
@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/788/) (Image: devenv-fedora_672)

@openshift-bot
Copy link
Contributor

Evaluated for origin up to ca44bf4

openshift-bot pushed a commit that referenced this pull request Feb 3, 2015
@openshift-bot openshift-bot merged commit cd35383 into openshift:master Feb 3, 2015
@ironcladlou ironcladlou deleted the deployer-race-fix branch February 6, 2015 19:13
@smarterclayton smarterclayton modified the milestone: 0.5.0 (beta3) Apr 23, 2015
sjenning pushed a commit to sjenning/origin that referenced this pull request Jan 5, 2018
[Backport][3.5] Bug 1481550 - Fix network diagnostics timeouts
jpeeler pushed a commit to jpeeler/origin that referenced this pull request Feb 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants