Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't failover if TiDB pod is not scheduled and perform recovery no matter what state failover pods are in #2263

Merged
merged 10 commits into from
Apr 30, 2020

Conversation

cofyc
Copy link
Contributor

@cofyc cofyc commented Apr 22, 2020

What problem does this PR solve?

fixes #2155

What is changed and how does it work?

Check List

Tests

  • Unit test
  • E2E test
  • Stability test
  • Manual test (add detailed scripts or steps below)
  • No code

https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/tidb-operator-e2e-gke-ci-master-stability/detail/tidb-operator-e2e-gke-ci-master-stability/54/pipeline

Code changes

  • Has Go code change
  • Has CI related scripts change
  • Has Terraform scripts change

Side effects

  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Does this PR introduce a user-facing change?:

Skip auto-failover when pods are not scheduled and perform recovery operation no matter what state failover pods are in

@cofyc cofyc force-pushed the fix2155 branch 3 times, most recently from 527043e to d687a08 Compare April 22, 2020 10:56
Yisaer
Yisaer previously approved these changes Apr 23, 2020
Copy link
Contributor

@Yisaer Yisaer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to also support this feature for tikv and pd?

@cofyc cofyc changed the title skip auto failover if tidb pod is not scheduled WIP: skip auto failover if tidb pod is not scheduled Apr 24, 2020
@cofyc
Copy link
Contributor Author

cofyc commented Apr 26, 2020

Do we need to also support this feature for tikv and pd?

at least, PD is required, see #2161. (not the same, if PD pod is pending, the member name will not appear in status.pd.members, so no replacement will create for it)
one failed replica may cause more than one replacements to be created in failover.

I'm going to fix #2161 first.

@cofyc cofyc changed the title WIP: skip auto failover if tidb pod is not scheduled skip auto failover if TiDB pod is not scheduled Apr 28, 2020
@cofyc cofyc changed the title skip auto failover if TiDB pod is not scheduled skip auto failover if TiDB pod is not scheduled and perform recovery no matter what state failover pods are in Apr 29, 2020
@cofyc cofyc changed the title skip auto failover if TiDB pod is not scheduled and perform recovery no matter what state failover pods are in don't failover if TiDB pod is not scheduled and perform recovery no matter what state failover pods are in Apr 29, 2020
ginkgo.AfterEach(func() {
ginkgo.By("Uninstall tidb-operator")
oa.CleanOperatorOrDie(ocfg)
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easier to debug in development

@@ -1398,6 +1405,10 @@ func (oa *operatorActions) pdMembersReadyFn(tc *v1alpha1.TidbCluster) (bool, err
return false, nil
}

if pdSet.Status.CurrentRevision != pdSet.Status.UpdateRevision {
return false, nil
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure all pods of StatefulSet are up to date

Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cofyc cofyc mentioned this pull request Apr 29, 2020
4 tasks
@DanielZhangQD
Copy link
Contributor

/merge

@sre-bot
Copy link
Contributor

sre-bot commented Apr 30, 2020

Your auto merge job has been accepted, waiting for:

  • 2341

@sre-bot
Copy link
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Apr 30, 2020

@cofyc merge failed.

@DanielZhangQD
Copy link
Contributor

/merge

@sre-bot
Copy link
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Apr 30, 2020

cherry pick to release-1.1 in PR #2358

cofyc added a commit that referenced this pull request Apr 30, 2020
…atter what state failover pods are in (#2263) (#2358)

* skip auto failover if pod is not scheduled

* fix tidb recover

* share failover period in tests

Co-authored-by: Yecheng Fu <fuyecheng@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failover will be triggered after the TiDB pod is in the Pending state for five minutes
5 participants