Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pause rolling-upgrade process of tidb statefulset #470

Merged
merged 23 commits into from
May 23, 2019

Conversation

shuijing198799
Copy link
Contributor

@shuijing198799 shuijing198799 commented May 9, 2019

What problem does this PR solve?

Achieved the need to pause TiDB cluster upgrades

What is changed and how it works?

Add a webhoo to hijack the request of update statefulset from apiserver, add annotation to crd tidb-cluster and pause the rolling-update process. And also provider a tkcli tool to detect the update information of tidb statefulset.

use tkcli to fetch the tidb upgrade information:

$ ./tkctl upinfo
Name: e2e-cluster1
Namespace: e2e-cluster1
CreationTimestamp: 2019-05-06 20:28:28 +0800 CST
Statu: Normal
Name State
---- -----
e2e-cluster1-tidb-0 updated
e2e-cluster1-tidb-1 updated

use kubectl to change annotate to CRD tidb-cluster:

kubectl annotate tc e2e-cluster1 -n e2e-cluster1 tidb.pingcap.com/tidb-partition=0 --overwrite
tidbcluster.pingcap.com/e2e-cluster1 annotated

Tests

  • E2E test
  • Stability test
  • Manual test (add detailed scripts or steps below)

Code changes

  • Add yaml files
  • Has Go code change

@xiaojingchen @tennix @aylei @weekface PTAL

Does this PR introduce a user-facing change?:

NONE

manifests/webhook.yaml Outdated Show resolved Hide resolved
kind: Service
metadata:
name: admission-webhook-example-svc
namespace: pingcap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave the namespace field to allow user specify it from the command line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, seems namespace is required by cluster role binding. So you can keep it here, but please change it to tidb-admin the default namespace in our documentations.

name: admission-webhook-example-svc
namespace: pingcap
labels:
app: admission-webhook-example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why name it example?

body = data
} else {
responseAdmissionReview.Response = util.ARFail(err)
goto returnData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract this as a function to avoid using goto.

command:
- /usr/local/bin/tidb-webhook
env:
- name: NAMESPACE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use downward API to get namespace dynamically.

env:
- name: NAMESPACE
value: pingcap
- name: SERVICENAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/SERVICENAME/SERVICE_NAME/

@@ -0,0 +1,84 @@
// Copyright 2018 PingCAP, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright 2018 PingCAP, Inc.
// Copyright 2019 PingCAP, Inc.


// Setup the server cert. For example, user apiservers and admission webhooks
// can use the cert to prove their identify to the kube-apiserver
func SetupServerCert(namespaceName, serviceName string) (*CertContext, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create certificates temporarily? Does this need to be persistent?

KubeCli: kubecli,
Cli: cli,
Context: context,
ConfigName: "validating-webhook-configuration",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable.


func strPtr(s string) *string { return &s }

func (ws *WebhookServer) RegisterWebhook(namespace string, svcName string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be applied to the cluster with a yaml file?

@xiaojingchen
Copy link
Contributor

xiaojingchen commented May 9, 2019

@shuijing198799 webhook is a new component of tidb-operator, so need support helm to manage it.
you can add it to charts/tidb-operator just like tidb-controller and tidb-scheduler

return util.ARFail(err)
}

cli, _, err := util.GetNewClient()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a new client every time a stateful set is updated?

// See the License for the specific language governing permissions and
// limitations under the License.

package main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that k8s does not provide such a admission controller。

Copy link
Contributor

@weekface weekface May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webhook is not a nice component name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/tkctl/cmd/upinfo/upinfo.go Show resolved Hide resolved

func main() {

cli, kubeCli, err := util.GetNewClient()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfg, err := rest.InClusterConfig()
if err != nil {
glog.Fatalf("failed to get config: %v", err)
}
cli, err := versioned.NewForConfig(cfg)
if err != nil {
glog.Fatalf("failed to create Clientset: %v", err)
}
kubeCli, err := kubernetes.NewForConfig(cfg)
if err != nil {
glog.Fatalf("failed to get kubernetes Clientset: %v", err)
}

subjects:
- kind: ServiceAccount
namespace: pingcap
name: admission-webhook-example-sa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example ?

)

func AdmitStatefulSets(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
glog.Infof("admit statefulsets")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log more pieces of information, for example, the namespace and name of the statefulset

return util.ARFail(err)
}

if set.Labels[label.ComponentLabelKey] == "tidb" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if set.Labels[label.ComponentLabelKey] == "tidb" {
if set.Labels[label.ComponentLabelKey] == label.TiDBLabelVal {

}

ns := os.Getenv("NAMESPACE")
if ns == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use strings.TrimSpace(ns)== ""

}

ns := os.Getenv("NAMESPACE")
if ns == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use strings.TrimSpace(ns)== ""

You can omit --tidbcluster=<name> option by running 'tkc use <clusterName>',
`
upinfoExample = `
# get current tidb cluster info (set by tkc use)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# get current tidb cluster info (set by tkc use)
# get current tidb cluster info (set by tkc user)

@cofyc
Copy link
Contributor

cofyc commented May 10, 2019

How about moving the design proposal in google docs to the repo?

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@weekface
Copy link
Contributor

weekface commented May 21, 2019

ref:
kubernetes/kubernetes#72944
https://github.com/kubedb/project/issues/309#issuecomment-428320597

Allow Mutating|ValidatingWebhookConfiguration to use secret for CABundle

pkg/webhook/route/route.go Show resolved Hide resolved

if upgradePaused() {

time.Sleep(5 * time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upgrade process take more than 5 minutes, so you must sleep more than it, for exmaple 10 minutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upgradePause make sure that the last TiDB pod is upgraded and healthy, So I think 5 minute is enough.

@weekface
Copy link
Contributor

add a release note like: #477

- -tlsKeyFile=/etc/webhook/certs/key.pem
- -alsologtostderr
- -v={{ .Values.admissionController.logLevel }}
- 2>&1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of this redirection?

apiVersion: apps/v1
kind: Deployment
metadata:
name: tidb-admission-controller-deployment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to argue about naming style, bug add -deployment suffix to a deployment object is wired, so does the tidb-admission-controller-pod and admission-controller-svc and admission-controller-cfg

return util.ARFail(err)
}

if versionCli == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we put the initialization of versionCli to init() in favor of fail-fast

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// toAdmissionResponse is a helper function to create an AdmissionResponse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// toAdmissionResponse is a helper function to create an AdmissionResponse
// ARFail is a helper function to create an AdmissionResponse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/webhook/util/util.go Outdated Show resolved Hide resolved
tests/actions.go Outdated
for _, container := range pod.Spec.Containers {
if container.Name == label.TiDBLabelVal ||
container.Name == label.TiKVLabelVal ||
container.Name == label.PDLabelVal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
container.Name == label.PDLabelVal {
if container.Name == v1alpha1.PDMemberType.String() ||

Though the literal value is same coincidentally, we should use v1alpha1.XXMemberType here (So do TiDB and TiKV).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tests/actions.go Outdated Show resolved Hide resolved
pkg/tkctl/cmd/upinfo/upinfo.go Show resolved Hide resolved
@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

1 similar comment
@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

1 similar comment
@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

2 similar comments
@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@shuijing198799
Copy link
Contributor Author

/run-e2e-tests

@weekface
Copy link
Contributor

/run-e2e-tests

Copy link
Contributor

@weekface weekface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@xiaojingchen xiaojingchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@weekface weekface merged commit cb9badc into pingcap:master May 23, 2019
yahonda pushed a commit that referenced this pull request Dec 27, 2021
* en: split troubleshoot to several docs

* Apply suggestions from code review

Co-authored-by: DanielZhangQD <36026334+DanielZhangQD@users.noreply.github.com>

* add a missing step

* fix lint

* Apply suggestions from code review

Co-authored-by: Lilian Lee <lilin@pingcap.com>

* add aliases for zh troubleshoot

* Update zh/tips.md

* align with the Chinese doc

Co-authored-by: DanielZhangQD <36026334+DanielZhangQD@users.noreply.github.com>
Co-authored-by: lilin90 <lilin@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants