Pause rolling-upgrade process of tidb statefulset #470
Conversation
manifests/webhook.yaml
Outdated
kind: Service
metadata:
  name: admission-webhook-example-svc
  namespace: pingcap
Leave out the namespace field so that users can specify it from the command line.
Oh, it seems namespace is required by the cluster role binding. So you can keep it here, but please change it to tidb-admin, the default namespace in our documentation.
manifests/webhook.yaml
Outdated
  name: admission-webhook-example-svc
  namespace: pingcap
  labels:
    app: admission-webhook-example
Why name it example?
pkg/webhook/route/route.go
Outdated
	body = data
} else {
	responseAdmissionReview.Response = util.ARFail(err)
	goto returnData
Abstract this as a function to avoid using goto.
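As a rough illustration of that suggestion, the sketch below moves the decision into one function so that each failure path is a plain return rather than a jump to a shared label. The types and the helpers arFail, readBody, and buildResponse are hypothetical stand-ins, not the PR's actual code; arFail only mimics the role that util.ARFail plays in the quoted diff.

```go
package main

import (
	"errors"
	"fmt"
)

// admissionResponse is a hypothetical stand-in for v1beta1.AdmissionResponse.
type admissionResponse struct {
	Allowed bool
	Message string
}

// arFail mimics the role of util.ARFail: wrap an error in a denying response.
func arFail(err error) *admissionResponse {
	return &admissionResponse{Allowed: false, Message: err.Error()}
}

// readBody is a hypothetical helper that rejects an empty request body.
func readBody(data []byte) ([]byte, error) {
	if len(data) == 0 {
		return nil, errors.New("empty request body")
	}
	return data, nil
}

// buildResponse keeps the whole decision in one function, so every early
// exit is a return statement and no goto label is needed.
func buildResponse(data []byte) *admissionResponse {
	body, err := readBody(data)
	if err != nil {
		return arFail(err)
	}
	return &admissionResponse{Allowed: true, Message: fmt.Sprintf("read %d bytes", len(body))}
}

func main() {
	fmt.Println(buildResponse(nil).Message)
	fmt.Println(buildResponse([]byte("{}")).Message)
}
```

The caller can then serialize whatever buildResponse returns in a single place, which is the job the returnData label was doing.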
manifests/webhook.yaml
Outdated
command:
- /usr/local/bin/tidb-webhook
env:
- name: NAMESPACE
Please use the downward API to get the namespace dynamically.
manifests/webhook.yaml
Outdated
env:
- name: NAMESPACE
  value: pingcap
- name: SERVICENAME
s/SERVICENAME/SERVICE_NAME/
cmd/webhook/main.go
Outdated
@@ -0,0 +1,84 @@
// Copyright 2018 PingCAP, Inc.
// Copyright 2018 PingCAP, Inc.
// Copyright 2019 PingCAP, Inc.
pkg/webhook/util/certs.go
Outdated
// Setup the server cert. For example, user apiservers and admission webhooks
// can use the cert to prove their identify to the kube-apiserver
func SetupServerCert(namespaceName, serviceName string) (*CertContext, error) {
Why create certificates temporarily? Does this need to be persistent?
pkg/webhook/webhook.go
Outdated
KubeCli:    kubecli,
Cli:        cli,
Context:    context,
ConfigName: "validating-webhook-configuration",
This should be configurable.
pkg/webhook/webhook.go
Outdated
func strPtr(s string) *string { return &s }

func (ws *WebhookServer) RegisterWebhook(namespace string, svcName string) error {
Should this be applied to the cluster with a yaml file?
@shuijing198799 The webhook is a new component of tidb-operator, so it needs Helm support to manage it.
	return util.ARFail(err)
}

cli, _, err := util.GetNewClient()
Create a new client every time a stateful set is updated?
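One common way to address this in Go is to build the client once and reuse it across requests. The sketch below uses sync.Once for that; the client type, newClient, and getClient are hypothetical placeholders for whatever util.GetNewClient returns, and the counter exists only to make the reuse visible.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for the clientset returned by util.GetNewClient.
type client struct{ id int }

var (
	clientOnce   sync.Once
	sharedClient *client
	clientErr    error
	created      int // counts constructions, only to make the reuse visible
)

// newClient is a hypothetical constructor; real code would build a clientset
// from the in-cluster configuration here.
func newClient() (*client, error) {
	created++
	return &client{id: created}, nil
}

// getClient builds the client on first use and returns the same instance afterwards.
func getClient() (*client, error) {
	clientOnce.Do(func() {
		sharedClient, clientErr = newClient()
	})
	return sharedClient, clientErr
}

func main() {
	for i := 0; i < 3; i++ {
		c, _ := getClient()
		fmt.Println("client id:", c.id)
	}
	fmt.Println("constructed:", created)
}
```

Building the client at process start instead would surface configuration errors earlier; the lazy variant here only shows the reuse.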
cmd/webhook/main.go
Outdated
// See the License for the specific language governing permissions and
// limitations under the License.

package main
We may use admission-controller instead of webhook?
It seems that k8s does not provide such an admission controller.
webhook is not a nice component name.
seems right
done
cmd/webhook/main.go
Outdated
func main() {

	cli, kubeCli, err := util.GetNewClient()
tidb-operator/cmd/controller-manager/main.go
Lines 89 to 101 in 617546b
cfg, err := rest.InClusterConfig()
if err != nil {
	glog.Fatalf("failed to get config: %v", err)
}
cli, err := versioned.NewForConfig(cfg)
if err != nil {
	glog.Fatalf("failed to create Clientset: %v", err)
}
kubeCli, err := kubernetes.NewForConfig(cfg)
if err != nil {
	glog.Fatalf("failed to get kubernetes Clientset: %v", err)
}
manifests/webhook-rbac.yaml
Outdated
subjects:
- kind: ServiceAccount
  namespace: pingcap
  name: admission-webhook-example-sa
example?
)

func AdmitStatefulSets(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
	glog.Infof("admit statefulsets")
Log more information, for example the namespace and name of the statefulset.
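A small sketch of what a more informative log line could look like is shown here. The objectMeta type and the admitMessage helper are hypothetical; in the PR's code the resulting string would presumably be passed to glog.Infof, and the exact fields to log are an assumption.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in for the metadata of the StatefulSet under review.
type objectMeta struct {
	Namespace string
	Name      string
}

// admitMessage builds a log line that identifies the object and the operation.
func admitMessage(operation string, m objectMeta) string {
	return fmt.Sprintf("admit statefulset [%s/%s], operation: %s", m.Namespace, m.Name, operation)
}

func main() {
	meta := objectMeta{Namespace: "e2e-cluster1", Name: "e2e-cluster1-tidb"}
	fmt.Println(admitMessage("UPDATE", meta))
}
```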
	return util.ARFail(err)
}

if set.Labels[label.ComponentLabelKey] == "tidb" {
if set.Labels[label.ComponentLabelKey] == "tidb" {
if set.Labels[label.ComponentLabelKey] == label.TiDBLabelVal {
cmd/webhook/main.go
Outdated
}

ns := os.Getenv("NAMESPACE")
if ns == "" {
Use strings.TrimSpace(ns) == "" so that a whitespace-only value is also treated as empty.
pkg/tkctl/cmd/upinfo/upinfo.go
Outdated
You can omit --tidbcluster=<name> option by running 'tkc use <clusterName>',
`
upinfoExample = `
# get current tidb cluster info (set by tkc use)
# get current tidb cluster info (set by tkc use)
# get current tidb cluster info (set by tkc user)
How about moving the design proposal from Google Docs into the repo?
/run-e2e-tests
ref: Allow Mutating|ValidatingWebhookConfiguration to use secret for CABundle
if upgradePaused() {

	time.Sleep(5 * time.Minute)
The upgrade process takes more than 5 minutes, so you must sleep longer than that, for example 10 minutes.
upgradePaused() makes sure that the last TiDB pod is upgraded and healthy, so I think 5 minutes is enough.
Add a release note like #477.
- -tlsKeyFile=/etc/webhook/certs/key.pem
- -alsologtostderr
- -v={{ .Values.admissionController.logLevel }}
- 2>&1
What's the purpose of this redirection?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tidb-admission-controller-deployment
I don't want to argue about naming style, but adding a -deployment suffix to a Deployment object is weird. The same goes for tidb-admission-controller-pod, admission-controller-svc, and admission-controller-cfg.
	return util.ARFail(err)
}

if versionCli == nil {
Could we move the initialization of versionCli to init(), in favor of fail-fast?
pkg/webhook/util/util.go
Outdated
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// toAdmissionResponse is a helper function to create an AdmissionResponse
// toAdmissionResponse is a helper function to create an AdmissionResponse
// ARFail is a helper function to create an AdmissionResponse
done
tests/actions.go
Outdated
for _, container := range pod.Spec.Containers {
	if container.Name == label.TiDBLabelVal ||
		container.Name == label.TiKVLabelVal ||
		container.Name == label.PDLabelVal {
container.Name == label.PDLabelVal {
if container.Name == v1alpha1.PDMemberType.String() ||
Though the literal values happen to be the same, we should use v1alpha1.XXMemberType here (the same goes for TiDB and TiKV).
done
/run-e2e-tests
LGTM
LGTM
What problem does this PR solve?
It makes it possible to pause the rolling upgrade of the TiDB statefulset in a TiDB cluster.
What is changed and how it works?
Add a webhook to intercept statefulset update requests from the apiserver, add an annotation to the tidb-cluster CRD, and pause the rolling-update process. Also provide a tkctl command to show the update status of the TiDB statefulset.
Use tkctl to fetch the TiDB upgrade information:
$ ./tkctl upinfo
Name: e2e-cluster1
Namespace: e2e-cluster1
CreationTimestamp: 2019-05-06 20:28:28 +0800 CST
Statu: Normal
Name                   State
----                   -----
e2e-cluster1-tidb-0    updated
e2e-cluster1-tidb-1    updated
Use kubectl to set the annotation on the tidb-cluster CRD object:
kubectl annotate tc e2e-cluster1 -n e2e-cluster1 tidb.pingcap.com/tidb-partition=0 --overwrite
tidbcluster.pingcap.com/e2e-cluster1 annotated
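To make the effect of the annotation concrete, here is a hedged sketch of how a partition value could be read from it and applied. The annotation key is the one used in the kubectl command above; partitionFromAnnotations and willUpdate are hypothetical helpers, and the assumption is that the value is used like a StatefulSet rolling-update partition, where only pods with an ordinal greater than or equal to the partition are updated. How the webhook in this PR actually consumes the annotation is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strconv"
)

// partitionAnnotation is the key from the kubectl annotate command above.
const partitionAnnotation = "tidb.pingcap.com/tidb-partition"

// partitionFromAnnotations returns the partition and whether the annotation
// was present. A malformed or negative value is reported as an error.
func partitionFromAnnotations(ann map[string]string) (int32, bool, error) {
	raw, ok := ann[partitionAnnotation]
	if !ok {
		return 0, false, nil
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil || n < 0 {
		return 0, true, fmt.Errorf("invalid %s value %q", partitionAnnotation, raw)
	}
	return int32(n), true, nil
}

// willUpdate applies the StatefulSet partition rule: only pods whose ordinal
// is greater than or equal to the partition are updated.
func willUpdate(ordinal, partition int32) bool {
	return ordinal >= partition
}

func main() {
	p, _, _ := partitionFromAnnotations(map[string]string{partitionAnnotation: "1"})
	for ord := int32(0); ord < 3; ord++ {
		fmt.Printf("tidb-%d update=%v\n", ord, willUpdate(ord, p))
	}
}
```

Under that assumption, a partition of 0 lets every pod be updated, while a larger value holds back the lower ordinals.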
Tests
Code changes
@xiaojingchen @tennix @aylei @weekface PTAL
Does this PR introduce a user-facing change?: