
dm-operator/: support start a new dm cluster with dm-masters and dm-workers #3146

Merged — 40 commits merged into pingcap:master from the supportStartDMCluster branch on Aug 26, 2020

Conversation

@lichunzhu (Contributor) commented Aug 21, 2020

What problem does this PR solve?

#2868

What is changed and how does it work?

Support deploying a DM cluster with dm-masters and dm-workers.

Check List

Tests

  • Unit test
  • E2E test
  • Stability test
  • Manual test (add detailed scripts or steps below)

Deploy a DM cluster with manifests/dm/dm-cluster.yaml:

  1. kubectl get pods — the dm-master and dm-worker pods start correctly:
basic-dm-master-0                  1/1     Running   0          7m27s
basic-dm-master-1                  1/1     Running   0          7m27s
basic-dm-master-2                  1/1     Running   1          7m27s
basic-dm-worker-0                  1/1     Running   1          7m8s
basic-dm-worker-1                  1/1     Running   1          7m8s
basic-dm-worker-2                  1/1     Running   0          7m8s
  2. kubectl get dc — returns the dm-cluster info correctly:
NAME    READY   MASTER                     STORAGE   READY   DESIRE   WORKER                     STORAGE   READY   DESIRE   AGE
basic   True    pingcap/dm:v2.0.0-beta.2   1Gi       3       3        pingcap/dm:v2.0.0-beta.2   1Gi       3       3        7m52s
  3. kubectl port-forward svc/basic-dm-master — forwards the dm-master service correctly:
basic-dm-master        ClusterIP   10.107.238.92    <none>        8261/TCP              8m8s
basic-dm-master-peer   ClusterIP   None             <none>        8291/TCP              8m8s
basic-dm-worker-peer   ClusterIP   None             <none>        8262/TCP              7m49s
  4. kubectl describe dc basic (a hedged sketch of the corresponding Go spec types follows after this output):
Spec:
  Discovery:
    Address:          http://basic-discovery.demo:10261
  Enable PV Reclaim:  false
  Image Pull Policy:  IfNotPresent
  Master:
    Base Image:  pingcap/dm
    Config:
    Replicas:         3
    Storage Size:     1Gi
  Pv Reclaim Policy:  Delete
  Tls Cluster:
  Version:  v2.0.0-beta.2
  Worker:
    Base Image:  pingcap/dm
    Config:
    Replicas:      3
    Storage Size:  1Gi
Status:
  Conditions:
    Last Transition Time:  2020-08-26T02:31:53Z
    Last Update Time:      2020-08-26T02:31:53Z
    Message:               DM cluster is fully up and running
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Master:
    Image:  pingcap/dm:v2.0.0-beta.2
    Leader:
      Client URL:            http://0.0.0.0:8261
      Health:                true
      Id:                    14863793955117218053
      Last Transition Time:  2020-08-26T02:31:45Z
      Name:                  basic-dm-master-0
    Members:
      Basic - Dm - Master - 0:
        Client URL:            http://0.0.0.0:8261
        Health:                true
        Id:                    14863793955117218053
        Last Transition Time:  2020-08-26T02:31:45Z
        Name:                  basic-dm-master-0
      Basic - Dm - Master - 1:
        Client URL:            http://0.0.0.0:8261
        Health:                true
        Id:                    10916830957434093435
        Last Transition Time:  2020-08-26T02:31:45Z
        Name:                  basic-dm-master-1
      Basic - Dm - Master - 2:
        Client URL:            http://0.0.0.0:8261
        Health:                true
        Id:                    14896049602475078108
        Last Transition Time:  2020-08-26T02:31:52Z
        Name:                  basic-dm-master-2
    Phase:                     Normal
    Stateful Set:
      Collision Count:      0
      Current Replicas:     3
      Current Revision:     basic-dm-master-59697d9554
      Observed Generation:  1
      Ready Replicas:       3
      Replicas:             3
      Update Revision:      basic-dm-master-59697d9554
      Updated Replicas:     3
    Synced:                 true
  Worker:
    Image:  pingcap/dm:v2.0.0-beta.2
    Members:
      Basic - Dm - Worker - 0 . Basic - Dm - Worker - Peer : 8262:
        Addr:                  basic-dm-worker-0.basic-dm-worker-peer:8262
        Last Transition Time:  2020-08-26T02:31:52Z
        Name:                  basic-dm-worker-0.basic-dm-worker-peer:8262
        Stage:                 free
      Basic - Dm - Worker - 1 . Basic - Dm - Worker - Peer : 8262:
        Addr:                  basic-dm-worker-1.basic-dm-worker-peer:8262
        Last Transition Time:  2020-08-26T02:31:52Z
        Name:                  basic-dm-worker-1.basic-dm-worker-peer:8262
        Stage:                 free
      Basic - Dm - Worker - 2 . Basic - Dm - Worker - Peer : 8262:
        Addr:                  basic-dm-worker-2.basic-dm-worker-peer:8262
        Last Transition Time:  2020-08-26T02:31:51Z
        Name:                  basic-dm-worker-2.basic-dm-worker-peer:8262
        Stage:                 free
    Phase:                     Normal
    Stateful Set:
      Collision Count:      0
      Current Replicas:     3
      Current Revision:     basic-dm-worker-5bfd664759
      Observed Generation:  1
      Ready Replicas:       3
      Replicas:             3
      Update Revision:      basic-dm-worker-5bfd664759
      Updated Replicas:     3
    Synced:                 true
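
The Spec fields in the describe output above suggest the shape of the new DMCluster API type. The following is only a rough sketch inferred from that output; the Go type and field names are assumptions, not copied from the PR:

```go
// Hypothetical sketch of the DMCluster spec types; names are inferred from the
// `kubectl describe dc basic` output above, not taken from the actual PR code.
type DMClusterSpec struct {
	Version         string          `json:"version"`
	PVReclaimPolicy string          `json:"pvReclaimPolicy"`
	ImagePullPolicy string          `json:"imagePullPolicy"`
	EnablePVReclaim bool            `json:"enablePVReclaim"`
	Discovery       DMDiscoverySpec `json:"discovery"`
	Master          MasterSpec      `json:"master"`
	Worker          WorkerSpec      `json:"worker"`
}

// DMDiscoverySpec points the DM cluster at an existing TidbCluster discovery
// service instead of starting a new one.
type DMDiscoverySpec struct {
	Address string `json:"address"` // e.g. "http://basic-discovery.demo:10261"
}

// MasterSpec and WorkerSpec mirror the Base Image / Replicas / Storage Size
// fields shown for the dm-master and dm-worker components above.
type MasterSpec struct {
	BaseImage   string `json:"baseImage"`
	Replicas    int32  `json:"replicas"`
	StorageSize string `json:"storageSize"`
}

type WorkerSpec struct {
	BaseImage   string `json:"baseImage"`
	Replicas    int32  `json:"replicas"`
	StorageSize string `json:"storageSize"`
}
```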

Code changes

  • Has Go code change

Does this PR introduce a user-facing change?:

NONE

@codecov-commenter commented Aug 21, 2020

Codecov Report

Merging #3146 into master will decrease coverage by 2.97%.
The diff coverage is 1.78%.

@@            Coverage Diff             @@
##           master    #3146      +/-   ##
==========================================
- Coverage   43.66%   40.69%   -2.98%     
==========================================
  Files         159      167       +8     
  Lines       17101    18365    +1264     
==========================================
+ Hits         7467     7473       +6     
- Misses       8985    10244    +1259     
+ Partials      649      648       -1     
Flag Coverage Δ
#unittest 40.69% <1.78%> (?)

Flags with carried forward coverage won't be shown.

Resolved review threads on: manifests/dm/dm-cluster.yaml, pkg/apis/pingcap/v1alpha1/dmcluster.go
if version == nil {
	version = &dc.Spec.Version
}
image = fmt.Sprintf("%s:%s", baseImage, *version)
Contributor:
What if *version=""?

Contributor Author (@lichunzhu):
Use "latest" in 161b862

Resolved review threads on: pkg/apis/pingcap/v1alpha1/dmcluster.go, pkg/apis/pingcap/v1alpha1/register.go, pkg/apis/pingcap/v1alpha1/validation/validation.go
// - upgrade the dm-master cluster
// - scale out/in the dm-master cluster
// - failover the dm-master cluster
if err := dcc.masterMemberManager.Sync(dc); err != nil {
Contributor:
Can we just log and append the error here and continue to sync the worker? WDYT @weekface @cofyc
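
A sketch of the log-and-aggregate pattern being suggested, using k8s.io/apimachinery's error aggregation (the surrounding control struct and worker manager names are assumptions):

```go
import (
	"github.com/pingcap/tidb-operator/pkg/apis/pingcap/v1alpha1"
	errorutils "k8s.io/apimachinery/pkg/util/errors"
	"k8s.io/klog"
)

// Hypothetical sketch: record the dm-master sync error, continue syncing
// dm-worker, and return an aggregate so neither failure hides the other.
func (dcc *defaultDMClusterControl) syncMembers(dc *v1alpha1.DMCluster) error {
	var errs []error
	if err := dcc.masterMemberManager.Sync(dc); err != nil {
		klog.Errorf("failed to sync dm-master for %s/%s: %v", dc.Namespace, dc.Name, err)
		errs = append(errs, err)
	}
	if err := dcc.workerMemberManager.Sync(dc); err != nil {
		errs = append(errs, err)
	}
	// NewAggregate returns nil when errs is empty.
	return errorutils.NewAggregate(errs)
}
```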

Resolved review threads on: pkg/controller/dmcluster/dm_cluster_controller.go, pkg/label/label.go
version: v2.0.0-beta.2
pvReclaimPolicy: Delete
discovery:
  address: "http://basic-discovery.demo.svc:10261"
Contributor:
Why do we need to configure the discovery address?

Contributor Author (@lichunzhu):
Because a DMCluster won't start another discovery service for itself. It just reuses the TiDB cluster's discovery address.
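
Since the discovery address has to be supplied by the user (the DM cluster reuses an existing TidbCluster's discovery service rather than starting its own), a validation rule along these lines would be a natural fit. This is only a hedged sketch, not the PR's actual validation.go code:

```go
import "k8s.io/apimachinery/pkg/util/validation/field"

// validateDMDiscovery (hypothetical) requires spec.discovery.address, because
// a DMCluster does not deploy its own discovery component.
func validateDMDiscovery(address string, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if address == "" {
		allErrs = append(allErrs, field.Required(fldPath.Child("discovery", "address"),
			"must point at an existing TidbCluster discovery service, e.g. http://basic-discovery.demo.svc:10261"))
	}
	return allErrs
}
```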

Further resolved review threads on: pkg/manager/member/dm_worker_member_manager.go, pkg/manager/member/dm_master_member_manager.go, pkg/manager/member/pd_member_manager.go, pkg/manager/member/template.go, pkg/manager/member/utils.go, pkg/util/crdutil.go, examples/dm/README.md, pkg/apis/pingcap/v1alpha1/defaulting/dmcluster.go, pkg/apis/pingcap/v1alpha1/dmcluster.go
Comment on lines +90 to +92
if !dc.MasterIsAvailable() {
	return controller.RequeueErrorf("DMCluster: %s/%s, waiting for dm-master cluster running", ns, dcName)
}
Contributor:
I think it's OK to remove this check if dm-worker can handle the case where the dm-master service is unavailable.

Contributor Author (@lichunzhu):
If dm-master doesn't have more than replicas/2 members alive, it can't offer service. I think we can keep it?
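
For context, a minimal sketch of the quorum check being defended here (the helper name and arguments are assumptions; the health information mirrors the Status.Master.Members fields in the describe output above):

```go
// masterQuorumAvailable (hypothetical) reports whether the dm-master group can
// still serve requests: like PD, it is Raft-backed, so it needs more than
// replicas/2 healthy members.
func masterQuorumAvailable(replicas int32, healthByMember map[string]bool) bool {
	healthy := 0
	for _, ok := range healthByMember {
		if ok {
			healthy++
		}
	}
	return healthy > int(replicas)/2
}
```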

Resolved review threads on: pkg/manager/member/dm_worker_member_manager.go
@DanielZhangQD (Contributor):
When adding the logic for upgrade, scale, and failover, please consider the case in #2886.

@DanielZhangQD requested a review from lonng on August 25, 2020 12:59
Resolved review threads on: examples/dm/README.md, pkg/controller/dmcluster/dm_cluster_controller.go
@DanielZhangQD (Contributor) left a review:
Rest LGTM

Resolved review threads on: pkg/controller/dmcluster/dm_cluster_controller.go, examples/dm/README.md
@DanielZhangQD (Contributor) left a review:
LGTM

@cofyc (Contributor) left a review:
LGTM

@lichunzhu (Contributor Author):
/test pull-e2e-kind

@lichunzhu merged commit d6edf59 into pingcap:master on Aug 26, 2020.
@lichunzhu deleted the supportStartDMCluster branch on August 26, 2020 07:06.
@lichunzhu restored the supportStartDMCluster branch on August 26, 2020 11:44.
@lichunzhu deleted the supportStartDMCluster branch on August 26, 2020 11:44.
@lichunzhu restored the supportStartDMCluster branch on August 26, 2020 11:44.
Labels: status/LGT2, status/PTAL (PR needs to be reviewed)
6 participants