[vtctld] Add remaining reparent commands to VtctldServer #7536

Merged
deepthi merged 16 commits into vitessio:master from am_vtctld_reparent_commands
Feb 25, 2021

Conversation

ajm188
Contributor

@ajm188 ajm188 commented Feb 22, 2021

Description

This PR adds the remaining reparent-related RPCs to VtctldServer, namely:

  • EmergencyReparentShard
  • PlannedReparentShard
  • ReparentTablet
  • TabletExternallyReparented

It includes a cherry-picked version of #7523, which is required to get some ERS smoke tests working. After merging that PR, I'll rebase on master to pull out those changes.

Example usage

EmergencyReparentShard
❯ vtctldclient --server "localhost:15999" GetTablets
zone1-0000000100 commerce 0 master SFO-M-AMASON02:15100 SFO-M-AMASON02:17100 [] 2021-02-22T19:02:23Z
zone1-0000000101 commerce 0 replica SFO-M-AMASON02:15101 SFO-M-AMASON02:17101 [] <null>
zone1-0000000102 commerce 0 rdonly SFO-M-AMASON02:15102 SFO-M-AMASON02:17102 [] <null>
❯ vtctldclient --server "localhost:15999" EmergencyReparentShard commerce/0
I0222 19:10:24.577677 replication.go:181] getting replication position from zone1-0000000100
I0222 19:10:24.577679 replication.go:181] getting replication position from zone1-0000000102
I0222 19:10:24.577685 replication.go:181] getting replication position from zone1-0000000101
W0222 19:10:24.582681 replication.go:205] failed to get replication status from zone1-0000000100: rpc error: code = Unknown desc = TabletManager.StopReplicationAndGetStatus on zone1-0000000100 error: before status failed: no replication status: before status failed: no replication status
I0222 19:10:24.612392 emergency_reparenter.go:132] promoting tablet zone1-0000000102 to master
I0222 19:10:24.991490 emergency_reparenter.go:172] populating reparent journal on new master zone1-0000000102
I0222 19:10:24.991507 emergency_reparenter.go:178] setting new master on replica zone1-0000000100
I0222 19:10:24.991525 emergency_reparenter.go:178] setting new master on replica zone1-0000000101
❯ vtctldclient --server "localhost:15999" GetTablets
zone1-0000000100 commerce 0 replica SFO-M-AMASON02:15100 SFO-M-AMASON02:17100 [] <null>
zone1-0000000101 commerce 0 replica SFO-M-AMASON02:15101 SFO-M-AMASON02:17101 [] <null>
zone1-0000000102 commerce 0 master SFO-M-AMASON02:15102 SFO-M-AMASON02:17102 [] 2021-02-22T19:10:24Z

PlannedReparentShard
❯ vtctldclient --server "localhost:15999" GetTablets
zone1-0000000100 commerce 0 master SFO-M-AMASON02:15100 SFO-M-AMASON02:17100 [] 2021-02-22T18:37:43Z
zone1-0000000101 commerce 0 replica SFO-M-AMASON02:15101 SFO-M-AMASON02:17101 [] <null>
zone1-0000000102 commerce 0 rdonly SFO-M-AMASON02:15102 SFO-M-AMASON02:17102 [] <null>
❯ vtctldclient --server "localhost:15999" PlannedReparentShard commerce/0 --new-primary zone1-0000000101
I0222 18:38:00.244309 planned_reparenter.go:202] checking replication on primary-elect zone1-0000000101
I0222 18:38:00.299035 planned_reparenter.go:238] demoting current primary: zone1-0000000100
I0222 18:38:01.360709 planned_reparenter.go:607] populating reparent journal on new primary zone1-0000000101
I0222 18:38:01.360728 planned_reparenter.go:586] setting new primary on replica zone1-0000000100
I0222 18:38:01.360742 planned_reparenter.go:586] setting new primary on replica zone1-0000000102
❯ vtctldclient --server "localhost:15999" GetTablets
zone1-0000000100 commerce 0 replica SFO-M-AMASON02:15100 SFO-M-AMASON02:17100 [] <null>
zone1-0000000101 commerce 0 master SFO-M-AMASON02:15101 SFO-M-AMASON02:17101 [] 2021-02-22T18:38:01Z
zone1-0000000102 commerce 0 rdonly SFO-M-AMASON02:15102 SFO-M-AMASON02:17102 [] <null>
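
A quick way to sanity-check a reparent from the GetTablets output above is to confirm that exactly one tablet in the shard reports the master type. The following is a minimal sketch, not part of this PR: it assumes the first four whitespace-separated columns are alias, keyspace, shard, and tablet type (inferred from the sample output only), and the names `tabletRow`, `parseTablets`, and `primaries` are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// tabletRow holds the first four columns of a GetTablets line.
// The column meanings are an assumption based on the sample output.
type tabletRow struct {
	Alias, Keyspace, Shard, Type string
}

// parseTablets splits each line on whitespace and keeps the first four
// columns. Lines with fewer than four fields are skipped.
func parseTablets(out string) []tabletRow {
	var rows []tabletRow
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 {
			continue
		}
		rows = append(rows, tabletRow{f[0], f[1], f[2], f[3]})
	}
	return rows
}

// primaries returns the aliases of all rows whose type column is "master".
func primaries(rows []tabletRow) []string {
	var out []string
	for _, r := range rows {
		if r.Type == "master" {
			out = append(out, r.Alias)
		}
	}
	return out
}

func main() {
	after := `zone1-0000000100 commerce 0 replica SFO-M-AMASON02:15100 SFO-M-AMASON02:17100 [] <null>
zone1-0000000101 commerce 0 master SFO-M-AMASON02:15101 SFO-M-AMASON02:17101 [] 2021-02-22T18:38:01Z
zone1-0000000102 commerce 0 rdonly SFO-M-AMASON02:15102 SFO-M-AMASON02:17102 [] <null>`
	fmt.Println(primaries(parseTablets(after)))
}
```

With the post-PRS output shown above as input, the only alias returned is zone1-0000000101.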

Related Issue(s)

Checklist

  • Should this PR be backported? No
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

@ajm188 ajm188 requested a review from sougou as a code owner February 22, 2021 20:23
@ajm188
Contributor Author

ajm188 commented Feb 22, 2021

Looking into the data race failure, which appears to be related to the PRS implementation in VtctldServer.

@ajm188 ajm188 force-pushed the am_vtctld_reparent_commands branch from 35cb340 to 2d7c891 Compare February 24, 2021 20:18
Signed-off-by: Andrew Mason <amason@slack-corp.com>
… ERS promotion

…ize VtctldServer with tmclient

@ajm188 ajm188 force-pushed the am_vtctld_reparent_commands branch from 2d7c891 to 24e393a Compare February 24, 2021 20:48
Member

@deepthi deepthi left a comment

Nice work. I have some comments on the UX, the rest lgtm.

func init() {
	EmergencyReparentShard.Flags().DurationVar(&emergencyReparentShardOptions.WaitReplicasTimeout, "wait-replicas-timeout", *topo.RemoteOperationTimeout, "Time to wait for replicas to catch up in reparenting.")
	EmergencyReparentShard.Flags().StringVar(&emergencyReparentShardOptions.NewPrimaryAliasStr, "new-primary", "", "Alias of a tablet that should be the new primary. If not specified, the vtctld will select the best candidate to promote.")
	EmergencyReparentShard.Flags().StringSliceVarP(&emergencyReparentShardOptions.IgnoreReplicaAliasStrList, "ignore-replicas", "i", nil, "Comma-separated, repeated list of replica tablet aliases to ignore during the emergency reparent.")
Member

Suggested change
- EmergencyReparentShard.Flags().StringSliceVarP(&emergencyReparentShardOptions.IgnoreReplicaAliasStrList, "ignore-replicas", "i", nil, "Comma-separated, repeated list of replica tablet aliases to ignore during the emergency reparent.")
+ EmergencyReparentShard.Flags().StringSliceVarP(&emergencyReparentShardOptions.IgnoreReplicaAliasStrList, "ignore-replicas", "", nil, "Comma-separated, repeated list of replica tablet aliases to ignore during the emergency reparent.")

I assume this is a typo.

Member

Why "Comma-separated, repeated list"? Why not just "Comma-separated list"?

Contributor Author

It's repeated in the sense that you can specify the flag multiple times, so both -i "${alias1},${alias2}" and -i ${alias1} -i ${alias2} will work (as well as -i "${alias1},${alias2}" -i ${alias3}). I'm not sure of a concise way to word that, and what I have here is definitely not straightforward, so I'm very open to suggestions!
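
To make the "comma-separated, repeated" semantics concrete, here is a minimal sketch using only the standard library `flag` package rather than the flag library used in the diff. The type `aliasList` and the helper `parseIgnored` are hypothetical names for illustration; the sketch only mimics the behaviour described in the comment above (each occurrence of the flag is split on commas and appended to the list) and does not claim to reproduce how the real flag is implemented.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// aliasList is a hypothetical flag.Value that accumulates aliases across
// repeated uses of the flag, splitting each value on commas.
type aliasList []string

// String renders the accumulated aliases; it tolerates a nil receiver
// because the flag package may call it on a zero value.
func (a *aliasList) String() string {
	if a == nil {
		return ""
	}
	return strings.Join(*a, ",")
}

// Set is called once per occurrence of the flag on the command line.
func (a *aliasList) Set(v string) error {
	for _, s := range strings.Split(v, ",") {
		if s = strings.TrimSpace(s); s != "" {
			*a = append(*a, s)
		}
	}
	return nil
}

// parseIgnored parses args with a throwaway FlagSet and returns the
// combined list of aliases passed via -i.
func parseIgnored(args []string) ([]string, error) {
	var ignored aliasList
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.Var(&ignored, "i", "comma-separated list of aliases; may be repeated")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return ignored, nil
}

func main() {
	got, err := parseIgnored([]string{"-i", "zone1-100,zone1-101", "-i", "zone1-102"})
	fmt.Println(got, err)
}
```

In this sketch, `-i "a,b" -i c` and `-i a -i b -i c` both yield the same three-element list, which is the equivalence the help text is trying to convey.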

Member

I get it. Fine for right now, maybe we can revisit later.

func init() {
	EmergencyReparentShard.Flags().DurationVar(&emergencyReparentShardOptions.WaitReplicasTimeout, "wait-replicas-timeout", *topo.RemoteOperationTimeout, "Time to wait for replicas to catch up in reparenting.")
Member

All the old flags use underscore (_). Is it a cobra convention to use dash (-)?

Contributor Author

It's a GNU convention [1] (and therefore more common among most CLIs), so I've been doing it for all the vtctldclient options (except the global action_timeout, which I copied from vtctlclient and for some reason didn't change at the time). Happy to back this out in a separate change (since I'll have to update a bunch of flags) if Vitess prefers to enforce underscores everywhere.

[1]: Stack Overflow, which provides some additional reasoning as well as linking to the GNU style guide: https://stackoverflow.com/a/50537232

Member

I see. Let's follow the convention.

// primary. If NewPrimary was set in the request, then this will be the same
// alias. Otherwise, it will be the alias of the tablet found to be most
// up-to-date.
topodata.TabletAlias promoted_primary = 3;
Member

👍
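
The contract described in the `promoted_primary` comment (use the requested alias if one was given, otherwise pick the most up-to-date tablet) can be sketched as follows. Everything here is illustrative and assumed: `candidate` and `choosePromoted` are hypothetical names, and a single integer stands in for a replication position, which is a simplification of how positions are actually compared.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a tablet and how far its
// replication has advanced. A plain integer is a simplification.
type candidate struct {
	Alias    string
	Position int64
}

// choosePromoted returns the requested alias when it is set and present
// among the non-ignored candidates. Otherwise it returns the alias of the
// non-ignored candidate with the highest position (first one wins on ties).
func choosePromoted(requested string, cands []candidate, ignored map[string]bool) (string, error) {
	best := -1
	for i, c := range cands {
		if ignored[c.Alias] {
			continue
		}
		if requested != "" {
			if c.Alias == requested {
				return requested, nil
			}
			continue
		}
		if best < 0 || c.Position > cands[best].Position {
			best = i
		}
	}
	if best < 0 {
		return "", fmt.Errorf("no eligible candidate (requested=%q)", requested)
	}
	return cands[best].Alias, nil
}

func main() {
	cands := []candidate{{"zone1-0000000101", 40}, {"zone1-0000000102", 42}}
	alias, err := choosePromoted("", cands, nil)
	fmt.Println(alias, err)
}
```

Returning the chosen alias in the response, as the proto field does, lets a caller learn which tablet was promoted without issuing a follow-up GetTablets call.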

Signed-off-by: Andrew Mason <amason@slack-corp.com>
@deepthi deepthi merged commit d34d951 into vitessio:master Feb 25, 2021
@askdba askdba added this to the v10.0 milestone Feb 26, 2021
@ajm188 ajm188 deleted the am_vtctld_reparent_commands branch March 4, 2021 16:32
Development

Successfully merging this pull request may close these issues.

Migrate reparent commands to VtctldServer