Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vttablet refactor phase 2 #6461

Merged
merged 56 commits into from
Jul 26, 2020
Merged

vttablet refactor phase 2 #6461

merged 56 commits into from
Jul 26, 2020

Conversation

sougou
Copy link
Contributor

@sougou sougou commented Jul 21, 2020

This is a the second part of the vttablet refactor. We essentially rewrite entire portions of the TabletManager, So, it may be easier to review the new functionality directly instead of looking at the diffs.

At a high level, this refactor achieves the following goals:

  • Healthcheck reporting was co-managed by TabletManager and TabletServer. This is now fully managed by TabletServer.
  • TabletServer's stateManager now has a consolidated set of variables that contribute to the serving state. These variables are now exposed through /debug/status.
  • All sub-components have to do their own retry if they fail to open.

Details

healthStreamer

This is a new component in TabletServer. It owns streaming of health info to subscribers like VTGate. It receives updates from stateManager whenever there is a material change in serving state.

replTracker

This component encapsulates the heartbeat reader, writer, and a new poller. It uses only one of these methods to report on replication health. If it's the poller, it gets invoked everytime a status check is performed. This is triggered from the stateManager.

stateManager

This component was introduced in a previous PR. It has now been enhanced to fetch replication info from replTracker, and feed it to healthStreamer. If the replication is unhealthy, stateManager also goes into a non-serving state. These substates are exported in /debug/status, and also there is a json export at the new path /debug/status_details.

replManager

This component watches over replication and fixes it as needed. This job was previously done by the replicationReporter.

tmState

This was functionality from state_change.go. There are now clear entry points for different state changes, like RefreshFromTopo, ChangeTabletType, etc.

@sougou sougou marked this pull request as ready for review July 22, 2020 05:39
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass

sougou added 22 commits July 22, 2020 21:25
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Introduce a new Seconds type with explicit conversions to and
from time.Duration.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
sougou added 9 commits July 22, 2020 21:25
shutdown mysqls in parallel where possible.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
We saw another shard_sync panic in the tests due to a mismatch
between tablet type and master term start time because they
get fetched at different times instead of together.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
assert.Equal(t, want, shr)
}

func testStream(t *testing.T, hs *healthStreamer) (<-chan *querypb.StreamHealthResponse, context.CancelFunc) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t is not used in this method

return err
}
if !si.HasMaster() {
return fmt.Errorf("no master tablet for shard %v/%v", tablet.Keyspace, tablet.Shard)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this file is using fmt.Error instead of vterrors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only moved this code from somewhere else. All these older functions are still using fmt.Errorf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not see any test file for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are end2end tests for this. We should ideally have unit tests also, but it's legacy code.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. A few minor nits.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
})
}

func (hs *healthStreamer) ApppendDetails(details []*kv) []*kv {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any test for this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no framework to test the /debug/status pages right now. But I tested it manually though.

Comment on lines +150 to +160
if rm.markerFile == "" {
return
}
if stopped {
if file, err := os.Create(rm.markerFile); err == nil {
file.Close()
}
} else {
os.Remove(rm.markerFile)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When markerFile be empty? If it is only for test, can we change the test to not have empty markerFile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markerFile can be empty if the mysql is unmanaged. There are end2end tests for testing the overall behavior. I've written some unit tests to cover the rest of the corner cases that are not covered by the e2e tests.

return err
}
if !si.HasMaster() {
return fmt.Errorf("no master tablet for shard %v/%v", tablet.Keyspace, tablet.Shard)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not see any test file for this.

Comment on lines +224 to +238
if ts.tm.UpdateStream != nil {
if topo.IsRunningUpdateStream(ts.tablet.Type) {
ts.tm.UpdateStream.Enable()
} else {
ts.tm.UpdateStream.Disable()
}
}

if ts.tm.VREngine != nil {
if ts.tablet.Type == topodatapb.TabletType_MASTER {
ts.tm.VREngine.Open(ts.tm.BatchCtx)
} else {
ts.tm.VREngine.Close()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateStream and VREngine are not set in the test. See if can be added in test tm initialize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test framework doesn't support these. But there are many end2end tests for this.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
@sougou sougou merged commit f4269bc into vitessio:master Jul 26, 2020
@shlomi-noach shlomi-noach deleted the ss-ts2-health branch July 26, 2020 08:07
@deepthi
Copy link
Member

deepthi commented Jul 27, 2020

Fixes #6131 #6345

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants