`vtorc`: improve handling of partial cell topo results #17718
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #17718      +/-   ##
==========================================
+ Coverage   67.79%   67.95%   +0.16%
==========================================
  Files        1587     1586       -1
  Lines      255829   255209     -620
==========================================
+ Hits       173427   173433       +6
+ Misses      82402    81776     -626
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Force-pushed from bb5e29c to b3936d2.
Rest looks good to me!
Update: tested this PR by backporting it to our v19 release. Working as expected ✅. We now get an error log line for each individual cell failure, as well as an error indicating that the result across all cells was partial.
* Move to native sqlite3 queries (vitessio#17124)
  Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
  Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Improve efficiency of `vtorc` topo calls (vitessio#17071)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
  Co-authored-by: Matt Lord <mattalord@gmail.com>
* Ensure all topo read calls consider `--topo_read_concurrency` (vitessio#17276)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Avoid flaky topo concurrency test (vitessio#17407)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* `vtorc`: fetch all tablets from cells once + filter during refresh (vitessio#17388)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Support KeyRange in `--clusters_to_watch` flag (vitessio#17604)
  Signed-off-by: Manan Gupta <manan@planetscale.com>
* `vtorc`: improve handling of partial cell topo results (vitessio#17718)
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Add stats for shards watched by VTOrc
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* add more tests
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* cleanup
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* fix ineffassign
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* fix test for v21
  Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
* Use prefix in all vtorc check and recover logs (vitessio#17526)
  Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Description
This PR improves the safety of `getAllTablets` in `go/vt/vtorc/logic/tablet_discovery.go`, which is used to get all tablet records from all cells.

The new logic returns the tablets from `getAllTablets(...)` in a per-cell map (as `map[string][]*topo.TabletInfo`) to ensure that only the cells that responded are operated on. A list of failed cells is also returned as a `[]string`.
This prevents tablets from being forgotten when one or more cells fail. Before this PR, the SQL query in `refreshTablets(...)` that gets the tablet aliases to forget did not consider cells that never responded. This bug was introduced by #17388.

Related Issue(s)
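The effect of the fix on the "which aliases to forget" step can be sketched in memory. This is a hypothetical illustration, not the actual SQL in `refreshTablets(...)`: it assumes the previously known tablets are available as alias-to-cell pairs and that the fresh results are keyed by cell, with failed cells absent. A known tablet is forgotten only if its cell responded and that cell's fresh result no longer contains it.

```go
package main

import (
	"fmt"
	"sort"
)

// aliasesToForget (hypothetical) returns the known tablet aliases to forget.
// known maps each previously discovered alias to its cell. freshByCell holds
// the aliases each responding cell returned; cells that failed are absent,
// so their tablets are never forgotten.
func aliasesToForget(known map[string]string, freshByCell map[string][]string) []string {
	// Index all freshly returned aliases for O(1) membership checks.
	seen := make(map[string]bool)
	for _, aliases := range freshByCell {
		for _, alias := range aliases {
			seen[alias] = true
		}
	}
	var forget []string
	for alias, cell := range known {
		if _, responded := freshByCell[cell]; !responded {
			continue // cell did not respond: keep its tablets
		}
		if !seen[alias] {
			forget = append(forget, alias)
		}
	}
	sort.Strings(forget)
	return forget
}

func main() {
	known := map[string]string{
		"zone1-100": "zone1",
		"zone1-101": "zone1",
		"zone2-200": "zone2",
	}
	// zone2 failed, so it is missing from the fresh results.
	fresh := map[string][]string{"zone1": {"zone1-100"}}
	fmt.Println(aliasesToForget(known, fresh)) // [zone1-101]
}
```

Without the `responded` check, `zone2-200` would also be returned even though its cell simply failed to answer, which is the failure mode described above.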
Closes #17719
Checklist
Deployment Notes