Bug Report: Race condition can cause queries to fail when vtgate starts up #16656

GuptaManan100 · 2024-08-27T09:05:50Z

Overview of the Issue

A race condition can prevent some queries from being buffered when vtgate starts up.
The order of operations is as follows:

HealthCheck is created and starts receiving updates from the tablets.
The keyspace event watcher is initialized and subscribes to updates from the healthcheck, but it processes these updates asynchronously.
vtgate waits until the healthcheck has received updates showing that all the primary tablets are serving.
vtgate then starts accepting traffic, and one primary becomes non-serving (because of a PRS).
The keyspace event watcher hasn't finished processing the updates from the healthcheck, so PrimaryIsNotServing returns nil, false because the watcher doesn't have the shard information stored yet.
This causes the query to be dropped, and the user gets an error message stating that no healthy tablet is available.

The ideal behaviour is for the queries to be buffered. The problem is caused by a race between the keyspace event watcher processing the first healthcheck updates it receives and vtgate starting to accept queries.
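The ordering problem above can be illustrated with a small, self-contained Go model. This is only a sketch under stated assumptions, not Vitess code: the `watcher` type, `routeQuery`, `simulate`, and the shard name `-80` are all hypothetical, and `primaryIsNotServing` is a simplified stand-in that returns a single bool rather than the real method's return values. The `lag` channel forces the watcher to fall behind so that the race is deterministic in the example.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical healthcheck event for one shard's primary.
type update struct {
	shard   string
	serving bool
}

// watcher is a toy stand-in for a keyspace event watcher (not the Vitess type).
type watcher struct {
	mu      sync.Mutex
	serving map[string]bool
}

// process stores the latest serving state for a shard.
func (w *watcher) process(u update) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.serving[u.shard] = u.serving
}

// primaryIsNotServing returns true only when the shard is known and its
// primary is not serving. An unknown shard yields false, which mirrors the
// "no shard information stored" case described in the report.
func (w *watcher) primaryIsNotServing(shard string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	serving, known := w.serving[shard]
	return known && !serving
}

// routeQuery decides what happens to a query whose primary is unavailable.
func routeQuery(w *watcher, shard string) string {
	if w.primaryIsNotServing(shard) {
		return "buffered"
	}
	return "error: no healthy tablet available"
}

// simulate models startup. When waitForWatcher is false, the query arrives
// before the watcher has processed any update (the racy ordering). When it is
// true, startup blocks until the watcher has caught up.
func simulate(waitForWatcher bool) string {
	w := &watcher{serving: make(map[string]bool)}
	updates := make(chan update, 2)
	lag := make(chan struct{})  // closed to let the watcher start processing
	done := make(chan struct{}) // closed when the watcher has caught up

	go func() {
		<-lag
		for u := range updates {
			w.process(u)
		}
		close(done)
	}()

	// The healthcheck has already seen the primary serving, then not serving.
	updates <- update{shard: "-80", serving: true}
	updates <- update{shard: "-80", serving: false}
	close(updates)

	if waitForWatcher {
		close(lag)
		<-done
		return routeQuery(w, "-80")
	}
	result := routeQuery(w, "-80") // query races ahead of the watcher
	close(lag)
	<-done
	return result
}

func main() {
	fmt.Println("without waiting:", simulate(false))
	fmt.Println("with waiting:   ", simulate(true))
}
```

In this model, the query is only buffered when startup waits for the watcher to catch up before traffic is accepted. Whether the actual fix in #16655 takes this approach is not stated in this report, so treat the waiting step as one possible way to close the window.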

Reproduction Steps

Run a cluster with vitess-operator that has 2 vtgates and at least 3 tablets. While running continuous query traffic, trigger a rolling update of the entire cluster (the easiest way to do this is to change the Vitess image version). Repeat until the error is seen.

Binary Version

main

Operating System and Environment details

Log Fragments

No response


GuptaManan100 added Type: Bug Component: Query Serving labels Aug 27, 2024

GuptaManan100 mentioned this issue Aug 27, 2024

Fix race condition that prevents queries from being buffered after vtgate startup #16655

Merged

5 tasks

mattlord assigned GuptaManan100 Aug 27, 2024

GuptaManan100 closed this as completed in #16655 Aug 29, 2024
