SHOW VITESS_SHARDS intermittently returns an empty list of shards #5038

Closed
tirsen opened this issue Jul 30, 2019 · 6 comments · Fixed by #5189

Comments

@tirsen
Collaborator

tirsen commented Jul 30, 2019

SHOW VITESS_SHARDS intermittently returns an empty list of shards

I think the issue can be traced to these lines of code at go/vt/vtgate/executor.go:800:

			_, _, shards, err := e.resolver.resolver.GetKeyspaceShards(ctx, keyspace, destTabletType)
			if err != nil {
				// There might be a misconfigured keyspace or no shards in the keyspace.
				// Skip any errors and move on.
				continue
			}

As you can see, errors are ignored, so if there is, for example, an intermittent connectivity problem with the topology server, we return an empty list rather than signaling an error.

I think the correct solution is to catch and ignore only specific, expected errors and pass everything else on.
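
For illustration, here is a minimal, self-contained sketch of that behaviour (shardsForKeyspace, errNoShards and showShards are hypothetical stand-ins, not the actual Vitess code):

package main

import (
	"errors"
	"fmt"
)

// errNoShards stands in for the "misconfigured keyspace or no shards"
// condition that is safe to skip.
var errNoShards = errors.New("no shards in keyspace")

// shardsForKeyspace is a stand-in for GetKeyspaceShards.
func shardsForKeyspace(keyspace string) ([]string, error) {
	switch keyspace {
	case "commerce":
		return []string{"0"}, nil
	case "empty":
		return nil, errNoShards
	default:
		return nil, errors.New("topo server unreachable")
	}
}

// showShards skips only the benign "no shards" case and propagates any other
// error (e.g. an intermittent topo connectivity problem) to the caller
// instead of silently returning an empty result.
func showShards(keyspaces []string) ([]string, error) {
	var result []string
	for _, ks := range keyspaces {
		shards, err := shardsForKeyspace(ks)
		if err != nil {
			if errors.Is(err, errNoShards) {
				continue
			}
			return nil, err
		}
		for _, shard := range shards {
			result = append(result, ks+"/"+shard)
		}
	}
	return result, nil
}

func main() {
	shards, err := showShards([]string{"commerce", "empty"})
	fmt.Println(shards, err)
}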

@tirsen
Collaborator Author

tirsen commented Jul 31, 2019

I think the main issue is that this hits Zookeeper, and we run SHOW VITESS_SHARDS as part of every single request for one of our apps. This means we send pretty heavy qps to Zookeeper, and sometimes it just can't keep up.

We should ideally be caching this instead.

@tirsen
Collaborator Author

tirsen commented Aug 1, 2019

Actually, following the codepath further, I can see that this does not hit Zookeeper directly for every request. GetKeyspaceShards uses ResilientServer.GetSrvKeyspace, which uses a local cache and a Zookeeper watcher to keep it up to date.

I think that's great, but it doesn't explain why SHOW VITESS_SHARDS sometimes returns zero shards. Will keep digging.

@aquarapid
Contributor

I can easily reproduce this locally against a standalone vtgate and the example DB:

$ cat test_issue_5038.py 
import MySQLdb

# Connect to vtgate (the local example cluster listens on 15306).
db = MySQLdb.connect("127.0.0.1", "root", db="commerce", port=15306)
cursor = db.cursor()

for i in xrange(0, 10000):
  cursor.execute("SHOW VITESS_SHARDS;")
  data = cursor.fetchone()
  try:
    # The example keyspace is unsharded, so we always expect "commerce/0".
    if data[0] != "commerce/0":
      print data
  except TypeError:
    # fetchone() returned None, i.e. an empty shard list.
    print data, i

db.close()

Running this gives random failures, typically around 2 per 10000 selects on my machine, though the exact count varies, e.g.:

$ python test_issue_5038.py 
None 5671
None 8554

I will dig further.

@aquarapid
Contributor

This is definitely related to the ResilientServer cache refresh. If I bump the vtgate srv_topo_cache_ttl to something high (say 60s), I can run the testcase without any failures.

@aquarapid
Contributor

So, there's a race in the default configuration, even when ZK isn't being slow, because we have the TTL and refresh period set to the same value (1 second). One solution is to make sure the refresh period is lower than the TTL (e.g. TTL 2s, leave refresh as default of 1s). We could consider changing the defaults to this. Another fix would be to eliminate the race by resetting the entry.lastQueryTime in resilient_server.go to the entry.insertionTime when the topology has been refreshed.
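
To make the race concrete, here is a toy model of a TTL'd cache entry (not the real resilient_server.go; the names are just borrowed from the discussion above). With the refresh period equal to the TTL, a read can land just after the entry expires but before the next refresh completes; re-arming the TTL on every successful refresh keeps the entry valid:

package main

import (
	"fmt"
	"time"
)

// Toy model of a cached SrvKeyspace entry.
type entry struct {
	value         []string
	insertionTime time.Time
}

const ttl = 1 * time.Second // models the srv_topo_cache_ttl default

// get serves the value only while it is within its TTL. A read that arrives
// just after expiry, but before the background refresh has landed, sees
// nothing, which surfaces as an empty SHOW VITESS_SHARDS result.
func (e *entry) get() ([]string, bool) {
	if time.Since(e.insertionTime) >= ttl {
		return nil, false
	}
	return e.value, true
}

// refresh models the background topo re-request. Bumping insertionTime on
// every successful refresh keeps the entry continuously valid for as long
// as refreshes keep succeeding.
func (e *entry) refresh(value []string) {
	e.value = value
	e.insertionTime = time.Now()
}

func main() {
	e := &entry{value: []string{"commerce/0"}, insertionTime: time.Now()}

	time.Sleep(ttl + 10*time.Millisecond) // a read lands just past the TTL
	if _, ok := e.get(); !ok {
		fmt.Println("cache miss: would be reported as zero shards")
	}

	e.refresh([]string{"commerce/0"}) // the refresh re-arms the TTL
	fmt.Println(e.get())
}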

@tirsen
Collaborator Author

tirsen commented Sep 11, 2019

Having a fix for this would be really nice. We're using a workaround, but it's quite unsatisfactory and will likely cause issues around shard splits.

aquarapid added a commit to planetscale/vitess that referenced this issue Feb 26, 2020
…uested, instead of zeroing it out.

Signed-off-by: Jacques Grove <aquarapid@gmail.com>