
stability: range is already split error when single MVCC key gets too large #10095

Closed
nvanbenschoten opened this issue Oct 19, 2016 · 11 comments

Labels: S-1-stability (Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting)
Milestone: 1.0

@nvanbenschoten
Member

I think this is a known problem, but I wasn't able to find an issue for it, so I figured I'd file one. Related to #5252 and #9540.

I'm currently running a 3 node cluster under medium load that is continuously updating the same key (the original intention was to stress the cmdQ under heavy write contention). After a few hours I found that my logs were filled with:

E161019 23:20:05.879457 65 storage/queue.go:558  [n3,split] on [n3,s3,r23/3:/{Table/52/1/1…-Max}]: storage/replica_command.go:2339: range is already split at key /Table/52/1/12345
I161019 23:20:05.879614 65 storage/split_queue.go:120  [n3,split,s3,r23/3:/{Table/52/1/1…-Max}] splitting size=645485748 max=67108864
E161019 23:20:06.852154 65 storage/queue.go:558  [n3,split] on [n3,s3,r23/3:/{Table/52/1/1…-Max}]: storage/replica_command.go:2339: range is already split at key /Table/52/1/12345
I161019 23:20:06.852650 65 storage/split_queue.go:120  [n3,split,s3,r23/3:/{Table/52/1/1…-Max}] splitting size=645484633 max=67108864

Note that 12345 is the primary key of the row I'm updating, so I suspect that the keys in the updated row each have MVCC histories large enough that we're trying to split the single row up into separate ranges, which isn't allowed. In cases like this where a single key gets too large, we need to be more proactive about GCing old MVCC versions.
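For reference, a minimal sketch of the kind of hot-row workload described above (not the actual load generator; the connection string, table, and column names are made up): every iteration rewrites the same row, so each write lays down another MVCC version of the same keys, and only GC after the TTL can remove the old ones.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical connection string; any node in the cluster works.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for i := 0; ; i++ {
		// Hammer a single primary key: each UPDATE adds an MVCC version of the
		// same row, so the range containing that row keeps growing in size.
		if _, err := db.Exec(`UPDATE blocks SET payload = $1 WHERE id = 12345`, i); err != nil {
			log.Fatal(err)
		}
	}
}
```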

@tschottdorf @bdarnell

nvanbenschoten added the S-1-stability label Oct 19, 2016
@tbg
Member

tbg commented Oct 20, 2016

Yeah, that error needs to be better. As for truncating more proactively, that may not be possible depending on the desired semantics of the TTL. Currently it's an "at least that long" guarantee, which makes sense: you set it to 24hrs and you're guaranteed to be able to read data at least 24hrs in the past. What's the point of setting it to 24hrs if we (have to) GC after 12 minutes and consequently disallow your historical reads? That might be worse than the large range if historical reads are all you're doing.

We could of course try to get fancy and add an optional "gc threshold" key to the metadata and use it for keys like this (to keep the problem from spreading to the whole range, it would only affect this one key), but that seems too fringe and doesn't really solve the problem.

My instinct is that this situation would be something that should show up in alerts and should be solved by either increasing the range max size or by lowering the TTL.

Btw, could you post the load generator? For the proposer-evaluated KV and command queue work, a hotspotty load generator is going to be useful for high-level measurements.

@petermattis
Collaborator

It would be easy enough to detect that we can't split a range because it contains only a single (very large) row. But the bigger problem here is that we'll keep trying to split the range over and over without succeeding. I'm not sure I see an easy way to avoid this. In this particular instance we might be able to look at MVCCStats.key_count, but in general keys != rows.
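A rough sketch of the key_count idea floated above (hypothetical, not code from the repository; import path approximate): if the stats say the range holds at most one MVCC key, there is no split point to find. As noted, keys != rows, so a multi-key row would still slip past this.

```go
package storage

import "github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb"

// cannotPossiblySplit is a hypothetical helper sketching the key_count
// heuristic: a range with at most one key has no legal split point, so
// don't bother queueing it for splitting.
func cannotPossiblySplit(stats enginepb.MVCCStats) bool {
	return stats.KeyCount <= 1
}
```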

@tbg
Member

tbg commented Oct 20, 2016

That's a good point, and even having 100 keys doesn't mean you gain anything by splitting (it seems better not to split at all than to split 99 small keys off into their own tiny range and keep the 100th in its own range of about the original size).

@petermattis
Collaborator

petermattis commented Oct 20, 2016

Actually, can't we look at the start and the end key of a range and determine that it is unsplittable? If keys.EnsureSafeSplitKey(startKey) == keys.EnsureSafeSplitKey(endKey) we shouldn't attempt to split it. We shouldn't even add it to the split queue.
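A minimal sketch of that check, assuming the keys.EnsureSafeSplitKey signature from the codebase at the time (import paths approximate, error handling simplified); this is an illustration of the proposal, not an implementation that exists.

```go
package storage

import (
	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// rangeSpansSingleRow sketches the check proposed above: if the start and end
// keys truncate to the same row prefix, the range covers a single row, there
// is no legal split point, and the range shouldn't be queued for splitting.
func rangeSpansSingleRow(startKey, endKey roachpb.Key) bool {
	start, err := keys.EnsureSafeSplitKey(startKey)
	if err != nil {
		return false // not a table key; the comparison doesn't apply
	}
	end, err := keys.EnsureSafeSplitKey(endKey)
	if err != nil {
		return false
	}
	return start.Equal(end)
}
```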

@tbg
Member

tbg commented Oct 20, 2016

See my example above - you might have 100 keys and only one makes up the mass of the range.

@tbg
Member

tbg commented Oct 20, 2016

(However, if we decide that splitting off very small ranges is OK in this scenario, we would eventually be left with a range holding only that one large key, at which point that check would work.)

@petermattis
Collaborator

Yeah, I was thinking we'd split off the tiny range and then be left with a large unsplittable range. That seems acceptable.

@nvanbenschoten
Member Author

@tschottdorf The load generator is hacked together in this commit. The generator issues an UPDATE statement on the same row continuously, so it's both reading and writing, instead of strictly doing KV writes on the same key. Still, I think it's an interesting workload for stressing cases like this.

I was thinking about cleaning it up and adding it as a "single_block" mode to block_writer. What do you think?

@tbg
Member

tbg commented Oct 24, 2016

Yes on cleaning it up; not sure where to put it (or whether to put that exact one anywhere). The block writer shouldn't turn into our dumping ground or it'll become more and more annoying to maintain and use. Probably worth thinking about how we're going to maintain our multitude of load generators. Perhaps it's easier to have more focused binaries and pull the bulk of block_writer out into a small framework.

@petermattis
Collaborator

It's probably worthwhile to extract some commonalities from the various load generators into a library. I'm okay with adding a new mode to block_writer in the short term. Sometimes debt is okay.

petermattis added this to the 1.0 milestone Feb 23, 2017
@spencerkimball
Member

Duplicate of #9555
