
stability: range is already split error when single MVCC key gets too large #10095

Closed
nvanbenschoten opened this issue Oct 19, 2016 · 11 comments

Labels: S-1-stability (Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting)
Milestone: 1.0

@nvanbenschoten
Member

I think this is a known problem, but I wasn't able to find an issue for it, so I figured I'd file one. Related to #5252 and #9540.

I'm currently running a 3 node cluster under medium load that is continuously updating the same key (the original intention was to stress the cmdQ under heavy write contention). After a few hours I found that my logs were filled with:

E161019 23:20:05.879457 65 storage/queue.go:558  [n3,split] on [n3,s3,r23/3:/{Table/52/1/1…-Max}]: storage/replica_command.go:2339: range is already split at key /Table/52/1/12345
I161019 23:20:05.879614 65 storage/split_queue.go:120  [n3,split,s3,r23/3:/{Table/52/1/1…-Max}] splitting size=645485748 max=67108864
E161019 23:20:06.852154 65 storage/queue.go:558  [n3,split] on [n3,s3,r23/3:/{Table/52/1/1…-Max}]: storage/replica_command.go:2339: range is already split at key /Table/52/1/12345
I161019 23:20:06.852650 65 storage/split_queue.go:120  [n3,split,s3,r23/3:/{Table/52/1/1…-Max}] splitting size=645484633 max=67108864

Note that 12345 is the primary key of the row I'm updating, so I suspect that the keys in the updated row each have MVCC histories large enough that we're trying to split the single row up into separate ranges, which isn't allowed. In cases like this where a single key gets too large, we need to be more proactive about GCing old MVCC versions.
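For reference, a minimal sketch of the kind of hot-row workload described above (not the actual load generator; the connection string, table, and column names are made up): every iteration rewrites the same row, so each write lays down another MVCC version of the same keys, and only GC after the TTL can remove the old ones.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical connection string; any node in the cluster works.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for i := 0; ; i++ {
		// Hammer a single primary key: each UPDATE adds an MVCC version of the
		// same row, so the range containing that row keeps growing in size.
		if _, err := db.Exec(`UPDATE blocks SET payload = $1 WHERE id = 12345`, i); err != nil {
			log.Fatal(err)
		}
	}
}
```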

@tschottdorf @bdarnell

nvanbenschoten added the S-1-stability label Oct 19, 2016
@tbg
Member

tbg commented Oct 20, 2016

Yeah, that error needs to be better. As for truncating more proactively, that may not be possible depending on the desired semantics of the TTL. Currently it's an "at least that long" guarantee, which makes sense: you set it to 24hrs and you're guaranteed to be able to read data at least 24hrs in the past. What's the point of setting it to 24hrs if we (have to) GC after 12 minutes and consequently disallow your historical reads? That might be worse than the large range if historical reads are all you're doing.

We could of course try to get fancy and add an optional "gc threshold" key to the metadata and use it for keys like this (to keep the problem from spreading to the whole range, it would only affect this one key), but that seems too fringe and doesn't really solve the problem.

My instinct is that this situation would be something that should show up in alerts and should be solved by either increasing the range max size or by lowering the TTL.

Btw, could you post the load generator? For the proposer-evaluated KV and command queue work, a hotspotty load generator is going to be useful for high-level measurements.

@petermattis
Collaborator

It would be easy enough to detect that we can't split a range because it contains only a single (very large) row. But the bigger problem here is that we'll keep trying to split the range over and over without succeeding. I'm not sure I see an easy way to avoid this. In this particular instance we might be able to look at MVCCStats.key_count, but in general keys != rows.
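A rough sketch of the key_count idea floated above (hypothetical, not code from the repository; import path approximate): if the stats say the range holds at most one MVCC key, there is no split point to find. As noted, keys != rows, so a multi-key row would still slip past this.

```go
package storage

import "github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb"

// cannotPossiblySplit is a hypothetical helper sketching the key_count
// heuristic: a range with at most one key has no legal split point, so
// don't bother queueing it for splitting.
func cannotPossiblySplit(stats enginepb.MVCCStats) bool {
	return stats.KeyCount <= 1
}
```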

@tbg
Member

tbg commented Oct 20, 2016

That's a good point, and even having 100 keys doesn't mean you gain anything by splitting (it seems better not to split at all than to split 99 small keys off into their own tiny range and keep the 100th in its own range of about the original size).

@petermattis
Collaborator

petermattis commented Oct 20, 2016

Actually, can't we look at the start and the end key of a range and determine that it is unsplittable? If keys.EnsureSafeSplitKey(startKey) == keys.EnsureSafeSplitKey(endKey) we shouldn't attempt to split it. We shouldn't even add it to the split queue.
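A minimal sketch of that check, assuming the keys.EnsureSafeSplitKey signature from the codebase at the time (import paths approximate, error handling simplified); this is an illustration of the proposal, not an implementation that exists.

```go
package storage

import (
	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// rangeSpansSingleRow sketches the check proposed above: if the start and end
// keys truncate to the same row prefix, the range covers a single row, there
// is no legal split point, and the range shouldn't be queued for splitting.
func rangeSpansSingleRow(startKey, endKey roachpb.Key) bool {
	start, err := keys.EnsureSafeSplitKey(startKey)
	if err != nil {
		return false // not a table key; the comparison doesn't apply
	}
	end, err := keys.EnsureSafeSplitKey(endKey)
	if err != nil {
		return false
	}
	return start.Equal(end)
}
```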

@tbg
Member

tbg commented Oct 20, 2016

See my example above - you might have 100 keys and only one makes up the mass of the range.

@tbg
Member

tbg commented Oct 20, 2016

(However, if we decide that splitting off very small ranges is OK in this scenario, we would eventually be left with a range holding only that one large key, at which point that check would work.)

@petermattis
Collaborator

Yeah, I was thinking we'd split off the tiny range and then be left with a large unsplittable range. That seems acceptable.

@nvanbenschoten
Member Author

@tschottdorf The load generator is hacked together in this commit. The generator issues an UPDATE statement on the same row continuously, so it's both reading and writing, instead of strictly doing KV writes on the same key. Still, I think it's an interesting workload for stressing cases like this.

I was thinking about cleaning it up and adding it as a "single_block" mode to block_writer. What do you think?

@tbg
Member

tbg commented Oct 24, 2016

Yes on cleaning it up; not sure where to put it (or whether to put that exact one anywhere). The block writer shouldn't turn into our dumping ground or it'll become more and more annoying to maintain and use. Probably worth thinking about how we're going to maintain our multitude of load generators. Perhaps it's easier to have more focused binaries and pull the bulk of block_writer out into a small framework.

@petermattis
Collaborator

It's probably worthwhile to extract some commonalities from the various load generators into a library. I'm okay with adding a new mode to block_writer in the short term. Sometimes debt is okay.

petermattis added this to the 1.0 milestone Feb 23, 2017
@spencerkimball
Member

Duplicate of #9555
