stability: range is already split error when single MVCC key gets too large #10095
Comments
Yeah, that error needs to be better. As for more proactively truncating, that may not be possible depending on the desired semantics of the TTL. Currently it's an "at least that long" guarantee, which makes sense: you set it to 24hrs, and you get the guarantee that you can read data at least 24hrs in the past. What's the worth of setting it to 24hrs if we (have to) GC after 12 minutes and consequently disallow you from doing your historical reads? That might be worse if all you're doing are historical reads. We could of course try to get fancy and add an optional "gc threshold" key to the metadata and use that for keys like this (to avoid the problem spreading to the whole range, it would only affect this key), but that seems too fringe and doesn't really solve the problem.

My instinct is that this situation should show up in alerts and should be solved by either increasing the range max size or by lowering the TTL.

Btw, could you post the load generator? For the proposer-evaluated KV and command queue work, a hotspotty load generator is going to be useful for high-level measurements.
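To pin down the "at least that long" semantics above, here is a minimal sketch of the eligibility rule it implies (hypothetical types, not the actual GC queue code): a version becomes collectible only once the newer version shadowing it is itself older than `now - ttl`, so a historical read at any timestamp inside the TTL window still finds the version it expects.

```go
package main

import (
	"fmt"
	"time"
)

// version is a hypothetical stand-in for one MVCC version of a key.
// Slices below are ordered newest-first; index 0 is the live value.
type version struct {
	ts time.Time // commit timestamp of this version
}

// gcable reports whether versions[i] may be garbage collected under the
// "at least that long" TTL: versions[i] is safe to remove only once the
// version shadowing it, versions[i-1], is older than now-ttl, because any
// read at a timestamp within the last ttl then resolves to versions[i-1]
// or newer and never needs versions[i].
func gcable(versions []version, i int, now time.Time, ttl time.Duration) bool {
	if i == 0 {
		return false // the newest version is never collected
	}
	return versions[i-1].ts.Before(now.Add(-ttl))
}

func main() {
	now := time.Now()
	vs := []version{
		{ts: now},                      // live value
		{ts: now.Add(-25 * time.Hour)}, // shadowed just now, by vs[0]
		{ts: now.Add(-48 * time.Hour)}, // shadowed 25h ago, by vs[1]
	}
	fmt.Println(gcable(vs, 1, now, 24*time.Hour)) // false: vs[0] is inside the TTL window
	fmt.Println(gcable(vs, 2, now, 24*time.Hour)) // true: vs[1] is outside it
}
```

Under this rule, a key that is rewritten continuously accumulates at least a TTL's worth of versions no matter how aggressively GC runs, which is why the row in this issue keeps growing.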
It would be easy enough to detect that we can't split a range because it contains only a single (very large) row. But a bigger problem here is that we'll continue trying to split the range over and over and not succeeding. I'm not sure I see an easy way to avoid this. In this particular instance, we might be able to look at …
That's a good point, and even having 100 keys doesn't mean you gain anything by splitting (it seems better not to split than to split 99 small keys off to their own tiny range and keep the 100th in its own range of about the original size).
Actually, can't we look at the start and the end key of a range and determine that it is unsplittable? If …
See my example above - you might have 100 keys and only one makes up the mass of the range.
(However, if we decide that splitting off very small ranges is ok in this scenario, then we would end up in a situation where we only have that large key, and then that would work.)
Yeah, I was thinking we'd split off the tiny range and then be left with a large unsplittable range. That seems acceptable.
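A minimal sketch of the start/end-key check proposed above (hypothetical helpers, not the actual split queue): a range whose end key is the immediate successor of its start key can hold at most one user key, so there is no legal split point strictly inside it.

```go
package main

import (
	"bytes"
	"fmt"
)

// next mimics Key.Next(): in lexicographic byte order, the immediate
// successor of a key is that key with a zero byte appended.
func next(k []byte) []byte {
	return append(append([]byte(nil), k...), 0)
}

// unsplittable reports whether the range [startKey, endKey) can contain
// at most a single key, i.e. there is no candidate split key strictly
// between its bounds.
func unsplittable(startKey, endKey []byte) bool {
	return bytes.Equal(next(startKey), endKey)
}

func main() {
	k := []byte("/Table/51/1/12345")
	fmt.Println(unsplittable(k, next(k)))               // true: single-key range
	fmt.Println(unsplittable([]byte("a"), []byte("b"))) // false: "a\x00" etc. fit inside
}
```

As the 100-key example shows, this only catches the fully degenerate case: a range with a handful of small keys next to the one large key fails the check and still can't be split usefully, which is why the plan above is to split the small keys off first and then tolerate one large, unsplittable remainder.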
@tschottdorf The load generator is hacked together in this commit. The generator is using an UPDATE statement on the same row continuously, so it's both reading and writing, instead of strictly doing KV writes on the same key. Still, I think it's an interesting workload for stressing cases like this. I was thinking about cleaning it up and adding it as a "single_block" mode to block_writer.
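For reference, the shape of that workload is easy to approximate against the SQL endpoint; a minimal sketch (the `blocks` table, its columns, and the connection string are made up here; the linked commit is the real generator):

```go
package main

import (
	"crypto/rand"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	payload := make([]byte, 1024)
	// Hammer one row forever: each UPDATE both reads the row and lays down
	// a new MVCC version of the same keys, so the row's history grows
	// without the range ever gaining a new split point.
	for {
		if _, err := rand.Read(payload); err != nil {
			log.Fatal(err)
		}
		if _, err := db.Exec(
			`UPDATE blocks SET payload = $1 WHERE id = $2`, payload, 12345,
		); err != nil {
			log.Fatal(err)
		}
	}
}
```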
Yes on cleaning it up; not sure on where to put it (or whether to put that exact one somewhere). The block writer shouldn't turn into our dumping ground or it'll be more and more annoying to maintain and use. It's probably worth thinking about how we're going to maintain our multitude of load generators. Perhaps it's easier to have more focused binaries and to factor the bulk of block_writer out into a small framework.
It's probably worthwhile to extract some commonalities from the various load generators into a library. I'm okay with adding a new mode to block_writer.
Duplicate of #9555
I think this is a known problem, but I wasn't able to find an issue for it, so I figured I'd file one. Related to #5252 and #9540.

I'm currently running a 3-node cluster under medium load that is continuously updating the same key (the original intention was to stress the cmdQ under heavy write contention). After a few hours I found that my logs were filled with `range is already split` errors.
Note that `12345` is the primary key of the row I'm updating, so I suspect that the keys in the updated row each have MVCC histories large enough that we're trying to split the single row up into separate ranges, which isn't allowed. In cases like this where a single key gets too large, we need to be more proactive about GCing old MVCC versions.

@tschottdorf @bdarnell