storage: avoid splitting ranges that cannot be split #9555
Comments
Remark by @dt: we can't work around this by simply defining a maximum value size and starting to enforce it. For all we know, existing clusters already contain column families that exceed whatever limit we would pick, and there would be no way to migrate them out of that state. |
@BramGruneir we thought that you've been looking at split recently, is this perhaps something for you? Otherwise feel free to reassign. |
This seems like a fairly important thing to have, but I don't have the time to work on it right now (and clearly didn't back in September). Un-assigning for now. |
At the very least we should only retry the split after a backoff period for a range that has this problem instead of busily retrying. This is a real problem we should probably fix for 1.0. I'm assigning to @petermattis. Re-assign or remove 1.0 milestone. |
Reproduction instructions:
The split queue then starts looping trying to split the range:
|
The specific cause of the loop is that |
We could adjust |
How about adding a |
Hmm, that might work, though I'm not sure we need another field. If we can't find a split key for a size-based split, we can call |
Sounds like we'd still get plenty of complaints. I'd suggest multiplying current size by two. But the reason I suggested another variable is that we call |
I'd rather err on the side of resetting, though. What additional complaints are you worried about? I have my suggestion implemented and it works for my test case. If you add another row to the range, the range gets split.
True, but attempting another split and failing isn't terrible. It is the repeated retries we're trying to avoid and I'd rather retry once than have a bug where the target range size gets huge and we never try to split the range again. |
If a range cannot be split because a valid split key cannot be found, set the max size for the range to the range's current size. When a range is successfully split, reset the max range size to the zone's max range size. This avoids situations where a range with a single large row cannot be split. Fixes cockroachdb#9555
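The bookkeeping described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual CockroachDB implementation: the type `rangeState`, its fields, and its method names are all hypothetical.

```go
package main

import "fmt"

// rangeState is a hypothetical stand-in for the per-range state that the
// split queue consults. None of these names come from CockroachDB.
type rangeState struct {
	sizeBytes    int64 // current size of the range
	maxBytes     int64 // effective max size used for the split decision
	zoneMaxBytes int64 // max range size configured by the zone
}

// shouldSplit reports whether the range exceeds its effective max size.
func (r *rangeState) shouldSplit() bool {
	return r.sizeBytes > r.maxBytes
}

// onSplitKeyNotFound raises the effective max size to the current size, so
// the range is not queued again until more data is written to it.
func (r *rangeState) onSplitKeyNotFound() {
	r.maxBytes = r.sizeBytes
}

// onSplitSucceeded records the post-split size and resets the effective max
// size to the zone's configured value.
func (r *rangeState) onSplitSucceeded(newSizeBytes int64) {
	r.sizeBytes = newSizeBytes
	r.maxBytes = r.zoneMaxBytes
}

func main() {
	// A range holding a single 100-byte row under a 64-byte zone limit.
	r := &rangeState{sizeBytes: 100, maxBytes: 64, zoneMaxBytes: 64}
	fmt.Println("wants split:", r.shouldSplit())

	r.onSplitKeyNotFound() // no valid split key: stop retrying at this size
	fmt.Println("wants split after failed attempt:", r.shouldSplit())

	r.sizeBytes += 10 // another row is written to the range
	fmt.Println("wants split after new write:", r.shouldSplit())

	r.onSplitSucceeded(10) // the split now succeeds
	fmt.Println("effective max after split:", r.maxBytes)
}
```

Because the effective limit is pinned to the current size rather than disabled, any later write pushes the range over the limit again and triggers one more split attempt, which matches the behavior discussed earlier in the thread.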
I'm trying to understand the decision we made here, especially in the context of #24215. Since we're doubling the size when a split point can't be found, we're not actually doing anything to help us find a split point or reduce the range size in the future. Instead, all we're accomplishing is avoiding the tight loop of retrying split attempts of an unsplittable range over and over again. That's fine and this change was a step in the right direction, but if that's the only thing we were trying to accomplish then why didn't we just stick the replica in the splitQueue's purgatory queue to reduce the frequency of split attempts to some more manageable rate? |
The purgatory queues are only checked when there are cluster changes, while a range with a single large value should be checked for splitting whenever additional keys are written to it. Am I misunderstanding how you'd want to use the purgatory queue? |
Are you sure? That looks to me like a detail of the |
No, I'm not sure. I haven't looked at the purgatory queue in a while. My statement was made from memory and doesn't necessarily reflect reality. |
Yeah, purgatory makes sense for this. The only reason that it wasn't used is that it's not widely understood, as seen here. |
When a single KV value becomes larger than the nominal range size, the range grows with it and a split is initiated but the split cannot succeed because the entire size is taken up by a single value. This causes spurious errors and adds pressure on the split queue unnecessarily.
Such "large" ranges should be marked unsplittable and never enter the split queue after they've been marked.
(Suggested by @paperstreet)