Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add unsplittable ranges to split queue purgatory #25245

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #25191.

This makes the fix for #24215 trivial.

In #14654 we added a mechanism to double the max range size whenever a
split attempt found a range that was unsplittable. This prevented a
tight loop of split attempts. However, it didn't actually do anything to
help us find a split point to reduce the size of the range in the
future. This size doubling worked in practice, but it was a blunt
instrument that had strange effects (see #24215).

This change rips out this range size doubling and replaces it with a
split queue purgatory. This purgatory is used to house replicas that
are unsplittable, preventing them from getting into a tight loop.

Release note: None

@nvanbenschoten nvanbenschoten requested review from bdarnell and a team May 2, 2018 17:23
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/splitQueuePurg branch 2 times, most recently from 6e84425 to ebd1a8e Compare May 2, 2018 22:47
@bdarnell
Copy link
Contributor

bdarnell commented May 3, 2018

:lgtm:


Reviewed 10 of 10 files at r1, 1 of 1 files at r2, 7 of 7 files at r3.
Review status: all files reviewed at latest revision, all discussions resolved.


pkg/storage/replica_command.go, line 740 at r3 (raw file):

		// On seeing a ConditionFailedError or an AmbiguousResultError, retry the
		// command with the updated descriptor.
		switch errors.Cause(lastErr).(type) {

I don't like mixing the error and roachpb.Error domains here. I think I'd rather make unsplittableRangeError a roachpb error so we can use the same introspection mode for it as for the other errors.


pkg/storage/split_queue.go, line 40 at r3 (raw file):

	// unsplittable because they do not contain a suitable split key. Purgatory
	// prevents them from repeatedly attempting to split at an unbounded rate.
	splitQueuePurgatoryCheckInterval = 10 * time.Minute

This seems like a long time. I'd make this 1 minute instead.


Comments from Reviewable

This makes the interface easier to use with any time-based channel
mechanism, like Tickers and Timers.

Release note: None
In cockroachdb#14654 we added a mechanism to double the max range size whenever a
split attempt found a range that was unsplittable. This prevented a
tight loop of split attempts. However, it didn't actually do anything to
help us find a split point to reduce the size of the range in the
future. This size doubling worked in practice, but it was a blunt
instrument that had strange effects (see cockroachdb#24215).

This change rips out this range size doubling and replaces it with a
split queue purgatory. This purgatory is used to house replicas that
are unsplittable, preventing them from getting into a tight loop.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/splitQueuePurg branch from ebd1a8e to e495467 Compare May 8, 2018 17:20
@nvanbenschoten
Copy link
Member Author

Review status: all files reviewed at latest revision, 2 unresolved discussions.


pkg/storage/replica_command.go, line 740 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't like mixing the error and roachpb.Error domains here. I think I'd rather make unsplittableRangeError a roachpb error so we can use the same introspection mode for it as for the other errors.

If we do that then we would need to have a roachpb error implement purgatoryError, which seems wrong to me. With the way this is structured now, splitQueue.process never enters the roachpb.Error domain, which seems like the right approach to me.


pkg/storage/split_queue.go, line 40 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This seems like a long time. I'd make this 1 minute instead.

Done.


Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented May 8, 2018

Reviewed 7 of 7 files at r6.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


pkg/storage/replica_command.go, line 740 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we do that then we would need to have a roachpb error implement purgatoryError, which seems wrong to me. With the way this is structured now, splitQueue.process never enters the roachpb.Error domain, which seems like the right approach to me.

Yeah, I guess that would mean exporting all of the purgatory errors to roachpb, which doesn't seem worth it. This is fine, then.


Comments from Reviewable

@nvanbenschoten
Copy link
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request May 8, 2018
25245: storage: add unsplittable ranges to split queue purgatory r=nvanbenschoten a=nvanbenschoten

Fixes #25191.

This makes the fix for #24215 trivial.

In #14654 we added a mechanism to double the max range size whenever a
split attempt found a range that was unsplittable. This prevented a
tight loop of split attempts. However, it didn't actually do anything to
help us find a split point to reduce the size of the range in the
future. This size doubling worked in practice, but it was a blunt
instrument that had strange effects (see #24215).

This change rips out this range size doubling and replaces it with a
split queue purgatory. This purgatory is used to house replicas that
are unsplittable, preventing them from getting into a tight loop.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 8, 2018

Build succeeded

@craig craig bot merged commit e495467 into cockroachdb:master May 8, 2018
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/splitQueuePurg branch May 21, 2018 18:25
craig bot pushed a commit that referenced this pull request Sep 2, 2018
29440: backport 2.0: storage: improve logging of queue errors r=andreimatei a=andreimatei

Backport of one commit from #25245, because seeing these errors in a 2.0 cluster would have really helped me.

Fixes #25191.

Release note: None


Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: queues swallow errors if requeueing
3 participants