Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] kvcoord: improve truncate() and next() #68564

Closed

Conversation

jordanlewis
Copy link
Member

Touches #68536.

Previously, the way the DistSender invoked truncate(), next(), and
prev() was quadratic: each invocation of these functions required
iterating over the entire list of requests in a batch request, to either
find a start key or to find all requests that intersect with a given
range.

However, we can do better: we can process the batches sequentially and
keep track of which parts of the requests' spans we've "chopped off" and
used already. By doing this, we can skip doing extra work for those
already-used span portions.

The loop in this case will remain quadratic, but it's kind of amortized,
since we reduce the set of things to check on every iteration.

There could be a better way of doing this still (Yahor suggested an
interval tree) but I'm not quite sure, and it's complicated.

Release note: None

Previously, the way the DistSender invoked truncate(), next(), and
prev() was quadratic: each invocation of these functions required
iterating over the entire list of requests in a batch request, to either
find a start key or to find all requests that intersect with a given
range.

However, we can do better: we can process the batches sequentially and
keep track of which parts of the requests' spans we've "chopped off" and
used already. By doing this, we can skip doing extra work for those
already-used span portions.

The loop in this case will remain quadratic, but it's kind of amortized,
since we reduce the set of things to check on every iteration.

There could be a better way of doing this still (Yahor suggested an
interval tree) but I'm not quite sure, and it's complicated.

Release note: None
@jordanlewis jordanlewis requested a review from a team as a code owner August 6, 2021 21:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis
Copy link
Member Author

Note: this is not working yet, it is definitely still buggy.

Also, it's just one idea to fix the issue and may not be optimal. I think an approach like this would solve much of the wasted CPU, yet it doesn't truly reduce the complexity of the algorithm in a way that would be ideal, since we're not sorting the requests, or using a tree, etc - we're just marking ones that have been consumed and skipping work that has already been known to be used up.

Finally, we had to remove some concurrency - it used to be the case that truncate itself could be run in the async task, saving some serial work. But doing this makes it very hard to coordinate the work so that ranges can skip iterating through things that have been done already. So, we've moved out the truncate into the serial component for now.

@tbg tbg removed the request for review from a team September 15, 2021 08:19
@jordanlewis jordanlewis marked this pull request as draft October 8, 2021 22:56
@yuzefovich
Copy link
Member

I have a solution that sorts the requests in #82865, so I think this PR should be closed now.

@irfansharif irfansharif removed their request for review June 20, 2022 20:54
@yuzefovich
Copy link
Member

Overtaken by #82865.

@yuzefovich yuzefovich closed this Jul 14, 2022
@jordanlewis jordanlewis deleted the fix-truncate-and-next branch July 17, 2022 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants