Add Chunk Size to RR Balancer (Increased Batching Ability) #1232

Merged: erushing merged 8 commits into main from er/test-rr-balancer on Nov 27, 2023

erushing (Contributor) commented Nov 15, 2023

The motivation here is to get Kafka-Go's RR Balancer to batch more aggressively. The RR Balancer naively iterates through the available partitions, putting a single message on each one in turn, so within a batch timeout each partition's batch receives only a fraction of the messages. If it could put x messages on each partition before moving on to the next one, the messages would still be evenly distributed, but would batch better.

The coding approach was to bring in the same chunking mechanism that our internal Bulrush library uses, then simplify it and adapt it to the code style of Kafka-Go.
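As a rough illustration of the chunking idea described above, here is a minimal sketch. The type `chunkedRoundRobin`, its fields, and the helper `distribute` are hypothetical names made up for this example; this is not the code from the PR or from Bulrush, and it is not safe for concurrent use.

```go
package main

import "fmt"

// chunkedRoundRobin is a hypothetical, simplified balancer used only for
// illustration. It is not the implementation from this PR.
type chunkedRoundRobin struct {
	chunkSize int // messages sent to one partition before advancing
	counter   int // total messages assigned so far
}

// next returns the partition that should receive the next message.
func (b *chunkedRoundRobin) next(partitions []int) int {
	size := b.chunkSize
	if size < 1 {
		size = 1 // assumption: fall back to plain round robin
	}
	// Integer division keeps the index fixed for `size` consecutive calls.
	idx := (b.counter / size) % len(partitions)
	b.counter++
	return partitions[idx]
}

// distribute assigns n messages and returns how many landed on each partition.
func distribute(chunkSize, n int, partitions []int) map[int]int {
	b := &chunkedRoundRobin{chunkSize: chunkSize}
	counts := make(map[int]int)
	for i := 0; i < n; i++ {
		counts[b.next(partitions)]++
	}
	return counts
}

func main() {
	partitions := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	plain := distribute(1, 20, partitions)
	chunked := distribute(10, 20, partitions)
	fmt.Println("chunk size 1 touched", len(plain), "partitions")
	fmt.Println("chunk size 10 touched", len(chunked), "partitions")
}
```

With 20 messages and 10 partitions, a chunk size of 1 places 2 messages on each of the 10 partitions, while a chunk size of 10 places 10 messages on each of 2 partitions. Over a longer run both settings cycle through every partition equally; the difference is how many messages share a partition within a short window.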

balancer.go Outdated
// across all available partitions, but puts greater emphasis on batching by a chunk size
// within a shorter time period than is possible via the regular RoundRobin Balancer.
type ChunkedRoundRobin struct {
	chunkSize int
Contributor

I think we want chunkSize to be public so that users can configure it

Contributor Author

Yeah, I thought the test showed that you could pass a value into the struct, but I would have found out otherwise as soon as I tried this with a real kafka-go based service. I see that Bulrush uses a setter for this, so I'm sure I'm off base with the way I did it.

I had this worker in mind: I want to be able to pass in a chunk size at the point where it passes in RR as its balancer.
https://github.com/segmentio/identity/blob/2d04b8f5a16d235453c922e1e9d1ff7ca1b2a92b/identity-resolver/worker.go#L290C21-L290C21

Contributor

The test is in the same package as the balancer, so it is able to reference private fields. That won't be the case for users of the balancer. We don't use setters elsewhere in this kafka-go package; we just use public fields for anything we want to expose to users, so I'd be inclined to stick with that convention.
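To make the visibility point concrete, here is a small sketch of the public-field convention. The `Balancer` type and its `effectiveChunkSize` method are hypothetical names for illustration only, not kafka-go's API, and the fallback to 1 for unset values is an assumption of this example.

```go
package main

import "fmt"

// Balancer is a hypothetical type for illustration, not kafka-go's API.
type Balancer struct {
	// ChunkSize starts with an upper-case letter, so it is exported and
	// code in any other package can set it in a struct literal.
	ChunkSize int

	// counter starts with a lower-case letter, so only code in the
	// defining package (including its own tests) can read or write it.
	counter int
}

// effectiveChunkSize applies a default so that the zero value of Balancer
// stays usable without a constructor or a setter.
func (b *Balancer) effectiveChunkSize() int {
	if b.ChunkSize < 1 {
		return 1 // assumption: unset or invalid sizes mean "no chunking"
	}
	return b.ChunkSize
}

func main() {
	configured := &Balancer{ChunkSize: 10} // what a user in another package would write
	zero := &Balancer{}                    // zero value, no configuration
	fmt.Println(configured.effectiveChunkSize(), zero.effectiveChunkSize())
}
```

A test file that lives in the same package could write `Balancer{counter: 5}` and compile, which is why an in-package test can pass even when the field is unreachable for callers elsewhere.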

erushing (Contributor, Author) commented Nov 27, 2023

This was tested on a live application with a chunk size of 10 and showed increased batching, despite fairly low throughput through each producer and a low batch timeout value (10ms).
[Two screenshots, dated 2023-11-27 at 1:28:47 PM and 1:28:40 PM]

erushing changed the title from "dp-1862 - Initial Spike on Bulrush-Style chunked RR Balancer/Partitioner" to "Chunked RR Balancer/Partitioner (Increased Batching)" on Nov 27, 2023
erushing changed the title from "Chunked RR Balancer/Partitioner (Increased Batching)" to "Add Chunk Size to RR Balancer (Increased Batching Ability)" on Nov 27, 2023
erushing merged commit f568774 into main on Nov 27, 2023
erushing deleted the er/test-rr-balancer branch on November 27, 2023 at 21:54