-
Notifications
You must be signed in to change notification settings - Fork 801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Chunk Size to RR Balancer (Increased Batching Ability) #1232
Conversation
balancer.go
Outdated
// across all available partitions, but puts greater emphasis on batching by a chunk size | ||
// within a shorter time period than is possible via the regular RoundRobin Balancer. | ||
type ChunkedRoundRobin struct { | ||
chunkSize int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want chunkSize
to be public so that users can configure it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is where I thought the test showed you could pass in a value to the struct, but I would find this out for real when I tried to actually test this with a kafka-go based service. I see that Bulrush uses a setter for this, so I'm sure I'm off base with the way I did this.
I had this worker in mind where I want to be able to pass in a chunk size as they pass in RR as a balancer.
https://github.com/segmentio/identity/blob/2d04b8f5a16d235453c922e1e9d1ff7ca1b2a92b/identity-resolver/worker.go#L290C21-L290C21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is in the same package as the balancer so it is able to reference private fields. That won't be the case for users of the balancer. We don't use setters elsewhere in this kafka-go package and just use public fields for fields that we want to expose to users so I'd be inclined to stick with that convention
The motivation here is to try to get Kafka-Go's RR Balancer to batch more aggressively. The RR Balancer naively iterates through the available partitions, putting a single message on each of them until the batch timeout. If it could put x messages on each partition before moving onto the next one, it would still be evenly distributed, but batch better.
The coding approach was to bring in the same chunking mechanism our internal Bulrush library uses, but simplify it and adapt it for the code-style of Kafka-Go.