
feature: consumer option to balance partition lags #222

Closed
erenboz opened this issue Oct 14, 2022 · 6 comments · Fixed by #251
Labels
enhancement New feature or request help wanted

Comments


erenboz commented Oct 14, 2022

In one of our use cases we had significantly different production/consumption rates on different partitions, and we ended up writing logic that pauses and unpauses fetched partitions to balance lag. It would be quite handy to have a consumer option that influences how many records are pulled from each partition, or one that pulls proportionally to the lag.
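The pause/unpause workaround described above boils down to deciding, from observed lags, which partitions to set aside so the laggy ones can catch up. A minimal sketch of that decision, standalone and with an assumed lag threshold (the helper name and shape are invented for illustration, not any client API):

```go
package main

import "fmt"

// partitionsToPause returns, per topic, the partitions whose observed lag is
// at or below the given threshold. Pausing these low-lag partitions lets the
// high-lag ones catch up; a caller would resume them once lags even out.
// Hypothetical helper for illustration only.
func partitionsToPause(lag map[string]map[int32]int64, threshold int64) map[string][]int32 {
	paused := make(map[string][]int32)
	for topic, parts := range lag {
		for p, l := range parts {
			if l <= threshold {
				paused[topic] = append(paused[topic], p)
			}
		}
	}
	return paused
}

func main() {
	lag := map[string]map[int32]int64{
		"events": {0: 5, 1: 5000, 2: 3}, // partition 1 is far behind
	}
	fmt.Println(partitionsToPause(lag, 100))
}
```

In franz-go terms, the returned map is the shape you would hand to the client's pause/resume calls on each poll cycle.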

@erenboz erenboz changed the title Consumer option to balance partition lags feature: Consumer option to balance partition lags Oct 14, 2022
@erenboz erenboz changed the title feature: Consumer option to balance partition lags feature: consumer option to balance partition lags Oct 14, 2022
Owner

twmb commented Oct 19, 2022

@erenboz that really isn't a thing that belongs in a client. A single client doesn't know anything about lag -- lag is a calculation that is determined from a few individual requests. It isn't up to a client itself to determine its own lag. Manual pausing & unpausing is the best path forward for your problem.

For the "how many records pulled from each partition" problem, I tried designing something like this in the past in #55 (comment). In short, though, it doesn't really make sense and would be pretty difficult to implement. Fetches behind the scenes are per broker. Even if you only drain partition A from the fetch, other partitions will remain in the fetch buffered within the client. The client cannot request another fetch for a broker until the entirety of its current buffered fetch is drained.

There could be some weird hacky option that allows you to control which partitions actually get issued in a request: the client could build candidate partitions, then your function could return partitions to actually issue in the request. I'm not sure this is a great idea ... but it could be feasible if a good API exists. What do you think?
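The "weird hacky option" floated above could look something like the following sketch: the client builds its candidate partitions, and a user callback returns the subset to actually issue. Every name here is invented for illustration; no such API exists in the library.

```go
package main

import "fmt"

// FetchPartitionFilter is a hypothetical callback in the spirit of the
// proposal: the client hands over the candidate partitions it is about to
// request, and the function returns the subset that should actually be
// issued in the fetch request.
type FetchPartitionFilter func(candidates map[string][]int32) map[string][]int32

// onlyTopic keeps candidates for a single topic, starving the rest for this
// request cycle (illustrative policy only).
func onlyTopic(topic string) FetchPartitionFilter {
	return func(candidates map[string][]int32) map[string][]int32 {
		out := make(map[string][]int32)
		if ps, ok := candidates[topic]; ok {
			out[topic] = ps
		}
		return out
	}
}

func main() {
	filter := onlyTopic("hot-topic")
	issued := filter(map[string][]int32{
		"hot-topic":  {0, 1},
		"cold-topic": {0},
	})
	fmt.Println(issued)
}
```

The starvation risk is visible even in this toy: a careless filter can keep a topic out of every request, which is part of why a lag-aware ordering (rather than a raw filter) was ultimately pursued.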

Contributor

yuzhichang commented Oct 19, 2022

I have the opposite need from @erenboz. There are ~1000 Kafka topics, each with several partitions. A few topics have a high produce throughput (~100K records/second); the others do not.
If I create a kgo client for each topic, the clients compete to fetch records: each gets only a small batch to write to the downstream OLAP database (which prefers infrequent, large batch insertions). The total consume throughput is low (~700K records/second).
If I instead create a single kgo client for only five topics, the total consume throughput is better (~1000K records/second).

If a kgo client subscribes to multiple topics, is there a way to prefer a large batch from one topic over proportionally small batches from several topics?

@twmb twmb removed the help wanted label Oct 19, 2022
Contributor

yuzhichang commented Oct 21, 2022

@twmb From the Kafka code here, the client knows the lag of every topic and partition, and so has the chance to adjust the preference (default, proportional, or largest-lag) of a later FetchRequest?
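For context on where that lag number comes from: each FetchResponse partition carries the broker's high watermark, so the client can derive lag as the distance between it and the offset it will fetch next. A sketch of that arithmetic (names here are illustrative, not a client API):

```go
package main

import "fmt"

// lagOf computes a partition's lag from the broker-reported high watermark
// and the offset the consumer will fetch next. A non-positive difference
// means the consumer is fully caught up.
func lagOf(highWatermark, nextOffset int64) int64 {
	if lag := highWatermark - nextOffset; lag > 0 {
		return lag
	}
	return 0
}

func main() {
	fmt.Println(lagOf(1500, 1200)) // 300 records behind
	fmt.Println(lagOf(1500, 1500)) // caught up: 0
}
```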

Owner

twmb commented Oct 23, 2022

My mistake, you're right, the client is given that information. I never historically used that field because the client hasn't needed to know lag. I'll try to think of some method of calculating how to prefer lag while also avoiding partition starvation.

twmb added a commit that referenced this issue Nov 14, 2022
This adds three new APIs:

    func ConsumePreferringLagFn(fn PreferLagFn) ConsumerOpt
    type PreferLagFn func(lag map[string]map[int32]int64, torderPrior []string, porderPrior map[string][]int32) ([]string, map[string][]int32)
    func PreferLagAt(preferLagAt int64) PreferLagFn

These functions allow an end user to adjust the order of partitions that
are being fetched. Ideally, an end user will only need:

    kgo.ConsumePreferringLagFn(kgo.PreferLagAt(50))

But, PreferLagFn exists to allow for more advanced use cases.

Closes #222
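As a sketch of the "more advanced use cases" path, a custom function matching the PreferLagFn signature from the commit message could reorder topics by total lag. The implementation below is standalone so it compiles without kgo; in real use you would pass such a function to kgo.ConsumePreferringLagFn. The policy itself (highest total lag first, partition order untouched) is an assumption for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// preferTotalLag matches the PreferLagFn signature: given per-partition lag
// and the client's prior topic/partition orderings, it returns new orderings
// with the highest-total-lag topics first. Partition order within each topic
// is left as the client proposed.
func preferTotalLag(
	lag map[string]map[int32]int64,
	torderPrior []string,
	porderPrior map[string][]int32,
) ([]string, map[string][]int32) {
	total := func(topic string) int64 {
		var sum int64
		for _, l := range lag[topic] {
			sum += l
		}
		return sum
	}
	// Copy before sorting so the client's prior slice is not mutated.
	torder := append([]string(nil), torderPrior...)
	sort.SliceStable(torder, func(i, j int) bool {
		return total(torder[i]) > total(torder[j])
	})
	return torder, porderPrior
}

func main() {
	lag := map[string]map[int32]int64{
		"quiet": {0: 10},
		"busy":  {0: 9000, 1: 4000},
	}
	torder, _ := preferTotalLag(lag, []string{"quiet", "busy"}, nil)
	fmt.Println(torder) // busy sorts first: [busy quiet]
}
```

A stable sort is deliberate here: topics with equal lag keep the client's prior ordering, which limits starvation among ties.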
Owner

twmb commented Nov 14, 2022

PR #251 should close this; feel free to take a look / check it out before it is merged for release 1.10.

@twmb twmb closed this as completed in #251 Nov 15, 2022
Author

erenboz commented Dec 8, 2022

@twmb @yuzhichang Oh, I'm very late to the party because I completely missed the mention notification. The solution looks great and solves many similar issues. Thanks a lot.
