
Consumer: reduce ticker allocations #1028

Merged (1 commit merged into IBM:master on Feb 9, 2018)
Conversation

@faceair (Contributor) commented Jan 24, 2018

A ticker is created after each response, even when there are no messages in the response, so the CPU stays high even when the consumer is idle. Ticker creation and destruction is expensive, and as the number of fetch requests grows, the tickers consume both CPU and memory.

So I reuse the ticker instead, without changing the existing behavior: the timeout is still low precision (it fires somewhere between MaxProcessingTime and 2 * MaxProcessingTime).
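
For context, a rough sketch of the per-response ticker pattern described above (illustrative only, not the actual Sarama source; the channel and type names are invented):

```go
package sketch

import "time"

type message struct{ value string }

// feedOld mimics the old shape: a fresh ticker is allocated and torn down for
// every fetch response, even for responses that carry no messages at all.
func feedOld(responses <-chan []message, out chan<- message, maxProcessingTime time.Duration) {
	for msgs := range responses {
		ticker := time.NewTicker(maxProcessingTime) // one allocation per response
		for _, msg := range msgs {
			select {
			case out <- msg:
				// message delivered in time
			case <-ticker.C:
				// slow-consumer handling omitted in this sketch
			}
		}
		ticker.Stop() // and one teardown per response
	}
}
```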

Before: [screenshots]

After: [screenshot]

@eapache (Contributor) commented Feb 2, 2018

> A ticker is created after each response, even when there are no messages in the response, so the CPU stays high even when the consumer is idle.

When the consumer is idle you should only be processing maybe one or two responses per second, so there should be no noticeable cost here. What is your Consumer.MaxWaitTime config?

The change is still OK, but I'm curious if it might make things worse for high-volume consumers? My understanding is that time.Now() can also be expensive, and we're now calling that once per message (not once per response).

@faceair (Contributor, Author) commented Feb 2, 2018

I use sarama-cluster and did not change the default config, so MaxWaitTime is 250 * time.Millisecond. Every process has 8 cluster.Consumer instances, and I see about 500 fetch requests per second per process on the cluster monitor when idle. I don't know why; it's very strange...
Or can I use some other method to verify this? Maybe I can add some event tracking in the process. 🤔

The cost of time.Now() is clear, but the cost of tickers once Go's timer heap grows is not. If the only worry is the cost of time.Now(), we can keep a millisecond-precision cached clock (like a coarseTime), so reading the time becomes a constant, cheap operation.
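
A minimal sketch of that coarse-time idea (illustrative only, not part of this PR; the package and function names are hypothetical):

```go
package coarsetime

import (
	"sync/atomic"
	"time"
)

var cached atomic.Value // always holds a time.Time

func init() {
	cached.Store(time.Now())
	go func() {
		// Refresh the cached clock once per millisecond in the background.
		for t := range time.Tick(time.Millisecond) {
			cached.Store(t)
		}
	}()
}

// Now returns the cached wall clock, accurate to roughly one millisecond,
// so hot paths pay a cheap atomic load instead of a full time.Now() call.
func Now() time.Time {
	return cached.Load().(time.Time)
}
```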

@eapache (Contributor) commented Feb 2, 2018

> Every process has 8 cluster.Consumer instances

I don't know a lot about sarama-cluster, but I would think you'd only need one per process?

> I see about 500 fetch requests per second per process on the cluster monitor when idle

That is very high for idle. How many brokers are in your Kafka cluster?

@faceair (Contributor, Author) commented Feb 3, 2018

In my system topics have priorities, so I use multiple instances and control when each consumer processes messages.
There are 20 brokers in this cluster.

I'll verify the count of the fetch requests next Monday.

@faceair (Contributor, Author) commented Feb 3, 2018

Is MaxProcessingTime the time to deal with each response? Should I record startSend once per response instead of once per message?

@eapache (Contributor) commented Feb 3, 2018

No, MaxProcessingTime is the time to deal with each message; what you've got here is correct.
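
For reference, both settings discussed here live on the consumer section of the Sarama config. A minimal sketch, using the current import path and values that I believe match the library defaults (shown only for illustration):

```go
package main

import (
	"fmt"
	"time"

	"github.com/IBM/sarama"
)

func main() {
	config := sarama.NewConfig()
	// How long a fetch request may block on the broker waiting for new data.
	config.Consumer.MaxWaitTime = 250 * time.Millisecond
	// Per-message budget before the partition is detached to unblock the others.
	config.Consumer.MaxProcessingTime = 100 * time.Millisecond
	fmt.Println(config.Consumer.MaxWaitTime, config.Consumer.MaxProcessingTime)
}
```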

@eapache (Contributor) commented Feb 3, 2018

One approach that might be more efficient (since it wouldn't require tracking time.Now() on every message) would be to keep the msgSent boolean but simply set it to true inside the loop before the messageSelect label. It would no longer represent a sent message exactly (maybe it could be renamed to firstAttempt or something), but it would still let you move the ticker outside the main loop.
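
A rough sketch of that shape (illustrative only, not the actual Sarama code; the types and channels are invented, and the slow-consumer branch here just skips the message instead of detaching the partition the way Sarama does):

```go
package main

import (
	"fmt"
	"time"
)

type message struct{ value string }

func feed(batches <-chan []message, out chan<- message, maxProcessingTime time.Duration) {
	// One ticker for the life of the feeder, instead of one per response.
	expiryTicker := time.NewTicker(maxProcessingTime)
	defer expiryTicker.Stop()

	firstAttempt := true // must start true so the first tick is only a warning
	for batch := range batches {
		for _, msg := range batch {
		messageSelect:
			select {
			case out <- msg:
				firstAttempt = true // progress was made; reset the grace period
			case <-expiryTicker.C:
				if firstAttempt {
					// First tick while this message is pending: allow one
					// more full tick before declaring the consumer too slow.
					firstAttempt = false
					goto messageSelect
				}
				// Second consecutive tick: give up on this message in this
				// sketch (Sarama instead detaches the whole partition here).
				fmt.Println("consumer too slow, skipping:", msg.value)
				firstAttempt = true
			}
		}
	}
	close(out)
}

func main() {
	batches := make(chan []message, 1)
	out := make(chan message)
	batches <- []message{{"a"}, {"b"}}
	close(batches)
	go feed(batches, out, 50*time.Millisecond)
	for m := range out {
		fmt.Println("received", m.value)
	}
}
```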

@faceair force-pushed the master branch 2 times, most recently from d7428ad to 733cd80, on February 4, 2018 at 05:12
@faceair (Contributor, Author) commented Feb 4, 2018

yeah, it's a good solution. 😁

@faceair (Contributor, Author) commented Feb 5, 2018

I added a counter in the responseFeeder loop, and it shows that the partitionConsumer.feeder loop runs about 500 times per second per process, which is about 500/8 ≈ 62 times per second for each sarama instance.

And I found consumer.go#L727: when a broker fetches a response, it dispatches the response to every partitionConsumer.feeder, so the partitionConsumer.feeder loop runs more times than there are fetch requests. Every topic has 50 partitions and I have 3 processes, so every process has about 16 partitions; 62/16 ≈ 4 times per second per broker.

So it looks like there's no problem here.

@eapache (Contributor) left a comment

@daedric wondering if you have thoughts on this PR, since you recently submitted #1036 in a slightly different direction. There's no "leak" in this PR, but it would involve a ticker hanging around all the time for slow consumers.

consumer.go (Outdated)

```diff
@@ -441,20 +441,20 @@ func (child *partitionConsumer) HighWaterMarkOffset() int64 {
 
 func (child *partitionConsumer) responseFeeder() {
 	var msgs []*ConsumerMessage
-	msgSent := false
+	var firstAttempt bool
```
@eapache (Contributor) commented on var firstAttempt bool:

I believe this needs to be initialized to true now.

@daedric (Contributor) commented Feb 7, 2018

Actually, while the leak has been fixed, I meant to ask what the real purpose of this ticker is. When it fires, it does nothing apart from a continue that will eventually process the remaining messages.
If you were to ask me, without more input, I would simply remove this ticker :)

@eapache (Contributor) commented Feb 7, 2018

The first time it ticks, it simply sets a flag and repeats the select statement. The second time it ticks, it fully detaches the partition from the consumer in order to unblock the remaining partitions. This is important so that a slow message consumer doesn't drag the entire process down.

@eapache merged commit 5910255 into IBM:master on Feb 9, 2018
@faceair (Contributor, Author) commented Feb 25, 2018

After this PR is merged, should the line at consumer.go#L464 be deleted?

@eapache (Contributor) commented Feb 26, 2018

Yes! I'm not sure how we missed that.

eapache added a commit that referenced this pull request Feb 26, 2018
Now that we reuse a single ticker the whole time, this would probably lead to
the consumer just kind of hanging and never resuming.

Thanks to #1028 (comment) for
catching this.
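
As a general illustration of why a leftover line could cause the hang the commit message describes once the ticker became shared (a hypothetical, minimal example, not the actual Sarama code): a stopped ticker's channel never fires again, so any later select that still waits on it can block forever.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ticker := time.NewTicker(100 * time.Millisecond)
	ticker.Stop() // an extra Stop() permanently silences a shared ticker

	select {
	case <-ticker.C:
		fmt.Println("tick") // never reached: a stopped ticker delivers no more ticks
	case <-time.After(time.Second):
		// Without this escape hatch the select would block forever, which is
		// the "hanging and never resuming" behavior described above.
		fmt.Println("timed out waiting on a stopped ticker")
	}
}
```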