Retry messages on shutdown #420

Merged: eapache merged 2 commits into master from retry-on-shutdown on Apr 27, 2015

Conversation

eapache (Contributor) commented on Apr 14, 2015

@Shopify/kafka fixes #419.

Rather than use the old complicated system of reference-counting flags to shut down cleanly, do the much simpler thing: keep a sync.WaitGroup counting the number of messages "in flight" (aka owned by the producer). When shutdown is requested, spawn a goroutine that waits for this counter to hit 0, then closes everything in one go.

We add messages to the in-flight set in the topicDispatcher (only new messages with retries==0, though). We remove messages from the in-flight set in returnError and returnSuccesses; even if the Producer.Return.* values are false, those methods are still guaranteed to see every message.

We also add/remove chaser messages in leaderDispatcher.
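
A minimal sketch of the scheme described above (simplified stand-in types and names, not the actual sarama code):

```go
package main

import "sync"

// Minimal stand-ins for the real sarama types; this only illustrates the
// in-flight accounting scheme described in the PR, not the real dispatchers.
type ProducerMessage struct {
	retries int
	flags   int
}

type asyncProducer struct {
	inFlight sync.WaitGroup
	input    chan *ProducerMessage
}

// topicDispatcher side: only brand-new messages (retries == 0) join the
// in-flight set; retried messages were already counted on first entry.
func (p *asyncProducer) trackIfNew(msg *ProducerMessage) {
	if msg.retries == 0 {
		p.inFlight.Add(1)
	}
}

// returnError / returnSuccesses side: every message eventually passes through
// one of these, even when Producer.Return.* is false, so the count is
// released here.
func (p *asyncProducer) untrack(msg *ProducerMessage) {
	p.inFlight.Done()
}

// On shutdown: a goroutine waits for the counter to hit zero, then closes
// everything in one go.
func (p *asyncProducer) shutdown() {
	go func() {
		p.inFlight.Wait()
		close(p.input)
	}()
}

func main() {}
```

As long as every Add(1) is matched by exactly one Done(), Wait() returns only once nothing is owned by the producer, which is exactly the shutdown condition this PR is after.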

This still needs tests.

I'm not sure what performance impact the waitgroup will have. An alternative might be an atomic counter, and have the shutdown goroutine just poll it every 10ms or something.

eapache force-pushed the retry-on-shutdown branch 2 times, most recently from 0d90bd3 to 1ae385a on April 16, 2015 at 19:22
eapache (Contributor, Author) commented on Apr 16, 2015

> I'm not sure what performance impact the waitgroup will have. An alternative might be an atomic counter, and have the shutdown goroutine just poll it every 10ms or something.

A recent benchmarking and profiling push says: maybe this has a tiny impact on performance, but it's still swamped out by stupid stuff like CRC calculations, so not a concern.
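
For reference, a rough sketch of that polling alternative (not what was merged, and per the benchmarking note above it wasn't needed; illustrative names only):

```go
package main

import (
	"sync/atomic"
	"time"
)

// Sketch of the alternative mentioned above: an atomic in-flight counter that
// the shutdown goroutine polls, instead of a sync.WaitGroup.
type producer struct {
	inFlight int64
	done     chan struct{}
}

func (p *producer) track()   { atomic.AddInt64(&p.inFlight, 1) }
func (p *producer) untrack() { atomic.AddInt64(&p.inFlight, -1) }

func (p *producer) shutdown() {
	go func() {
		// Poll every 10ms until nothing is in flight, then shut everything down.
		for atomic.LoadInt64(&p.inFlight) > 0 {
			time.Sleep(10 * time.Millisecond)
		}
		close(p.done)
	}()
}

func main() {}
```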

eapache force-pushed the retry-on-shutdown branch 4 times, most recently from 18c94ac to 91534c6 on April 17, 2015 at 19:18
eapache (Contributor, Author) commented on Apr 17, 2015

CI failing because of kisielk/errcheck#70

eapache (Contributor, Author) commented on Apr 17, 2015

CI fixed.

@Shopify/kafka this is ready for review.

eapache force-pushed the retry-on-shutdown branch 2 times, most recently from bd75081 to 644af59 on April 24, 2015 at 15:38
@@ -355,6 +347,7 @@ func (p *asyncProducer) leaderDispatcher(topic string, partition int32, input ch
 		// in fact this message is not even the current retry level, so buffer it for now (unless it's just a chaser)
 		if msg.flags&chaser == chaser {
 			retryState[msg.retries].expectChaser = false
+			p.inFlight.Done() // this chaser is now useless and will be garbage collected

Contributor:

s/is now useless/is now handled/ ?

@wvanbergen (Contributor):

I think the accounting is correct.

  • New (retries == 0) messages increment; errors and successes decrement.
  • New chaser messages increment, and are decremented when they are handled (see the sketch below).
  • Only shutdown messages are not accounted for, but there's only one and it doesn't propagate through the goroutines (see my comment above).

👍, this is a nice simplification and it's much easier to understand now.
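
A rough, self-contained sketch of the chaser half of that accounting (hypothetical helper names; the real leaderDispatcher logic is elided):

```go
package main

import "sync"

// Condensed, hypothetical view of the chaser accounting described in the list
// above; names loosely mirror the PR's diff, everything else is elided.
type ProducerMessage struct {
	retries int
	flags   int
}

const chaser = 1 // illustrative flag bit marking a chaser message

type asyncProducer struct {
	inFlight sync.WaitGroup
}

// The leaderDispatcher counts a chaser as in flight when it emits one for a
// retry level...
func (p *asyncProducer) emitChaser(level int) *ProducerMessage {
	p.inFlight.Add(1)
	return &ProducerMessage{retries: level, flags: chaser}
}

// ...and releases the count once that chaser comes back and is handled.
func (p *asyncProducer) handleChaser(msg *ProducerMessage, expectChaser []bool) {
	if msg.flags&chaser == chaser {
		expectChaser[msg.retries] = false
		p.inFlight.Done() // this chaser is now handled
	}
}

func main() {}
```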

eapache force-pushed the retry-on-shutdown branch from 644af59 to 54eb5af on April 27, 2015 at 13:55
eapache (Contributor, Author) commented on Apr 27, 2015

Your description of the accounting matches my understanding exactly. Once CI is 🍏 and I've had a quick pair of 👀 on the third commit, I think this is good to go.

 			continue
 		} else if msg.retries == 0 {
 			if shuttingDown {
 				p.returnError(msg, ErrShuttingDown)

Contributor:

This reduces the inflight counter, but it was never incremented for this message. We should probably move the p.inFlight.Add(1) up so it always gets executed.
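
In outline, the fix the review suggests (a sketch with simplified stand-in types, not the actual diff): the Add(1) moves ahead of the shutting-down branch, so the Done() inside returnError always has a matching increment.

```go
package main

import (
	"errors"
	"sync"
)

// Stand-in for sarama's ErrShuttingDown, for illustration only.
var errShuttingDown = errors.New("producer is shutting down")

type ProducerMessage struct{ retries int }

type asyncProducer struct {
	inFlight     sync.WaitGroup
	shuttingDown bool
}

func (p *asyncProducer) returnError(msg *ProducerMessage, err error) {
	// ... deliver or drop the error, depending on Producer.Return.Errors ...
	p.inFlight.Done() // always balanced by the Add(1) below
}

func (p *asyncProducer) dispatch(msg *ProducerMessage) {
	if msg.retries == 0 {
		p.inFlight.Add(1) // count the message before any early return
		if p.shuttingDown {
			p.returnError(msg, errShuttingDown)
			return
		}
	}
	// ... continue dispatching as usual ...
}

func main() {}
```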

eapache (Contributor, Author):

Oooh, really nice catch. I will fix and see if I can add or adjust a test for this case too.

eapache (Contributor, Author):

OK, this path is fixed and tested now.

eapache force-pushed the retry-on-shutdown branch from 2bb62c1 to 08ccf5e on April 27, 2015 at 14:27
@wvanbergen (Contributor):

I think this looks good. We should do some stress testing of this though.

eapache (Contributor, Author) commented on Apr 27, 2015

I am comfortable enough to push this to master now. I will do some stressing before releasing the next stable version.

eapache added a commit that referenced this pull request on Apr 27, 2015
eapache merged commit f948bc2 into master on Apr 27, 2015
eapache deleted the retry-on-shutdown branch on April 27, 2015 at 14:45

Linked issue: Producer does not retry messages when shutting down (#419)