I have a processor that uses persistence (a group table). It appears that it is not able to rebalance at all, and it puts my pods into a crash loop.
The topic used for persistence (`-table`) has `cleanup.policy=compact`. We have about 1.3 million messages in this topic.
I've even tried reducing the number of pods to 1 to rule out a concurrency problem. It stalled on the last log line `2019/12/09 13:33:23 Processor: dispatcher started` for about 5 minutes (raising the memory usage to 5 GB) and eventually started, with no extra logging.
I have 4 pods, and the topic has 4 partitions. I have 3 processors (2 of them without persistence, and they work fine). It looks like the persistent processor is not able to rebalance. The input rate is about 25 msg/s.
I believe the root issue is the same one already mentioned in other issues here: slow recovery of the huge `-table`. However, I am surprised by the constant rebalancing.
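For reference, here is a minimal sketch of the kind of stateful setup described above (the broker address, group name, input topic, codec, and callback body are illustrative assumptions, not taken from the actual deployment). The relevant point is that `goka.Persist` makes the processor maintain a compacted `<group>-table` topic, which it has to replay into local storage on every (re)start before it begins consuming:

```go
package main

import (
	"context"
	"log"

	"github.com/lovoo/goka"
	"github.com/lovoo/goka/codec"
)

func main() {
	// Illustrative values; the real deployment uses its own brokers, group and topic.
	brokers := []string{"kafka:9092"}
	group := goka.Group("example-group") // state is persisted to "example-group-table"

	g := goka.DefineGroup(group,
		goka.Input(goka.Stream("example-input"), new(codec.String), func(ctx goka.Context, msg interface{}) {
			// Writing a value per key is what fills the compacted -table topic;
			// on startup the processor replays that topic to rebuild its local state.
			ctx.SetValue(msg)
		}),
		goka.Persist(new(codec.String)),
	)

	p, err := goka.NewProcessor(brokers, g)
	if err != nil {
		log.Fatalf("error creating processor: %v", err)
	}
	if err := p.Run(context.Background()); err != nil {
		log.Fatalf("processor stopped with error: %v", err)
	}
}
```

With 1.3 million messages in the `-table` topic, that recovery phase is where the long startup and the 5 GB of memory show up before the dispatcher begins processing.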
Hi @cioboteacristian, sorry for the late response. Your issue looks like the component is being terminated by the container. The high memory usage could be a reason, or the long startup time before it actually starts consuming.
The huge memory consumption is an issue we have to improve at some point, but the recovery speed is actually limited by the network bandwidth and disk speed.
Anyway, as announced in #239, we are working on a refactored and improved version of goka. Although the recovery mechanism stayed more or less the same, it would be interesting to know whether you are still facing those issues. Just use the branch consumer-group or the vendor tag v0.9.0-beta2 to try it out.
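As a purely illustrative sketch, pinning that tag could look like this if Go modules are in use (the "vendor tag" wording suggests a vendoring setup would work just as well, pinning the same tag or the consumer-group branch there):

```
// go.mod snippet (illustrative only, assuming Go modules)
require github.com/lovoo/goka v0.9.0-beta2
```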
Let me know if there are any results or if you run into any problems.
Cheers!