Consumer timeouts #425
@ulutomaz We are also experiencing the same issue when using MSK with SSL enabled.
We've fixed a number of issues around how the consumers and their supervision work, so it wouldn't surprise me to learn there are more. That said, Kafka should not let you lose messages if you do not commit an offset until you have fully processed it. I would verify this is not happening; a common mistake is using something like Tasks or GenServer.cast and allowing the handle_message_set function to return before all of the messages have actually been processed. I've also seen a number of cases where KafkaEx detects timeouts that the Kafka server never logs. I'm frankly not sure why this happens, but increasing the …
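To illustrate the point about offsets: a minimal sketch of a KafkaEx.GenConsumer that processes every message before returning, assuming the GenConsumer behaviour's handle_message_set/2 callback; process_message/1 is a hypothetical application function, not part of the library.

```elixir
defmodule MyApp.Consumer do
  use KafkaEx.GenConsumer

  # Process every message *before* returning, so the offset is only
  # committed once the whole set has been handled. Returning while work
  # is still in flight (e.g. a fire-and-forget Task.start or
  # GenServer.cast) lets the offset advance past unprocessed messages,
  # which look like "lost" messages after a restart.
  def handle_message_set(message_set, state) do
    Enum.each(message_set, fn %{value: value} ->
      process_message(value)
    end)

    # :sync_commit blocks until the offset commit is acknowledged;
    # :async_commit is cheaper but a crash can drop the commit.
    {:sync_commit, state}
  end

  # Hypothetical placeholder for real processing work.
  defp process_message(value), do: IO.inspect(value)
end
```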
@ulutomaz If you still have the issue, you can try an OTP upgrade: #389 (comment)
Encountering this issue
This appears to also be the culprit for sockets not being cleaned up: the file descriptor count eventually climbs to the ulimit, which prevents more sockets from being opened, and the program crashes. This is happening on relatively recent versions of Elixir + Erlang (version …)
Update: using …
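For anyone chasing the descriptor leak described above, a quick way to watch the fd count of the BEAM process on Linux before it hits the ulimit. The `pgrep` pattern is an assumption; adjust it to match your release's process name.

```shell
# Find the BEAM process (pattern is an assumption; adjust as needed),
# falling back to the current shell just so the snippet runs anywhere.
PID=$(pgrep -f beam.smp | head -n 1)
PID=${PID:-$$}

# Count its open file descriptors and compare against the soft limit.
COUNT=$(ls "/proc/$PID/fd" | wc -l)
LIMIT=$(ulimit -n)
echo "open fds for pid $PID: $COUNT (limit: $LIMIT)"
```

Running this periodically (e.g. under `watch`) makes a steady climb in the count easy to spot long before sockets start failing to open.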
Versions:
OS: Ubuntu 16.04 LTS
Erlang/OTP: 22.2.2
Elixir: 1.9.4
Kafka_ex: 0.11.0
Kafka cluster (AWS MSK): 2.2.1
Kafka topic (usual setup): 1 topic with 9 partitions
Overview:
We run our setup on AWS (managed Kafka, MSK) and our codebase lives on Linux-based EC2 instances. The setup is distributed: each of our 3 nodes takes 3 of the partitions. We have been running kafka_ex for a few years now, but lately we experience a strange problem where we lose messages. We went through some debugging with the AWS support team, and on the managed Kafka side (brokers, interconnection, etc.) nothing seems to be wrong; no connection timeouts or anything else can be found in the logs.
On our side, however, we do get errors like the one below in our logs from time to time. Off the top of my head, the last such error was about two weeks ago.
We have already put some effort into debugging, and everything points to this error, since the missing messages correlate with the times it occurs.
Our codebase can handle this "interrupt" and keeps running, but as noted, we experience message loss.
Error:
Other:
Tasks for handling Consumers within the code
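Since the report mentions using Tasks for handling consumers, it is worth showing the safe variant: if messages are handed off to Tasks, await them before returning from handle_message_set, so the offset cannot be committed while work is still in flight. A sketch, not the project's actual code; process_message/1 and the timeout value are assumptions.

```elixir
def handle_message_set(message_set, state) do
  # Task.async_stream runs the work concurrently but, unlike
  # Task.start or GenServer.cast, Enum.to_list/1 forces every task
  # to finish (or raise) before this callback returns.
  message_set
  |> Task.async_stream(fn %{value: value} -> process_message(value) end,
    timeout: 30_000
  )
  |> Enum.to_list()

  {:async_commit, state}
end
```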