producer continually refreshing client metadata after broker disconnection #4577
Comments
Reverting to 2.2.0 has fixed the issue, so I assume this is a regression introduced in 2.3.0.
I can confirm this as well, combined with the very low default value of retry.backoff.max.ms.
Wouldn't increasing retry.backoff.max.ms mitigate this?
ehh... if by mitigate you mean "reduce the impact of the bug on the kafka cluster" - it doesn't actually solve the issue at hand.
Yeah, that's a viable work-around in the interim, but this behavior is clearly a bug, and there is a lot of software out there that isn't gonna increase that value. It's really common for people to have thousands of Kafka clients/producers, and one day they're going to upgrade and everything will work fine until one of their connections has an intermittent error in the middle of the night and their cluster gets melted :/
@jpiper sorry, this is what I meant. I am a maintainer of the Ruby bindings and I usually do my best to ship intermediate fixes when stuff like this occurs. This is why I asked. I may consider changing the defaults for my lib until the patch is released. (edit: I see the default for librdkafka is 1 second (1000 ms), so as long as it's not altered, it should be ok.)
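For reference, a minimal sketch (in C against the librdkafka API) of the interim workaround discussed above: raising retry.backoff.max.ms above its 1000 ms default so a runaway fast-refresh loop hits the brokers less often. The 5000 ms value and broker address are illustrative only, and this only reduces request volume; it does not fix the underlying bug.

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Create a producer with a larger retry.backoff.max.ms as a stop-gap.
 * This only slows the runaway metadata refresh; it does not stop it. */
static rd_kafka_t *make_producer(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        rd_kafka_conf_set(conf, "bootstrap.servers", "broker1:9092",
                          errstr, sizeof(errstr));

        /* Default is 1000 ms; 5000 ms is an arbitrary illustrative value. */
        if (rd_kafka_conf_set(conf, "retry.backoff.max.ms", "5000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
                fprintf(stderr, "config error: %s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return NULL;
        }

        /* rd_kafka_new() takes ownership of conf on success. */
        return rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
}
```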
I feel seen
There is a problem in v2.3.0: when metadata is refreshed without leader changes, the fast metadata refresh isn't stopped. We'll include the fix in the next release.
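Purely as an illustration of the condition described above (this is not librdkafka source code, and all names below are hypothetical), the missing stop condition can be pictured like this:

```c
/* Hypothetical types/names for illustration only. */
typedef struct {
        int leaderless_partition_cnt; /* partitions with no current leader */
        int leader_change_cnt;        /* leader changes seen in this refresh */
} metadata_refresh_result_t;

/* Fast (retry-backoff-paced) metadata refresh should be switched off once a
 * refresh returns a fully-led topic with no leader changes. The report above
 * suggests v2.3.0 misses this when there are no leader changes, so the fast
 * refresh keeps firing. */
static void maybe_stop_fast_refresh(const metadata_refresh_result_t *res,
                                    int *fast_refresh_active) {
        if (res->leaderless_partition_cnt == 0 && res->leader_change_cnt == 0)
                *fast_refresh_active = 0;
}
```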
Description
We are producing into a topic with 84 partitions spread over 8 brokers. We are seeing an issue where a recoverable error condition (difficult to reproduce, but an example would be a transport failure causing a broker connection to fail) triggers the underlying librdkafka library to continually refresh the metadata for this topic at the retry.backoff.max.ms interval, even though the metadata request is successful and all partitions have leaders. E.g. here is the error condition happening, triggered by a new connection to a broker being broken just ~8s after entering the UP state:
and here we can see that even though every one of these metadata requests returns successfully (I've truncated the responses for readability) and all the partitions have leaders, the client is still refreshing the metadata as if something were broken. Note that throughout this the producer works fine and messages are delivered, but we have noticed on our servers that the number of metadata requests from our clients is huge.
It looks to me like there could be some sort of race condition around broker disconnects/reconnects and metadata refreshing?
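For anyone trying to observe the same behavior, here is a minimal, self-contained sketch that enables the same debug contexts as the configuration in the checklist below. The broker address and poll duration are placeholders; producing while breaking a broker connection should surface the repeated metadata requests in the stderr debug log.

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        rd_kafka_conf_set(conf, "bootstrap.servers", "broker1:9092",
                          errstr, sizeof(errstr));
        /* Same debug contexts as in the reported configuration. */
        rd_kafka_conf_set(conf, "debug", "metadata,broker,topic",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "failed to create producer: %s\n", errstr);
                return 1;
        }

        /* Debug output, including each metadata request/response, goes to the
         * default log callback (stderr). Serve callbacks for a while to watch
         * the metadata refresh cadence after a broker disconnect. */
        rd_kafka_poll(rk, 60 * 1000);

        rd_kafka_destroy(rk);
        return 0;
}
```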
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
- librdkafka version: 2.3.0
- Apache Kafka version: 3.4.0
- Client configuration: message.max.bytes=25165824;socket.timeout.ms=10000;socket.keepalive.enable=true;debug=metadata,broker,topic
- Operating system: centos7
- Provide logs (with debug=.. as necessary) from librdkafka