ThreadSanitizer: data race + hang in rd_kafka_destroy (or rd_kafka_destroy_flags) #4811
Comments
Additionally, I have the following log of refcounts:
But I don't know the librdkafka internals well enough to tell which refcounts were not decremented.
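To illustrate why a refcount that is never decremented can show up as a hang on destroy, here is a minimal, self-contained sketch. It is a simplified model and not librdkafka's implementation: `handle_t`, `handle_keep`, `handle_drop` and `handle_destroy_wait` are hypothetical names, and the timeout exists only so the sketch can report a would-be hang instead of blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical refcounted handle; NOT librdkafka's rd_kafka_t. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  zero;   /* signalled when refcnt drops to 0 */
    int             refcnt;
} handle_t;

static void handle_init(handle_t *h) {
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->zero, NULL);
    h->refcnt = 0;
}

/* Take one reference. */
static void handle_keep(handle_t *h) {
    pthread_mutex_lock(&h->lock);
    h->refcnt++;
    pthread_mutex_unlock(&h->lock);
}

/* Release one reference and wake any waiter once the count reaches 0. */
static void handle_drop(handle_t *h) {
    pthread_mutex_lock(&h->lock);
    assert(h->refcnt > 0);
    if (--h->refcnt == 0)
        pthread_cond_broadcast(&h->zero);
    pthread_mutex_unlock(&h->lock);
}

/* Wait until every reference is released.
 * Returns true on success, false if the count is still non-zero after
 * timeout_ms (an untimed wait would hang in that case). */
static bool handle_destroy_wait(handle_t *h, int timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    bool released = true;
    pthread_mutex_lock(&h->lock);
    while (h->refcnt > 0) {
        int rc = pthread_cond_timedwait(&h->zero, &h->lock, &deadline);
        if (rc == ETIMEDOUT) {
            released = (h->refcnt == 0);
            break;
        }
    }
    pthread_mutex_unlock(&h->lock);
    return released;
}
```

With two `handle_keep` calls and only one `handle_drop`, `handle_destroy_wait` returns false; after the second `handle_drop` it returns true. In a log of refcounts, the thing to look for would be the analogous keep without a matching drop.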
Hi @blindspotbounty, I think that data race isn't related to the hang on destroy, as both threads are accessing the
For the hang on destroy, especially if you're removing a topic, try applying this PR
Read the FAQ first: https://github.com/confluentinc/librdkafka/wiki/FAQ
Do NOT create issues for questions, use the discussion forum: https://github.com/confluentinc/librdkafka/discussions
Description
I was investigating a hang on destroy that reproduces specifically on x86 (I cannot reproduce it on macOS or Linux on aarch64/M2).
How to reproduce
On x86, however, it reproduces reliably with the following scenario:
It reproduces more readily with a rebalance callback enabled and assign called from a different thread, i.e.:
The only thing I've found that reproduces fairly reliably on x86 is the following race (which I don't see on ARM):
It reproduces more readily with v2.3.0 than with v2.5.0. With v2.3.0 I can reproduce it without an external rebalance callback; with v2.5.0 it reproduces much more readily with a custom rebalance callback.
I can reproduce this scenario quite reliably with the Swift wrapper.
Here is Swift code that can catch this race:
As the only difference between the platforms is this race caught by TSan, I suspect it might be a problem for the subsequent client destroy call.
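For readers less familiar with what TSan reports here: a data race is two threads touching the same memory without synchronization, where at least one access is a write. The sketch below shows the usual race-free shape of a flag written by one thread and polled by another, using C11 atomics. Everything in it is hypothetical (`shared_t`, `terminating`, `destroyer`, `poller`); it does not reproduce the field or code path from the TSan report in this issue.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical shared state; not a librdkafka structure. */
typedef struct {
    atomic_int terminating; /* written by one thread, read by another */
    atomic_int observed;    /* set once the poller has seen the flag */
} shared_t;

/* Thread that requests termination. With a plain int and a plain store,
 * TSan would report this write as racing with the read in poller(). */
static void *destroyer(void *arg) {
    shared_t *s = arg;
    atomic_store_explicit(&s->terminating, 1, memory_order_release);
    return NULL;
}

/* Thread that polls the flag until termination is requested. */
static void *poller(void *arg) {
    shared_t *s = arg;
    while (!atomic_load_explicit(&s->terminating, memory_order_acquire))
        sched_yield();
    atomic_store(&s->observed, 1);
    return NULL;
}

/* Runs both threads to completion; returns 1 if the poller saw the flag. */
static int run_race_free_demo(void) {
    shared_t s;
    atomic_init(&s.terminating, 0);
    atomic_init(&s.observed, 0);

    pthread_t t_poll, t_destroy;
    int rc = pthread_create(&t_poll, NULL, poller, &s);
    assert(rc == 0);
    rc = pthread_create(&t_destroy, NULL, destroyer, &s);
    assert(rc == 0);

    pthread_join(t_destroy, NULL);
    pthread_join(t_poll, NULL);
    return atomic_load(&s.observed);
}
```

Calling `run_race_free_demo()` from `main` and building with `-fsanitize=thread` should produce no report for this version. Replacing the atomics with plain `int` accesses is the kind of change that would typically make TSan flag the pair of accesses.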
Remark: I tried both rd_kafka_destroy and rd_kafka_destroy_flags with RD_KAFKA_DESTROY_F_NO_CONSUMER_CLOSE, but the behavior is the same.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
Provide logs (with debug=.. as necessary) from librdkafka
The last logs before the hang are:
Although the rebalance assign(NULL) is called afterwards, librdkafka ignores it, which seems to be the reason for the hang.