node-rdkafka e2e tests failing #3904
Comments
@robinfehr I don't think this is only an issue with macOS, as it also fails and gets stuck when running in k8s, which uses a Linux container. It's strange that your tests only fail on Mac. I'm also using a Mac and tested with my app running in a local k8s cluster on my Mac and also in a dev k8s cluster running on a Linux machine. Both resulted in a stuck producer state.
@robinfehr I've been trying to test the code you provided and it seems to be getting stuck at "producing now test key", which is line 130. I'm running the code against a local Kafka broker running in Docker.
@o2themar That is weird, since that works for me; it only gets into the loop when I try to disconnect. Just to make sure, do you have the topic created?
@robinfehr Yes, I have created the topic 'test' and even verified it was there.
OK, that was strange: when I restarted the container after creating the topic and reran the tests, it started working. For some reason it was not finding the topic, and restarting the container got it working.
@robinfehr So I was able to replicate the issue. I had to run the script in a loop in order to get it to fail and get stuck. It usually takes fewer than 20 runs to get it into that state, although sometimes it passes all 20 runs. It wasn't failing when I was running it manually. Success has these two while failure doesn't: The above is different from the first call which does this:
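The run-it-in-a-loop reproduction described above could be sketched roughly as follows. This is not the script used in the thread; `runRepeatedly`, the run count, and the timeout are hypothetical names and values chosen for illustration. It relies only on Node's built-in `child_process.spawnSync`, which kills the child and reports `ETIMEDOUT` when the `timeout` option expires, so a stuck run is counted rather than blocking the loop.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run `command args` `runs` times and classify each run
// as passed (exit code 0), failed (non-zero exit), or hung (timeout hit).
function runRepeatedly(command, args, runs, timeoutMs) {
  const results = { passed: 0, failed: 0, hung: 0 };
  for (let i = 0; i < runs; i++) {
    const res = spawnSync(command, args, { timeout: timeoutMs, stdio: 'ignore' });
    if (res.error && res.error.code === 'ETIMEDOUT') {
      // The child did not exit in time and was killed by spawnSync.
      results.hung++;
    } else if (res.status === 0) {
      results.passed++;
    } else {
      results.failed++;
    }
  }
  return results;
}

// Example (assumed file name and limits):
// console.log(runRepeatedly('node', ['filename.js'], 20, 60000));
```

Counting hung runs separately from failed ones helps distinguish the stuck-on-disconnect state from an ordinary assertion failure.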
Some more findings: I found the loop where it was printing the logs. It seems to be stuck in this loop and cannot get out.
Exactly, that was the loop I was referring to in the Additional Info section 👍
Yes, I am on a Mac and I was able to reproduce your issue. I noticed that when I added some debug lines it got harder to reproduce. I'm wondering if it's a timing issue that is causing it to get into the stuck state?
The repeating When this happens, it is typically because the application/bindings have not followed the termination sequence and are still holding some object; it could be an rd_kafka_message_t, rd_kafka_topic_t, rd_kafka_topic_partition_list_t, etc.
While this hasn't been fixed in node-rdkafka yet for the standard consumer, it is related to the rebalance_cb, which gets invoked after the disconnect is initiated. If this rebalance, which revokes the partitions, doesn't get processed properly, the disconnect will hang indefinitely. @edenhill do you think I should create a PR here to improve the docs to say that this REBALANCE must be processed, or is it actually a bug that this doesn't time out? (I would have to read the specs.) Closing this ticket for now since it is solved for my case and nothing has to be changed in librdkafka.
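For readers hitting the same hang, here is a minimal sketch of a rebalance callback that handles both the assign and the revoke event, modeled on the eager (non-cooperative) example in the node-rdkafka README. It is not the fix applied in this thread. `makeRebalanceCb` and its `log` parameter are hypothetical; `codes` is assumed to be `Kafka.CODES.ERRORS` from node-rdkafka, and the callback is assumed to be invoked with `this` bound to the consumer, as in the README example. The cooperative (incremental) variant is not shown.

```javascript
// Hypothetical factory: builds a function suitable for the `rebalance_cb`
// consumer config option. `codes` is assumed to be Kafka.CODES.ERRORS.
function makeRebalanceCb(codes, log) {
  return function rebalanceCb(err, assignment) {
    try {
      if (err.code === codes.ERR__ASSIGN_PARTITIONS) {
        this.assign(assignment);
      } else if (err.code === codes.ERR__REVOKE_PARTITIONS) {
        // Also fires for the revoke triggered by disconnect(); per the
        // comment above, leaving it unprocessed can hang the disconnect.
        this.unassign();
      } else {
        // Any other code is a real error.
        log(err);
      }
    } catch (e) {
      // assign()/unassign() can throw if the consumer is already disconnected.
      log(e);
    }
  };
}

// Example wiring (assumed config values):
// const Kafka = require('node-rdkafka');
// const consumer = new Kafka.KafkaConsumer({
//   'group.id': 'kafka',
//   'metadata.broker.list': 'localhost:9092',
//   'rebalance_cb': makeRebalanceCb(Kafka.CODES.ERRORS, console.error)
// }, {});
```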
Description
I have implemented the cooperative rebalance in node-rdkafka, and when I first started, the e2e tests were working.
Soon I realized that they don't always run through. Sporadically they fail.
I've now extracted a test that can be run directly with node (without mocha, but with tape).
The test sometimes fails and sometimes runs through.
The behaviour is inconsistent.
What seems to trigger it is the CLOSE of the consumer.
Since I'm quite new to the internals of librdkafka, some help would be appreciated.
Meanwhile, I'll try to dig and see if I can find more.
How to reproduce
This is the test; it can be added to the root of node-rdkafka and run simply with
node filename.js
Debug-Logs in the working case
The callback of the close fn gets called; see the line
ok 13 no error after disconnecting
Debug-Logs in the non-working case
The callback of the close fn doesn't get called; there is no
ok 13 no error after disconnecting
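Because the non-working case shows up only as a callback that never fires, a test can turn the silent hang into an explicit failure with a small watchdog. The sketch below is an illustration, not part of the original test; `callWithTimeout` is a hypothetical helper, and it assumes the hang happens off the main thread so that the JavaScript timer can still fire.

```javascript
// Hypothetical helper: call a callback-style function and report an error
// if its callback has not fired within `timeoutMs`.
function callWithTimeout(fn, timeoutMs, cb) {
  let done = false;
  const timer = setTimeout(() => {
    if (done) return;
    done = true;
    cb(new Error('timed out after ' + timeoutMs + ' ms'));
  }, timeoutMs);
  fn((err, result) => {
    if (done) return; // Ignore a callback that arrives after the timeout.
    done = true;
    clearTimeout(timer);
    cb(err || null, result);
  });
}

// Example with a tape test object `t` (assumed names and timeout):
// callWithTimeout((next) => consumer.disconnect(next), 10000, (err) => {
//   t.error(err, 'no error after disconnecting');
//   t.end();
// });
```

The watchdog only reports the hang; the native handle may still keep the process alive afterwards.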
Additional Info
After debugging for a while, I found that the issue occurs after this delete: https://github.com/Blizzard/node-rdkafka/blob/master/src/kafka-consumer.cc#L88
It then gets stuck in the loop in librdkafka.
If I remove the delete, the tests all work fine.
I cannot reproduce the issue on Linux, only on macOS Monterey 12.3.1 (21E258).
The Kafka broker logs looked fine, so this is not Kafka related.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
v1.9.1
[<REPLACE with e.g., 0.10.2.3>](https://hub.docker.com/layers/cp-kafka/confluentinc/cp-kafka/latest/images/sha256-6a9da0b9cbba850cf95b58b6d958518b3b5fbe788c262c7a1763acf7faf2af79?context=explore)
macos - monterey 12.3.1 (21E258)
debug=..
as necessary) from librdkafka-
tbd