rd_kafka_buf_destroy() always aborts - causing an immediate crash of the consumer app #353
Comments
That doesn't sound good.
I also have a large helgrind output that I can send (it won't fit in the comment window).
Can you email the helgrind output to [email protected]?
When it crashes, does glibc give you any information as to why?
*** glibc detected *** /tmp/pirate_debug: double free or corruption (!prev): 0x00007ffee80239a0 ***
Ah, thanks. Which rd_kafka_consume*() API are you using to retrieve messages?
ssize_t message_count = rd_kafka_consume_batch(rkt, partition, timeout_millis, messages, max_messages);
Okay, and where does it crash?
Here is the processing loop, without the uninteresting guts in the middle:
for (ssize_t a = 0; a < messages_read; a++) {
    /* ... processing elided ... */
}
And yes, it seems to crash when calling rd_kafka_message_destroy().
Looks good.
No compression.
Out of curiosity, does the problem persist if you do either of these?
Also, if you try consuming the same topic+partition and offset with kafkacat (or rdkafka_example), does that crash too?
I haven't tried any other clients (what is kafkacat?), but I can try it.
kafkacat is a generic consumer and producer built on top of librdkafka, so it uses the same library code as your program. Yes, please try the modifications.
I took out the code in the middle (it now just consumes a batch, then frees the messages). But I also know that when I run under valgrind, the problem never occurs. So I'm not sure if the problem was the
Is it worth taking the time to change the batch consume to individual consumes (and put the processing code back in)?
Interesting.
FYI, when running valgrind with the memcheck tool, everything comes up very clean.
Let me double-check.
Try adding a usleep(500000); in your processing block (instead of your ordinary code) to see if you can trigger the race condition, if any.
Definitely not freeing the payload. Let me try the timer.
OK, I think the problem is solved. Here again is the processing loop (with slightly more detail). Aside from the fact that I should be writing a 0 at [len-1] instead of [len], is there any other
for (ssize_t a = 0; a < messages_read; a++) {
    /* ... remainder of the loop was not preserved ... */
}
Good catch. There is currently no problem doing so, but I cannot guarantee that it will be future-proof.
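Writing the terminator at payload[len] puts a byte one past the end of the librdkafka-owned buffer, which is a plausible cause of the heap corruption glibc reported as "double free or corruption". Writing at [len-1] stays in bounds, but it overwrites the last byte of the message and underflows for an empty message (len == 0). It also still modifies the payload, which, as the maintainer notes, is not guaranteed to stay safe. A minimal sketch of an alternative, assuming only the `payload` and `len` fields of `rd_kafka_message_t`, is to copy into a buffer you own and terminate that copy. Another option is to print with a length-bounded `%.*s` format, which needs no terminator. `payload_to_cstring()` and `print_payload()` are hypothetical helper names, not part of librdkafka.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly malloc()ed, NUL-terminated copy of a
 * message payload (e.g. rkmessage->payload, rkmessage->len). The caller owns
 * the result and must free() it. Returns NULL on allocation failure.
 * The librdkafka-owned payload buffer is only read, never written. */
char *payload_to_cstring(const void *payload, size_t len)
{
    assert(payload != NULL || len == 0);

    char *s = malloc(len + 1);      /* one extra byte for the terminator */
    if (s == NULL)
        return NULL;
    if (len > 0)
        memcpy(s, payload, len);
    s[len] = '\0';                  /* in bounds: this is our own buffer */
    return s;
}

/* Hypothetical helper: print a payload without terminating it at all.
 * The precision limits how many bytes printf reads.
 * Assumes payload is non-NULL and len fits in an int. */
void print_payload(const void *payload, size_t len)
{
    printf("%.*s\n", (int)len, (const char *)payload);
}
```

In the processing loop you might call `payload_to_cstring(messages[a]->payload, messages[a]->len)`, use the string, `free()` it, and then call `rd_kafka_message_destroy(messages[a])` as before. The copy costs one allocation per message, so the `%.*s` form is the cheaper choice when the payload is only printed or logged.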
The client uses the latest librdkafka (C) from the master branch, connecting to Kafka 0.8.2.
The OS is CentOS 6.6.
As soon as data is put into Kafka, the library consumes it and immediately crashes.
Unfortunately, when run under valgrind with both memcheck and helgrind, the problem does not recur. When run directly, it occurs every time, and immediately.
Also, the consumer ran perfectly for a couple of weeks of heavy testing, then suddenly stopped working. My suspicion is that a configuration change (or some other change) in Kafka and/or ZooKeeper may be triggering a new code path in the consumer/library, although this is just intuition. There were no changes or upgrades to Kafka or ZooKeeper (in short, I'm not sure what changed).
Here is the stack trace from gdb:
#0 0x00007f50f6fd8625 in raise () from /lib64/libc.so.6
#1 0x00007f50f6fd9e05 in abort () from /lib64/libc.so.6
#2 0x00007f50f7016537 in __libc_message () from /lib64/libc.so.6
#3 0x00007f50f701be66 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f50f701e9b3 in _int_free () from /lib64/libc.so.6
#5 0x0000000000444ea1 in rd_kafka_buf_destroy (rkbuf=0x7f50e800b030) at rdkafka_broker.c:166
#6 0x000000000043f8c5 in rd_kafka_op_destroy (rko=0x7f50e800b190) at rdkafka.c:198
#7 0x000000000040a6bf in run_kafka_thread (arg=0x2673a70) at src/run_kafka_thr.c:179
#8 0x00007f50f79e39d1 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f50f708e8fd in clone () from /lib64/libc.so.6