program crashed because rd_kafka_toppar_t was used after destroy #1846

Closed
TomGitter opened this issue Jun 14, 2018 · 12 comments

@TomGitter

Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

Description

How to reproduce

<your steps how to reproduce go here, or remove section if not relevant>

IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): <REPLACE with e.g., v0.10.5 or a git sha. NOT "latest" or "current">
  • Apache Kafka version: <REPLACE with e.g., 0.10.2.3>
  • librdkafka client configuration: <REPLACE with e.g., message.timeout.ms=123, auto.offset.reset=earliest, ..>
  • Operating system: <REPLACE with e.g., Centos 5 (x64)>
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@TomGitter (Author)

When I use librdkafka I hit a problem: the program crashed. I found that an rktp (rd_kafka_toppar_t) was used after its memory had been destroyed; I suspect there is something wrong with its refcnt.
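For readers unfamiliar with the pattern, here is a minimal illustrative sketch of that bug class; this is not librdkafka's actual code, just the shape of the suspected refcount imbalance:

/* Illustration only -- not librdkafka source. */
#include <stdlib.h>

typedef struct toppar_s {
        int refcnt;                     /* librdkafka uses an atomic counter */
        /* ... partition state ... */
} toppar_t;

static void toppar_keep(toppar_t *tp) { tp->refcnt++; }

static void toppar_destroy(toppar_t *tp) {
        /* One destroy() too many drops refcnt to 0 while another thread
         * still holds tp: free() runs early, and that holder's next
         * access is a use-after-free. */
        if (--tp->refcnt == 0)
                free(tp);
}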

@edenhill (Contributor)

There's not much to go on here.
Can you provide a backtrace from gdb?
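For reference, a typical way to capture one (assuming core dumps are enabled, e.g., ulimit -c unlimited, and the binary was built with debug symbols; the program name here is a placeholder):

$ gdb ./your_program core
(gdb) thread apply all bt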

@TomGitter (Author) commented Jun 20, 2018

I'm using librdkafka version 0.11.4 (the latest). I just run the bundled example program rdkafka_consumer_example_cpp with these configuration changes:
set "fetch.message.max.bytes" "10000000"
set "receive.message.max.bytes" "1000000000"
set "message.max.bytes" "10000000"
set "auto.offset.reset" "earliest"
My Kafka server is under full load, so librdkafka constantly times out and disconnects.
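For reference, a minimal sketch of that configuration using librdkafka's C API (the example program itself is C++; return-value checks on the conf calls are omitted here for brevity):

#include <stddef.h>
#include <librdkafka/rdkafka.h>

static rd_kafka_conf_t *make_conf(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_topic_conf_t *tconf = rd_kafka_topic_conf_new();

        rd_kafka_conf_set(conf, "fetch.message.max.bytes", "10000000",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "receive.message.max.bytes", "1000000000",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "message.max.bytes", "10000000",
                          errstr, sizeof(errstr));
        /* auto.offset.reset is a topic-level property in 0.11.x */
        rd_kafka_topic_conf_set(tconf, "auto.offset.reset", "earliest",
                                errstr, sizeof(errstr));
        rd_kafka_conf_set_default_topic_conf(conf, tconf);
        return conf;
}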
The backtraces are below. There are many crash variants; I list two of them, captured on Linux:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffef5fe700 (LWP 93763)]
0x0000000000427a43 in rd_kafka_broker_toppars_serve (rkb=0x7fffe8001f80) at rdkafka_broker.c:2316
2316	        TAILQ_FOREACH_SAFE(rktp, &rkb->rkb_toppars, rktp_rkblink, rktp_tmp) {
(gdb) bt
#0  0x0000000000427a43 in rd_kafka_broker_toppars_serve (rkb=0x7fffe8001f80) at rdkafka_broker.c:2316
#1  rd_kafka_broker_consumer_serve (rkb=0x7fffe8001f80) at rdkafka_broker.c:3168
#2  0x00000000004286ef in rd_kafka_broker_thread_main (arg=0x7fffe8001f80) at rdkafka_broker.c:3311
#3  0x000000000045f6ef in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:624
#4  0x00000032b4207a51 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032b3ee896d in clone () from /lib64/libc.so.6 

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7184700 (LWP 127560)]
0x00000032b3e32625 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00000032b3e32625 in raise () from /lib64/libc.so.6
#1  0x00000032b3e33e05 in abort () from /lib64/libc.so.6
#2  0x000000000041273b in rd_kafka_crash (file=<optimized out>, line=<optimized out>, function=<optimized out>, rk=0x0, reason=<optimized out>) at rdkafka.c:3115
#3  0x0000000000456f4b in rd_kafka_q_destroy0 (disable=<optimized out>, rkq=<optimized out>) at rdkafka_queue.h:188
#4  rd_kafka_replyq_destroy (replyq=<optimized out>) at rdkafka_queue.h:653
#5  rd_kafka_toppar_destroy_final (rktp=0x7fffe8006ab0) at rdkafka_partition.c:276
#6  0x000000000042cd9e in rd_kafka_topic_scan_all (rk=0x6eb9e0, now=1315870276259) at rdkafka_topic.c:1140
#7  0x00000000004112f7 in rd_kafka_topic_scan_tmr_cb (rkts=<optimized out>, arg=<optimized out>) at rdkafka.c:1194
#8  0x0000000000431009 in rd_kafka_timers_run (rkts=0x6ec200, timeout_us=0) at rdkafka_timer.c:251
#9  0x000000000041c30f in rd_kafka_thread_main (arg=0x6eb9e0) at rdkafka.c:1270
#10 0x000000000045f6ef in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:624
#11 0x00000032b4207a51 in start_thread () from /lib64/libpthread.so.0
#12 0x00000032b3ee896d in clone () from /lib64/libc.so.6

I found that the partition was destroyed while still in use. Below are backtraces from two situations in which the partition was destroyed, captured on Windows:

librdkafka.dll!rd_kafka_toppar_destroy_final(rd_kafka_toppar_s * rktp=0x0000000002b440b0) line 265  C
librdkafka.dll!rd_kafka_topic_partition_destroy0(rd_kafka_topic_partition_s * rktpar=0x00000000033401b0, int do_free=0) line 2500  C
librdkafka.dll!rd_kafka_topic_partition_list_destroy(rd_kafka_topic_partition_list_s * rktparlist=0x00000000030e7600) line 2521  C
librdkafka.dll!rd_kafka_op_destroy(rd_kafka_op_s * rko=0x00000000004e0c80) line 222  C
librdkafka.dll!rd_kafka_cgrp_op_handle_OffsetCommit(rd_kafka_s * rk=0x000000000046a380, rd_kafka_broker_s * rkb=0x0000000000000000, rd_kafka_resp_err_t err=RD_KAFKA_RESP_ERR__WAIT_COORD, rd_kafka_buf_s * rkbuf=0x0000000000000000, rd_kafka_buf_s * request=0x0000000000000000, void * opaque=0x00000000004e0c80) line 1967  C
librdkafka.dll!rd_kafka_cgrp_timeout_scan(rd_kafka_cgrp_s * rkcg=0x0000000000455020, __int64 now=41682577000) line 2694  C
librdkafka.dll!rd_kafka_cgrp_serve(rd_kafka_cgrp_s * rkcg=0x0000000000455020) line 3071  C
librdkafka.dll!rd_kafka_thread_main(void * arg=0x000000000046a380) line 1270  C
librdkafka.dll!_thrd_wrapper_function(void * aArg=0x000000000044ac30) line 624  C

librdkafka.dll!rd_kafka_toppar_destroy_final(rd_kafka_toppar_s * rktp=0x000000000040df50) line 265  C
librdkafka.dll!rd_kafka_toppar_handle_Offset(rd_kafka_s * rk=0x000000000039a380, rd_kafka_broker_s * rkb=0x00000000003dc560, rd_kafka_resp_err_t err=RD_KAFKA_RESP_ERR__OUTDATED, rd_kafka_buf_s * rkbuf=0x00000000022a79f0, rd_kafka_buf_s * request=0x0000000003148ca0, void * opaque=0x000000000040df50) line 1289  C
librdkafka.dll!rd_kafka_buf_callback(rd_kafka_s * rk=0x000000000039a380, rd_kafka_broker_s * rkb=0x00000000003dc560, rd_kafka_resp_err_t err=RD_KAFKA_RESP_ERR_NO_ERROR, rd_kafka_buf_s * response=0x00000000022a79f0, rd_kafka_buf_s * request=0x0000000003148ca0) line 447  C
librdkafka.dll!rd_kafka_buf_handle_op(rd_kafka_op_s * rko=0x0000000002f7ce20, rd_kafka_resp_err_t err=RD_KAFKA_RESP_ERR_NO_ERROR) line 396  C
librdkafka.dll!rd_kafka_op_handle_std(rd_kafka_s * rk=0x000000000039a380, rd_kafka_q_s * rkq=0x00000000005ff420, rd_kafka_op_s * rko=0x0000000002f7ce20, int cb_type=1) line 588  C
librdkafka.dll!rd_kafka_op_handle(rd_kafka_s * rk=0x000000000039a380, rd_kafka_q_s * rkq=0x00000000005ff420, rd_kafka_op_s * rko=0x0000000002f7ce20, rd_kafka_q_cb_type_t cb_type=RD_KAFKA_Q_CB_CALLBACK, void * opaque=0x0000000000000000, rd_kafka_op_res_t (rd_kafka_s *, rd_kafka_q_s *, rd_kafka_op_s *, rd_kafka_q_cb_type_t, void *) * callback=0x0000000000000000) line 619  C
librdkafka.dll!rd_kafka_q_serve(rd_kafka_q_s * rkq=0x0000000000380590, int timeout_ms=0, int max_cnt=0, rd_kafka_q_cb_type_t cb_type=RD_KAFKA_Q_CB_CALLBACK, rd_kafka_op_res_t (rd_kafka_s *, rd_kafka_q_s *, rd_kafka_op_s *, rd_kafka_q_cb_type_t, void *) * callback=0x0000000000000000, void * opaque=0x0000000000000000) line 474  C
librdkafka.dll!rd_kafka_thread_main(void * arg=0x000000000039a380) line 1268  C
librdkafka.dll!_thrd_wrapper_function(void * aArg=0x000000000037ac30) line 624  C

I checked the code but couldn't find the reason for the destroy.

@edenhill (Contributor)

Thank you, this is great!
Do you have any (client) logs from such crashes? Maybe even debug? (debug=topic,metadata,cgrp)
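For reference, a minimal sketch of enabling those debug contexts via the C API (the rd_kafka_conf_t object is assumed to be created elsewhere with rd_kafka_conf_new()):

#include <stdio.h>
#include <librdkafka/rdkafka.h>

static void enable_debug(rd_kafka_conf_t *conf) {
        char errstr[512];
        if (rd_kafka_conf_set(conf, "debug", "topic,metadata,cgrp",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "failed to set debug: %s\n", errstr);
}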

@TomGitter (Author)

OK, I have sent the logs to your email ([email protected]); please check.

@edenhill (Contributor)

Thanks.

I guess the program crashed at the end of this log file?
What was the backtrace?

Did you manually try to terminate the program before this crash (e.g., with ctrl-c)?

@TomGitter (Author)

I just ran the program normally. I found that the cause of the problem is that I created too many partitions on the topic (320 partitions); when I create the topic with 8 partitions, the program runs fine. I think if you change your topic to 320 partitions or more, and consume data from it while producing data to it, you will likely see the problem.

@edenhill (Contributor)

@TomGitter What librdkafka version are you using? What operating system and version?

@TomGitter (Author)

I'm using librdkafka version 0.11.4 (the latest), and Kafka version 0.10.1.0. I can reproduce the problem on both Windows 7 Ultimate and Red Hat Enterprise Linux Server release 6.7.

@yexinxing

@edenhill, we have met a similar issue in versions 0.11.4 and 0.11.6; we have the logs and gdb stacks. It seems that rktp=0x7f342c08c870 was destroyed twice, so rkq_refcnt is 0 in the second destroy operation. I have checked the code but did not find the root cause. Can you help check?
Thread 1 (Thread 0x7f339ba01700 (LWP 7679)):
#0 0x00007f346701c417 in raise () from /lib64/libc.so.6
#1 0x00007f346701db08 in abort () from /lib64/libc.so.6
#2 0x00007f344542b01d in rd_kafka_crash (file=0x7f34454e773d "rdkafka_queue.h", line=188, function=0x7f34454e8970 <__FUNCTION__.18935> "rd_kafka_q_destroy0", rk=0x0,
reason=0x7f34454e7721 "assert: rkq->rkq_refcnt > 0") at rdkafka.c:3137
#3 0x00007f344547de5e in rd_kafka_q_destroy0 (rkq=0x7f342c08cb90, disable=1) at rdkafka_queue.h:188
#4 0x00007f344547decb in rd_kafka_q_destroy_owner (rkq=0x7f342c08cb90) at rdkafka_queue.h:210
#5 0x00007f344547f4ff in rd_kafka_toppar_destroy_final (rktp=0x7f342c08c870) at rdkafka_partition.c:273
#6 0x00007f34454482ca in rd_kafka_toppar_leader_update2 (rkt=0x7f342c08c1f0, partition=0, leader_id=20001, rkb=0x7f33b432adb0) at rdkafka_topic.c:490
#7 0x00007f344544920c in rd_kafka_topic_metadata_update (rkt=0x7f342c08c1f0, mdt=0x7f342c079fc8, ts_age=177767170550) at rdkafka_topic.c:850
#8 0x00007f3445449510 in rd_kafka_topic_metadata_update2 (rkb=0x7f33b432adb0, mdt=0x7f342c079fc8) at rdkafka_topic.c:930
#9 0x00007f344549a473 in rd_kafka_parse_Metadata (rkb=0x7f33b432adb0, request=0x7f342c180030, rkbuf=0x7f34349f1d10, mdp=0x7f339ba00bb0) at rdkafka_metadata.c:480
#10 0x00007f34454682ad in rd_kafka_handle_Metadata (rk=0x7f33b41ec2f0, rkb=0x7f33b432adb0, err=RD_KAFKA_RESP_ERR_NO_ERROR, rkbuf=0x7f34349f1d10, request=0x7f342c180030, opaque=0x0)
at rdkafka_request.c:1365
#11 0x00007f344545939c in rd_kafka_buf_callback (rk=0x7f33b41ec2f0, rkb=0x7f33b432adb0, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x7f34349f1d10, request=0x7f342c180030)
at rdkafka_buf.c:444
#12 0x00007f3445459230 in rd_kafka_buf_handle_op (rko=0x7f3434711a70, err=RD_KAFKA_RESP_ERR_NO_ERROR) at rdkafka_buf.c:393
#13 0x00007f344545e840 in rd_kafka_op_handle_std (rk=0x7f33b41ec2f0, rkq=0x7f339ba00d30, rko=0x7f3434711a70, cb_type=1) at rdkafka_op.c:588
#14 0x00007f344545e8aa in rd_kafka_op_handle (rk=0x7f33b41ec2f0, rkq=0x7f339ba00d30, rko=0x7f3434711a70, cb_type=RD_KAFKA_Q_CB_CALLBACK, opaque=0x0, callback=0x0) at rdkafka_op.c:619
#15 0x00007f344545b8ae in rd_kafka_q_serve (rkq=0x7f33b432c290, timeout_ms=0, max_cnt=0, cb_type=RD_KAFKA_Q_CB_CALLBACK, callback=0x0, opaque=0x0) at rdkafka_queue.c:474
#16 0x00007f3445427277 in rd_kafka_thread_main (arg=0x7f33b41ec2f0) at rdkafka.c:1266
#17 0x00007f344548f60e in _thrd_wrapper_function (aArg=0x7f33d0354ba0) at tinycthread.c:624
#18 0x00007f3466bbce45 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f34670dfafd in clone () from /lib64/libc.so.6

E001:3-8 11:19:58.149(296302|7679)RDKAFKA-7-REQERR: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: MetadataRequest failed: Local: Timed out: actions Retry
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: ===== Received metadata (for 2 requested topics): connected =====
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: ClusterId: FpA_LwT3Tj2ypkgdIJEtBw, ControllerId: 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: 1 brokers, 2 topics
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Broker #0/1: 10.249.92.197:26328 NodeId 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Topic #0/2: ReliableEventPipe with 5 partitions
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-STATE: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe changed state unknown -> exists
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 2 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 4 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 1 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 3 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 0 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Topic #1/2: ReliableEventRetryPipe with 3 partitions
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-STATE: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe changed state unknown -> exists
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe partition 2 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe partition 1 Leader 20001
E001:3-8 11:19:58.436(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe partition 0 Leader 20001
E001:3-8 11:19:58.437(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: 2/2 requested topic(s) seen in metadata
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-OFFSET: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Offset reply for topic ReliableEventRetryPipe [0] (v2 vs v5)
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-OFFSET: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Offset reply error for topic ReliableEventRetryPipe [0] (v2): Local: Outdated
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-TOPPARREMOVE: rdkafka#consumer-3: [thrd:main]: Removing toppar ReliableEventRetryPipe [0] 0x7f342c08c870
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-DESTROY: rdkafka#consumer-3: [thrd:main]: ReliableEventRetryPipe [0]: 0x7f342c08c870 DESTROY_FINAL
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: ===== Received metadata (for 2 requested topics): refresh unavailable topics =====
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: ClusterId: FpA_LwT3Tj2ypkgdIJEtBw, ControllerId: 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: 1 brokers, 2 topics
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Broker #0/1: 10.249.92.197:26328 NodeId 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Topic #0/2: ReliableEventPipe with 5 partitions
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 2 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 4 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 1 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 3 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventPipe partition 0 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Topic #1/2: ReliableEventRetryPipe with 3 partitions
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe partition 2 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe partition 1 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: Topic ReliableEventRetryPipe partition 0 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-TOPPARREMOVE: rdkafka#consumer-3: [thrd:main]: Removing toppar ReliableEventRetryPipe [0] 0x7f342c08c870
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-DESTROY: rdkafka#consumer-3: [thrd:main]: ReliableEventRetryPipe [0]: 0x7f342c08c870 DESTROY_FINAL

@edenhill (Contributor)

Is this somewhat easily reproducible for you?

There have been some fixes to partition management on master for the upcoming v1.0.0 release.
It would be great if you could try out a master build of librdkafka, or v1.0.0-RC7.

Otherwise, make sure your rd_kafka_topic_new() / rd_kafka_topic_destroy() calls are symmetric: exactly one destroy per new, as sketched below.
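For illustration, a minimal sketch of that symmetry; the topic name and the surrounding setup are placeholders:

#include <librdkafka/rdkafka.h>

/* Every rd_kafka_topic_new() must be matched by exactly one
 * rd_kafka_topic_destroy(): the handle is refcounted internally, so an
 * extra destroy (or a missing one) unbalances the refcount. */
static void use_topic(rd_kafka_t *rk) {
        rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "example_topic", NULL);
        if (!rkt)
                return;

        /* ... produce or consume via rkt ... */

        rd_kafka_topic_destroy(rkt);    /* exactly once per _new() */
}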

@yexinxing

@edenhill, thanks for your reply.

Yes, it's easily reproduced in our performance test environment.

In my application, rd_kafka_topic_new is called at process start and rd_kafka_topic_destroy at process exit, so I think that's OK.

I have also been studying the code these last few days. From the logs, I saw that when the operation version is outdated, the rd_kafka_toppar_t object 0x7f342c08c870 is destroyed. The call stack is probably like this:

#0  rd_kafka_toppar_handle_Offset (rk=0x43070e0, rkb=0x4311230, err=RD_KAFKA_RESP_ERR_NO_ERROR, rkbuf=0x4869780, request=0x48a0bc0, opaque=0x48ce450) at rdkafka_partition.c:1219
#1  0x00007fb03046139c in rd_kafka_buf_callback (rk=0x43070e0, rkb=0x4311230, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x4869780, request=0x48a0bc0) at rdkafka_buf.c:444
#2  0x00007fb030461230 in rd_kafka_buf_handle_op (rko=0x4805500, err=RD_KAFKA_RESP_ERR_NO_ERROR) at rdkafka_buf.c:393
#3  0x00007fb030466840 in rd_kafka_op_handle_std (rk=0x43070e0, rkq=0x7faff0303d30, rko=0x4805500, cb_type=1) at rdkafka_op.c:588
#4  0x00007fb0304668aa in rd_kafka_op_handle (rk=0x43070e0, rkq=0x7faff0303d30, rko=0x4805500, cb_type=RD_KAFKA_Q_CB_CALLBACK, opaque=0x0, callback=0x0) at rdkafka_op.c:619
#5  0x00007fb0304638ae in rd_kafka_q_serve (rkq=0x4307a70, timeout_ms=999, max_cnt=0, cb_type=RD_KAFKA_Q_CB_CALLBACK, callback=0x0, opaque=0x0) at rdkafka_queue.c:474
#6  0x00007fb03042f277 in rd_kafka_thread_main (arg=0x43070e0) at rdkafka.c:1266
#7  0x00007fb03049760e in _thrd_wrapper_function (aArg=0x40c30e0) at tinycthread.c:624
#8  0x00007fb04c6dfe45 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fb04cc02afd in clone () from /lib64/libc.so.6

E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-OFFSET: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Offset reply for topic ReliableEventRetryPipe [0] (**v2 vs v5**) 
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-OFFSET: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Offset reply error for topic ReliableEventRetryPipe [0] (v2): Local: Outdated 
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-TOPPARREMOVE: rdkafka#consumer-3: [thrd:main]: Removing toppar ReliableEventRetryPipe [0] 0x7f342c08c870
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-DESTROY: rdkafka#consumer-3: [thrd:main]: ReliableEventRetryPipe [0]: 0x7f342c08c870 DESTROY_FINAL

At almost the same time, in the same thread (I append the process id and thread id to the logs, e.g., 296302|7679), the same rd_kafka_toppar_t object 0x7f342c08c870 was destroyed a second time, and the program crashed. The stack is below:

Thread 1 (Thread 0x7f339ba01700 (LWP 7679)):
#0  0x00007f346701c417 in raise () from /lib64/libc.so.6
#1  0x00007f346701db08 in abort () from /lib64/libc.so.6
#2  0x00007f344542b01d in rd_kafka_crash (file=0x7f34454e773d "rdkafka_queue.h", line=188, function=0x7f34454e8970 <__FUNCTION__.18935> "rd_kafka_q_destroy0", rk=0x0, 
    reason=0x7f34454e7721 "assert: rkq->rkq_refcnt > 0") at rdkafka.c:3137
#3  0x00007f344547de5e in rd_kafka_q_destroy0 (rkq=0x7f342c08cb90, disable=1) at rdkafka_queue.h:188
#4  0x00007f344547decb in rd_kafka_q_destroy_owner (rkq=0x7f342c08cb90) at rdkafka_queue.h:210
#5  0x00007f344547f4ff in rd_kafka_toppar_destroy_final (rktp=0x7f342c08c870) at rdkafka_partition.c:273
#6  0x00007f34454482ca in rd_kafka_toppar_leader_update2 (rkt=0x7f342c08c1f0, partition=0, leader_id=20001, rkb=0x7f33b432adb0) at rdkafka_topic.c:490
#7  0x00007f344544920c in rd_kafka_topic_metadata_update (rkt=0x7f342c08c1f0, mdt=0x7f342c079fc8, ts_age=177767170550) at rdkafka_topic.c:850
#8  0x00007f3445449510 in rd_kafka_topic_metadata_update2 (rkb=0x7f33b432adb0, mdt=0x7f342c079fc8) at rdkafka_topic.c:930
#9  0x00007f344549a473 in rd_kafka_parse_Metadata (rkb=0x7f33b432adb0, request=0x7f342c180030, rkbuf=0x7f34349f1d10, mdp=0x7f339ba00bb0) at rdkafka_metadata.c:480
#10 0x00007f34454682ad in rd_kafka_handle_Metadata (rk=0x7f33b41ec2f0, rkb=0x7f33b432adb0, err=RD_KAFKA_RESP_ERR_NO_ERROR, rkbuf=0x7f34349f1d10, request=0x7f342c180030, opaque=0x0)
    at rdkafka_request.c:1365
#11 0x00007f344545939c in rd_kafka_buf_callback (rk=0x7f33b41ec2f0, rkb=0x7f33b432adb0, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x7f34349f1d10, request=0x7f342c180030)
    at rdkafka_buf.c:444
#12 0x00007f3445459230 in rd_kafka_buf_handle_op (rko=0x7f3434711a70, err=RD_KAFKA_RESP_ERR_NO_ERROR) at rdkafka_buf.c:393
#13 0x00007f344545e840 in rd_kafka_op_handle_std (rk=0x7f33b41ec2f0, rkq=0x7f339ba00d30, rko=0x7f3434711a70, cb_type=1) at rdkafka_op.c:588
#14 0x00007f344545e8aa in rd_kafka_op_handle (rk=0x7f33b41ec2f0, rkq=0x7f339ba00d30, rko=0x7f3434711a70, cb_type=RD_KAFKA_Q_CB_CALLBACK, opaque=0x0, callback=0x0) at rdkafka_op.c:619
#15 0x00007f344545b8ae in rd_kafka_q_serve (rkq=0x7f33b432c290, timeout_ms=0, max_cnt=0, cb_type=RD_KAFKA_Q_CB_CALLBACK, callback=0x0, opaque=0x0) at rdkafka_queue.c:474
#16 0x00007f3445427277 in rd_kafka_thread_main (arg=0x7f33b41ec2f0) at rdkafka.c:1266
#17 0x00007f344548f60e in _thrd_wrapper_function (aArg=0x7f33d0354ba0) at tinycthread.c:624
#18 0x00007f3466bbce45 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f34670dfafd in clone () from /lib64/libc.so.6

The logs show:

E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-OFFSET: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Offset reply for topic ReliableEventRetryPipe [0] (v2 vs v5) //rd_kafka_toppar_handle_Offset
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-OFFSET: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: Offset reply error for topic ReliableEventRetryPipe [0] (v2): Local: Outdated //rd_kafka_toppar_handle_Offset
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-TOPPARREMOVE: rdkafka#consumer-3: [thrd:main]: Removing toppar ReliableEventRetryPipe [0] 0x7f342c08c870
E001:3-8 11:19:58.509(296302|7679)RDKAFKA-7-DESTROY: rdkafka#consumer-3: [thrd:main]: ReliableEventRetryPipe [0]: 0x7f342c08c870 DESTROY_FINAL
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: ===== Received metadata (for 2 requested topics): refresh unavailable topics =====
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: ClusterId: FpA_LwT3Tj2ypkgdIJEtBw, ControllerId: 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001: 1 brokers, 2 topics
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001:   Broker #0/1: 10.249.92.197:26328 NodeId 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001:   Topic #0/2: ReliableEventPipe with 5 partitions
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventPipe partition 2 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventPipe partition 4 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventPipe partition 1 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventPipe partition 3 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventPipe partition 0 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]: ssl://10.249.92.197:26328/20001:   Topic #1/2: ReliableEventRetryPipe with 3 partitions
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventRetryPipe partition 2 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventRetryPipe partition 1 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-METADATA: rdkafka#consumer-3: [thrd:main]:   Topic ReliableEventRetryPipe partition 0 Leader 20001
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-TOPPARREMOVE: rdkafka#consumer-3: [thrd:main]: Removing toppar ReliableEventRetryPipe [0] 0x7f342c08c870
E001:3-8 11:19:58.571(296302|7679)RDKAFKA-7-DESTROY: rdkafka#consumer-3: [thrd:main]: ReliableEventRetryPipe [0]: 0x7f342c08c870 DESTROY_FINAL

I don't think it is normal to free the same object twice; it should not be referenced again after being destroyed. Would you please help check?
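For illustration, a minimal sketch (not librdkafka's actual internals) of the reference-counting invariant at stake: every holder of a toppar pointer (a pending Offset reply, a broker's toppar list, a metadata-update path, ...) must take its own reference, so the final destroy runs at most once. Two DESTROY_FINAL log lines for one address mean some path released a reference it never took:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct toppar_s { int refcnt; } toppar_t;

static toppar_t *toppar_keep(toppar_t *tp) {
        tp->refcnt++;
        return tp;
}

static void toppar_destroy(toppar_t *tp) {
        /* Mirrors the crashing check: "assert: rkq->rkq_refcnt > 0" */
        assert(tp->refcnt > 0);
        if (--tp->refcnt == 0) {
                printf("DESTROY_FINAL %p\n", (void *)tp);
                free(tp);               /* must happen at most once */
        }
}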
