Deadlock on broker restart #326

Closed
chenzhanyiczy opened this issue Jul 7, 2015 · 4 comments

@chenzhanyiczy

When Kafka restarts, rdkafka does not detect the change in TCP state. netstat shows the following:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 1 0 10.20.160.83:59671 10.20.160.83:9092 (Kafka server) CLOSE_WAIT 28306/nest
tcp 1 0 10.20.160.83:59659 10.20.160.83:9092 (Kafka server) CLOSE_WAIT 28306/nest
tcp 1 0 10.20.160.83:45838 10.20.170.234:9092 (Kafka server) CLOSE_WAIT 28306/nest

The TCP state is CLOSE_WAIT, which means Kafka sent a FIN to rdkafka and the rdkafka side has not closed its socket in response. rdkafka does not always detect this. What causes it?
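For context, CLOSE_WAIT is the state a TCP socket enters once the peer's FIN has arrived, and it stays there until the local application closes the socket. An application normally notices the FIN when `recv()` returns 0. The sketch below illustrates that with a local socket pair standing in for the broker connection. The helper names `peer_has_closed` and `demo_peer_close` are hypothetical and are not part of librdkafka; this shows the general socket behaviour only, not how rdkafka handles it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the peer has closed its end
 * (recv() reports end-of-stream), 0 if data was read, -1 on error. */
int peer_has_closed(int fd) {
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n == 0)
        return 1;   /* orderly shutdown: the peer sent its FIN/EOF */
    if (n < 0)
        return -1;  /* error; the caller would inspect errno */
    return 0;       /* ordinary data, connection still open */
}

/* Demonstration with an AF_UNIX socket pair instead of a real
 * TCP connection to a broker (an assumption made for portability). */
int demo_peer_close(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);                        /* the "broker" side goes away */
    int closed = peer_has_closed(sv[0]); /* local side observes EOF */

    /* Closing the local end is the step that lets a real TCP socket
     * leave CLOSE_WAIT; a thread that never gets here leaves it stuck. */
    close(sv[0]);
    return closed;
}
```

Calling `demo_peer_close()` from a `main` function should return 1. If the thread that owns a socket is blocked before it can call `close()`, the socket would be expected to stay in CLOSE_WAIT, which may be what the netstat output above reflects.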

The stack traces are as follows (there are too many threads to list them all, so I picked several; the others are similar):

Thread 105 (Thread 0x7fdaf7833700 (LWP 28309)):
#0 0x00007fdc1209a53d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x0000000000505666 in rd_kafka_broker_metadata_reply (rkb=0x2098d30, err=-195, reply=0x0, request=0x7fdaf0000bc0, opaque=0x7fdaf7833700) at rdkafka_broker.c:1023
#2 0x0000000000508e43 in rd_kafka_bufq_purge (err=<optimized out>, rkbufq=<optimized out>, rkb=<optimized out>) at rdkafka_broker.c:324
#3 rd_kafka_broker_fail (rkb=0x2098d30, err=RD_KAFKA_RESP_ERR__TRANSPORT, fmt=0x546d6c "Receive failed: %s") at rdkafka_broker.c:411
#4 0x0000000000509197 in rd_kafka_recv (rkb=0x2098d30) at rdkafka_broker.c:1492
#5 0x0000000000509698 in rd_kafka_broker_io_serve (rkb=0x2098d30) at rdkafka_broker.c:2408
#6 0x000000000050bc6e in rd_kafka_broker_ua_idle (rkb=<optimized out>) at rdkafka_broker.c:2431
#7 rd_kafka_broker_thread_main (arg=0x2098d30) at rdkafka_broker.c:3986
#8 0x00007fdc12096e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007fdc113a9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

Thread 104 (Thread 0x7fdaf7032700 (LWP 28310)):
#0 0x00007fdc1209a53d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000050679d in rd_kafka_broker_update (mdb=0x7fdae800106e, rk=<optimized out>) at rdkafka_broker.c:4251
#2 rd_kafka_metadata_handle (size=239, buf=<optimized out>, rko=<optimized out>, rkb=<optimized out>) at rdkafka_broker.c:947
#3 rd_kafka_broker_metadata_reply (rkb=0x20992c0, err=0, reply=0x7fdae8000970, request=0x7fdae8000dc0, opaque=<optimized out>) at rdkafka_broker.c:1017
#4 0x0000000000509450 in rd_kafka_req_response (rkbuf=<optimized out>, rkb=<optimized out>) at rdkafka_broker.c:1294
#5 rd_kafka_recv (rkb=0x20992c0) at rdkafka_broker.c:1486
#6 0x0000000000509698 in rd_kafka_broker_io_serve (rkb=0x20992c0) at rdkafka_broker.c:2408
#7 0x000000000050bc6e in rd_kafka_broker_ua_idle (rkb=<optimized out>) at rdkafka_broker.c:2431
#8 rd_kafka_broker_thread_main (arg=0x20992c0) at rdkafka_broker.c:3986
#9 0x00007fdc12096e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fdc113a9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 103 (Thread 0x7fdaf6831700 (LWP 28311)):
#0 0x00007fdc1209a53d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000005102b7 in rd_kafka_topic_metadata_update (rkb=0x2099850, mdt=0x7fdae0001d70) at rdkafka_topic.c:979
#2 0x000000000050680e in rd_kafka_metadata_handle (size=3670, buf=<optimized out>, rko=<optimized out>, rkb=<optimized out>) at rdkafka_broker.c:966
#3 rd_kafka_broker_metadata_reply (rkb=0x2099850, err=0, reply=0x7fdae0000940, request=0x7fdae0000b50, opaque=<optimized out>) at rdkafka_broker.c:1017
#4 0x0000000000509450 in rd_kafka_req_response (rkbuf=<optimized out>, rkb=<optimized out>) at rdkafka_broker.c:1294
#5 rd_kafka_recv (rkb=0x2099850) at rdkafka_broker.c:1486
#6 0x0000000000509698 in rd_kafka_broker_io_serve (rkb=0x2099850) at rdkafka_broker.c:2408
#7 0x000000000050bc6e in rd_kafka_broker_ua_idle (rkb=<optimized out>) at rdkafka_broker.c:2431
#8 rd_kafka_broker_thread_main (arg=0x2099850) at rdkafka_broker.c:3986
#9 0x00007fdc12096e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fdc113a9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7fd953fff700 (LWP 28408)):
#0 0x00007fdc1209b0fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000004153b5 in ant_nest::handler::run (this=0x20b6b10) at handler.cc:162
#2 0x000000000044424c in ant_nest::thread<ant_nest::handler>::shell (this=0x7fff0e520730) at thread.h:413
#3 0x000000000044427b in ant_nest::agent<ant_nest::handler, ant_nest::thread<ant_nest::handler>, &ant_nest::thread<ant_nest::handler>::shell> (arg=0x7fff0e520730) at thread.h:398
#4 0x00007fdc12096e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007fdc113a9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fd9f07f8700 (LWP 28409)):
#0 0x00007fdc1209b0fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000004153b5 in ant_nest::handler::run (this=0x20b7e10) at handler.cc:162
#2 0x000000000044424c in ant_nest::thread<ant_nest::handler>::shell (this=0x7fff0e520730) at thread.h:413
#3 0x000000000044427b in ant_nest::agent<ant_nest::handler, ant_nest::thread<ant_nest::handler>, &ant_nest::thread<ant_nest::handler>::shell> (arg=0x7fff0e520730) at thread.h:398
#4 0x00007fdc12096e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007fdc113a9ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thanks.

@chenzhanyiczy
Author

I am using the latest master version.

@chenzhanyiczy
Author

Hi edenhill, do you have any idea what is causing this?

@edenhill
Contributor

That looks like a thread deadlock.
I will investigate, thanks.

@edenhill changed the title from "rdkafka can't work" to "Deadlock on broker restart" on Jul 13, 2015
@edenhill added the bug label on Jul 13, 2015
@edenhill added this to the dev15_merge milestone on Aug 27, 2015
@edenhill
Contributor

Fixed on master
