Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

confluent python kafka client still crashes after 1.0.1 upgrade #2658

Closed
5 of 7 tasks
aelanthi opened this issue Dec 16, 2019 · 2 comments
Closed
5 of 7 tasks

confluent python kafka client still crashes after 1.0.1 upgrade #2658

aelanthi opened this issue Dec 16, 2019 · 2 comments

Comments

@aelanthi
Copy link

Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

Description

How to reproduce

<your steps how to reproduce goes here, or remove section if not relevant>

IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): 1.0.1
  • Apache Kafka version: 2.2.1
  • librdkafka client configuration: librdkafka client configuration: 'client.id' : 'client_name', 'group.id': 'client_name','session.timeout.ms' : 6000
  • Operating system: ubuntu 18.04
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue

I upgraded to confluent kafka 1.0.1 to address the assert crash previously seen in 0.9( #2234) . However, I still continue to see the crash often when
Confluent_kafka_assert_crash.txt
kafka client times out with error code 185. Could you please review the logs and let me know what could be going wrong?

--- Error logs ----
2019-12-09 07:59:32,351 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:GroupCoordinator]: GroupCoordinator/0: Timed out HeartbeatRequest in flight (after 6720ms, timeout #0)
2019-12-09 07:59:32,352 - WARNING - REQTMOUT [keyserver#consumer-7] [thrd:GroupCoordinator]: GroupCoordinator/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
2019-12-09 07:59:43,249 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60967ms, timeout #0)
2019-12-09 07:59:43,249 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60580ms, timeout #1)
2019-12-09 07:59:43,268 - WARNING - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out 2 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
2019-12-09 08:02:22,379 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:GroupCoordinator]: GroupCoordinator/0: Timed out HeartbeatRequest in flight (after 6653ms, timeout #0)
2019-12-09 08:02:22,380 - WARNING - REQTMOUT [keyserver#consumer-7] [thrd:GroupCoordinator]: GroupCoordinator/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
2019-12-09 08:02:33,745 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:GroupCoordinator]: GroupCoordinator/0: Timed out ApiVersionRequest in flight (after 10795ms, timeout #0)
2019-12-09 08:02:33,746 - WARNING - REQTMOUT [keyserver#consumer-7] [thrd:GroupCoordinator]: GroupCoordinator/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
2019-12-09 08:02:46,305 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60136ms, timeout #0)
2019-12-09 08:02:46,305 - WARNING - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
2019-12-09 08:03:47,615 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60062ms, timeout #0)
2019-12-09 08:03:47,616 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60062ms, timeout #1)
2019-12-09 08:03:47,616 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60062ms, timeout #2)
2019-12-09 08:03:47,616 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60062ms, timeout #3)
2019-12-09 08:03:47,616 - INFO - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out OffsetRequest in flight (after 60062ms, timeout #4)
2019-12-09 08:03:47,616 - WARNING - REQTMOUT [keyserver#consumer-7] [thrd:169.254.0.7:9092/bootstrap]: 169.254.0.7:9092/0: Timed out 27 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests

--- Error logs end ---

---Stack trace from core file ---
266842112 Dec 9 08:04 core.python.3440.processed

There were many traces with incomplete information, this is the one where I could find some symbols

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f591d867801 in __GI_abort () at abort.c:79
#2 0x00007f591d85739a in __assert_fail_base (fmt=0x7f591d9de7d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f591bb4542c "!"refcnt sub-zero"",
file=file@entry=0x7f591bb45427 "rd.h", line=line@entry=308, function=function@entry=0x7f591bb5aaf0 "rd_refcnt_sub0") at assert.c:92
#3 0x00007f591d857412 in __GI___assert_fail (assertion=0x7f591bb4542c "!
"refcnt sub-zero"", file=0x7f591bb45427 "rd.h", line=308, function=0x7f591bb5aaf0 "rd_refcnt_sub0") at assert.c:101
#4 0x00007f591ba81eb0 in ?? () from /usr/lib/librdkafka.so.1
#5 0x00007f591bae3030 in ?? () from /usr/lib/librdkafka.so.1
#6 0x00007f591baea5de in rd_kafka_topic_partition_list_destroy () from /usr/lib/librdkafka.so.1
#7 0x00007f591babe612 in ?? () from /usr/lib/librdkafka.so.1
#8 0x00007f591ba88c38 in ?? () from /usr/lib/librdkafka.so.1
#9 0x00007f591ba893e5 in ?? () from /usr/lib/librdkafka.so.1
#10 0x00007f591bd9d190 in Consumer_poll (self=0x7f591b257b00, args=, kwargs=) at confluent_kafka/src/Consumer.c:887
#11 0x00005617ee372724 in PyEval_EvalFrameEx ()
#12 0x00005617ee368b0a in PyEval_EvalCodeEx ()
#13 0x00005617ee3840b9 in ?? ()
#14 0x00005617ee3542ae in PyObject_Call ()
#15 0x00005617ee3741d0 in PyEval_CallObjectWithKeywords ()
#16 0x00005617ee407aa2 in ?? ()
#17 0x00007f591d60f6db in start_thread (arg=0x7f590affd700) at pthread_create.c:463
#18 0x00007f591d94888f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

--- Stack trace end ----

@justinfx
Copy link

I'm actually trying to track down the same behaviour, on the same ubuntu 18.04, with librdkafka/python 1.1.0.

I get that same message coupled with a flood of on_commit timeout errors:

WARNING  :200218-11:24:15, kafka <poll>, 699: 
    REQTMOUT [name#consumer-3] [thrd:GroupCoordinator]: 
        GroupCoordinator/2: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests

ERROR    :200218-11:24:15, kafka <on_commit>, 817: 
    KafkaError{code=_TIMED_OUT,val=-185,str="Local: Timed out"}

@edenhill
Copy link
Contributor

Can you try to reproduce this on v1.3.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants