
ThreadSanitizer: use of an invalid mutex (e.g. uninitialized or destroyed) (pid=7) #3279

Closed
6 of 7 tasks
filimonov opened this issue Mar 1, 2021 · 2 comments



filimonov commented Mar 1, 2021

Description

ThreadSanitizer in ClickHouse CI caught 'use of an invalid mutex':
https://clickhouse-test-reports.s3.yandex.net/0/174f49a9f13f87e3ad326043c249dad06e9b0e9b/integration_tests_(thread).html#fail1

How to reproduce

Scenario like the following:

  1. A consumer is created but never used (no subscribe / assign / poll calls; the topic does not exist).
  2. A producer for the same topic is created (nothing is produced).
  3. Everything is closed.

It seems the failure happens at step 2.
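The steps above can be sketched with the plain librdkafka C API (the issue was actually hit via cppkafka inside ClickHouse; `kafka1:19092` is taken from the issue's `broker.list`, and a running broker without the topic is assumed):

```c
/* Minimal sketch of the reproduction scenario using the librdkafka C API.
 * Assumptions: a reachable broker at kafka1:19092 and a topic that does
 * not exist, per the scenario above. */
#include <librdkafka/rdkafka.h>
#include <stdio.h>

static rd_kafka_t *make_handle(rd_kafka_type_t type) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    if (rd_kafka_conf_set(conf, "bootstrap.servers", "kafka1:19092",
                          errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
        fprintf(stderr, "conf: %s\n", errstr);
        rd_kafka_conf_destroy(conf);
        return NULL;
    }
    /* rd_kafka_new() takes ownership of conf on success */
    return rd_kafka_new(type, conf, errstr, sizeof(errstr));
}

int main(void) {
    /* 1. consumer created but never used (no subscribe/assign/poll) */
    rd_kafka_t *consumer = make_handle(RD_KAFKA_CONSUMER);
    /* 2. producer created; nothing is produced -- the TSan report
     *    fires around this point while broker threads race */
    rd_kafka_t *producer = make_handle(RD_KAFKA_PRODUCER);
    /* 3. everything is closed */
    if (producer) rd_kafka_destroy(producer);
    if (consumer) rd_kafka_destroy(consumer);
    return 0;
}
```

Whether TSan fires depends on thread timing, so several runs may be needed to reproduce.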

ThreadSanitizer report:

WARNING: ThreadSanitizer: use of an invalid mutex (e.g. uninitialized or destroyed) (pid=7)
    #0 pthread_mutex_lock <null> (clickhouse+0x8a234b6)
    #1 mtx_lock obj-x86_64-linux-gnu/../contrib/librdkafka/src/tinycthread.c:136:10 (clickhouse+0x15955fd9)
    #2 rd_kafka_buf_destroy_final obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_buf.c:47:25 (clickhouse+0x1583444e)
    #3 rd_kafka_buf_callback obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_buf.c:483:9 (clickhouse+0x1583565a)
    #4 rd_kafka_bufq_purge obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_buf.c:255:17 (clickhouse+0x15835369)
    #5 rd_kafka_broker_fail obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:577:2 (clickhouse+0x1580ee02)
    #6 rd_kafka_broker_op_serve obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:3263:17 (clickhouse+0x1581f945)
    #7 rd_kafka_broker_ops_serve obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:3338:24 (clickhouse+0x1581f945)
    #8 rd_kafka_broker_ops_io_serve obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:3388:9 (clickhouse+0x15821bdf)
    #9 rd_kafka_broker_producer_serve obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:3972:17 (clickhouse+0x1581d681)
    #10 rd_kafka_broker_serve obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:5064:17 (clickhouse+0x1581d681)
    #11 rd_kafka_broker_thread_main obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka_broker.c:5223:25 (clickhouse+0x1581710e)
    #12 _thrd_wrapper_function obj-x86_64-linux-gnu/../contrib/librdkafka/src/tinycthread.c:576:9 (clickhouse+0x1595633b)

  Location is heap block of size 3920 at 0x7b8000053000 allocated by thread T3:
    #0 calloc <null> (clickhouse+0x8a04938)
    #1 rd_calloc obj-x86_64-linux-gnu/../contrib/librdkafka/src/rd.h:121:12 (clickhouse+0x15838640)
    #2 rd_kafka_new obj-x86_64-linux-gnu/../contrib/librdkafka/src/rdkafka.c:2094:7 (clickhouse+0x15838640)
    #3 cppkafka::Producer::Producer(cppkafka::Configuration) obj-x86_64-linux-gnu/../contrib/cppkafka/src/producer.cpp:49:23 (clickhouse+0x158092c5)
    #4 void std::__1::allocator<cppkafka::Producer>::construct<cppkafka::Producer, cppkafka::Configuration&>(cppkafka::Producer*, cppkafka::Configuration&) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:886:28 (clickhouse+0x12af4e71)
    #5 void std::__1::allocator_traits<std::__1::allocator<cppkafka::Producer> >::__construct<cppkafka::Producer, cppkafka::Configuration&>(std::__1::integral_constant<bool, true>, std::__1::allocator<cppkafka::Producer>&, cppkafka::Producer*, cppkafka::Configuration&) obj-x86_64-linux-gnu/../contrib/libcxx/include/__memory/allocator_traits.h:519:21 (clickhouse+0x12af4e71)
    #6 void std::__1::allocator_traits<std::__1::allocator<cppkafka::Producer> >::construct<cppkafka::Producer, cppkafka::Configuration&>(std::__1::allocator<cppkafka::Producer>&, cppkafka::Producer*, cppkafka::Configuration&) obj-x86_64-linux-gnu/../contrib/libcxx/include/__memory/allocator_traits.h:481:14 (clickhouse+0x12af4e71)
    #7 std::__1::__shared_ptr_emplace<cppkafka::Producer, std::__1::allocator<cppkafka::Producer> >::__shared_ptr_emplace<cppkafka::Configuration&>(std::__1::allocator<cppkafka::Producer>, cppkafka::Configuration&) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2594:9 (clickhouse+0x12af4e71)
    #8 std::__1::shared_ptr<cppkafka::Producer> std::__1::allocate_shared<cppkafka::Producer, std::__1::allocator<cppkafka::Producer>, cppkafka::Configuration&, void>(std::__1::allocator<cppkafka::Producer> const&, cppkafka::Configuration&) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:3360:55 (clickhouse+0x12af4e71)
    #9 std::__1::shared_ptr<cppkafka::Producer> std::__1::make_shared<cppkafka::Producer, cppkafka::Configuration&, void>(cppkafka::Configuration&) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:3369:12 (clickhouse+0x12aea6bf)
    #10 DB::StorageKafka::createWriteBuffer(DB::Block const&) obj-x86_64-linux-gnu/../src/Storages/Kafka/StorageKafka.cpp:384:21 (clickhouse+0x12aea6bf)
...

  Mutex M0 (0x7b80000539c8) created at:
    [failed to restore the stack]

...

SUMMARY: ThreadSanitizer: use of an invalid mutex (e.g. uninitialized or destroyed) (/usr/bin/clickhouse+0x8a234b6) in pthread_mutex_lock
==================

Log fragment before failure:

### Note: the number in brackets is the thread id.
### ... consumer creation ...
2021.03.01 09:59:09.085681 [ 12 ] {06b7589d-1854-4b37-bb10-caea32c41ece} <Debug> StorageKafka (kafka_q): [rdk:MEMBERID] [thrd:app]: Group "issue14202": updating member id "(not-set)" -> ""
2021.03.01 09:59:09.088059 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPSTATE] [thrd:main]: Group "issue14202" changed state init -> query-coord (join-state init)
2021.03.01 09:59:09.088330 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPQUERY] [thrd:main]: Group "issue14202": no broker available for coordinator query: intervaled in state query-coord
2021.03.01 09:59:09.090064 [ 12 ] {06b7589d-1854-4b37-bb10-caea32c41ece} <Debug> StorageKafka (kafka_q): [rdk:INIT] [thrd:app]: librdkafka v1.6.0 (0x10600ff) ClickHouse-instance-test-kafka_q#consumer-1 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_gssapi,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer, GCC GXX PKGCONFIG OSXLD LIBDL PLUGINS ZLIB SSL SASL_CYRUS ZSTD HDRHISTOGRAM LZ4_EXT SNAPPY SOCKEM SASL_SCRAM CRC32C_HW, debug 0x2184)

### ... producer creation ...
2021.03.01 09:59:09.288361 [ 12 ] {9c2e7fe6-ac60-4822-b109-0e1be9072a7d} <Debug> StorageKafka (kafka_q): [rdk:INIT] [thrd:app]: librdkafka v1.6.0 (0x10600ff) ClickHouse-instance-test-kafka_q#producer-2 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_gssapi,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer, GCC GXX PKGCONFIG OSXLD LIBDL PLUGINS ZLIB SSL SASL_CYRUS ZSTD HDRHISTOGRAM LZ4_EXT SNAPPY SOCKEM SASL_SCRAM CRC32C_HW, debug 0x2184)
2021.03.01 09:59:09.288499 [ 12 ] {9c2e7fe6-ac60-4822-b109-0e1be9072a7d} <Warning> StorageKafka (kafka_q): [rdk:CONFWARN] [thrd:app]: Configuration property group.id is a consumer property and will be ignored by this producer instance
2021.03.01 09:59:09.288679 [ 12 ] {9c2e7fe6-ac60-4822-b109-0e1be9072a7d} <Warning> StorageKafka (kafka_q): [rdk:CONFWARN] [thrd:app]: Configuration property auto.offset.reset is a consumer property and will be ignored by this producer instance
2021.03.01 09:59:09.289297 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:CONNECTED] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Connected (#1)
2021.03.01 09:59:09.289559 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:FEATURE] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Updated enabled protocol features +ApiVersion to ApiVersion
2021.03.01 09:59:09.289811 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent ApiVersionRequest (v3, 80 bytes @ 0, CorrId 1)
2021.03.01 09:59:09.291989 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:RECV] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Received ApiVersionResponse (v3, 6 bytes, CorrId 1, rtt 2.18ms)
2021.03.01 09:59:09.292177 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:PROTOERR] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Protocol parse failure for ApiVersion v3(flex) at 3/6 (rd_kafka_handle_ApiVersion:2118) (incorrect broker.version.fallback?)
2021.03.01 09:59:09.292360 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:PROTOERR] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: ApiArrayCnt -1 out of range
2021.03.01 09:59:09.292500 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:APIVERSION] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: ApiVersionRequest v3 failed due to UNSUPPORTED_VERSION: retrying with v0
2021.03.01 09:59:09.292689 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent ApiVersionRequest (v0, 46 bytes @ 0, CorrId 2)
2021.03.01 09:59:09.292891 [ 12 ] {9c2e7fe6-ac60-4822-b109-0e1be9072a7d} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.03.01 09:59:09.293248 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:RECV] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Received ApiVersionResponse (v0, 270 bytes, CorrId 2, rtt 0.54ms)
2021.03.01 09:59:09.293512 [ 12 ] {9c2e7fe6-ac60-4822-b109-0e1be9072a7d} <Debug> StorageKafka (kafka_q): [rdk:DESTROY] [thrd:app]: Terminating instance (destroy flags none (0x0))
2021.03.01 09:59:09.293623 [ 76 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent MetadataRequest (v4, 51 bytes @ 0, CorrId 3)
2021.03.01 09:59:09.293746 [ 74 ] {} <Debug> StorageKafka (kafka_q): [rdk:DESTROY] [thrd:main]: Destroy internal
2021.03.01 09:59:09.293895 [ 74 ] {} <Debug> StorageKafka (kafka_q): [rdk:DESTROY] [thrd:main]: Removing all topics
2021.03.01 09:59:09.294391 [ 75 ] {} <Debug> StorageKafka (kafka_q): [rdk:BRKTERM] [thrd::0/internal]: :0/internal: terminating: broker still has 2 refcnt(s), 0 buffer(s), 0 partition(s)
2021.03.01 09:59:09.443634 [ 11 ] {} <Trace> TCPHandlerFactory: TCP Request. Address: 172.19.0.1:33844
2021.03.01 09:59:10.087734 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPQUERY] [thrd:main]: Group "issue14202": no broker available for coordinator query: intervaled in state query-coord
2021.03.01 09:59:10.087924 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPQUERY] [thrd:main]: Group "issue14202": no broker available for coordinator query: intervaled in state query-coord
2021.03.01 09:59:10.088063 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPQUERY] [thrd:main]: Group "issue14202": no broker available for coordinator query: intervaled in state query-coord
2021.03.01 09:59:10.088667 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:CONNECTED] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Connected (#1)
2021.03.01 09:59:10.088892 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:FEATURE] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Updated enabled protocol features +ApiVersion to ApiVersion
2021.03.01 09:59:10.089151 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent ApiVersionRequest (v3, 80 bytes @ 0, CorrId 1)
2021.03.01 09:59:10.090401 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:RECV] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Received ApiVersionResponse (v3, 6 bytes, CorrId 1, rtt 1.24ms)
2021.03.01 09:59:10.090558 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:PROTOERR] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Protocol parse failure for ApiVersion v3(flex) at 3/6 (rd_kafka_handle_ApiVersion:2118) (incorrect broker.version.fallback?)
2021.03.01 09:59:10.090721 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:PROTOERR] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: ApiArrayCnt -1 out of range
2021.03.01 09:59:10.090872 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:APIVERSION] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: ApiVersionRequest v3 failed due to UNSUPPORTED_VERSION: retrying with v0
2021.03.01 09:59:10.091066 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent ApiVersionRequest (v0, 46 bytes @ 0, CorrId 2)
2021.03.01 09:59:10.091605 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:RECV] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Received ApiVersionResponse (v0, 270 bytes, CorrId 2, rtt 0.51ms)
2021.03.01 09:59:10.091901 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPQUERY] [thrd:main]: kafka1:19092/bootstrap: Group "issue14202": querying for coordinator: intervaled in state query-coord
2021.03.01 09:59:10.091959 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent MetadataRequest (v4, 51 bytes @ 0, CorrId 3)
2021.03.01 09:59:10.092104 [ 71 ] {} <Debug> StorageKafka (kafka_q): [rdk:CGRPSTATE] [thrd:main]: Group "issue14202" changed state query-coord -> wait-coord (join-state init)
2021.03.01 09:59:10.092138 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:SEND] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Sent FindCoordinatorRequest (v2, 59 bytes @ 0, CorrId 4)
2021.03.01 09:59:10.092590 [ 73 ] {} <Debug> StorageKafka (kafka_q): [rdk:RECV] [thrd:kafka1:19092/bootstrap]: kafka1:19092/bootstrap: Received MetadataResponse (v4, 58 bytes, CorrId 3, rtt 0.63ms)

Checklist

  • librdkafka version (release number or git tag): v1.6.0
  • Apache Kafka version: 5.2.0
  • librdkafka client configuration: auto.offset.reset=earliest; debug=cgrp,consumer,topic,protocol,metadata; broker.list=kafka1:19092; client.software.name=ClickHouse; client.software.version=v21.3.1.1-prestable; queued.min.messages=1048545; enable.auto.commit=false; enable.auto.offset.store=false; enable.partition.eof=false; client.id="ClickHouse-instance-test-kafka_q"
  • Operating system: Ubuntu:20.04
  • Provide logs (with debug=.. as necessary) from librdkafka - see above
  • Provide broker log excerpts - unrelated.
  • Critical issue

filimonov commented Mar 4, 2021

Extra info:
It is a race between consumer and producer initialization. Adding a short sleep after consumer creation (and before producer initialization) hides the issue.
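The failure mode TSan flags here, one thread locking a mutex that another thread has already destroyed, can be sketched generically with pthreads. This is an illustrative pattern, not librdkafka code; `shared_t`, `run_demo` and the refcount scheme are hypothetical, but the idea matches the commit message below: keep the object that owns the mutex referenced for as long as queued work may touch it.

```c
/* Generic sketch of safe mutex teardown via reference counting:
 * whichever side finishes last destroys the mutex, so no thread can
 * ever lock a destroyed (invalid) mutex. */
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    pthread_mutex_t lock;
    atomic_int refcnt;
    int work_done;
} shared_t;

static shared_t g;  /* static storage for brevity; heap works the same */

static shared_t *shared_new(void) {
    pthread_mutex_init(&g.lock, NULL);
    atomic_store(&g.refcnt, 1);          /* creator's reference */
    g.work_done = 0;
    return &g;
}

static void shared_unref(shared_t *s) {
    if (atomic_fetch_sub(&s->refcnt, 1) == 1)
        pthread_mutex_destroy(&s->lock); /* only the last reference destroys */
}

static void *worker(void *arg) {
    shared_t *s = arg;                   /* spawner already took a ref for us */
    pthread_mutex_lock(&s->lock);        /* safe: our ref keeps the mutex alive */
    s->work_done = 1;
    pthread_mutex_unlock(&s->lock);
    shared_unref(s);
    return NULL;
}

int run_demo(void) {
    shared_t *s = shared_new();
    pthread_t t;
    atomic_fetch_add(&s->refcnt, 1);     /* take the worker's ref *before* spawning */
    if (pthread_create(&t, NULL, worker, s) != 0)
        return 1;
    shared_unref(s);                     /* creator may drop its ref early */
    pthread_join(t, NULL);
    return 0;
}
```

Without the reference count (e.g. if the creator destroyed the mutex unconditionally after spawning), the worker's `pthread_mutex_lock` would be exactly the "use of an invalid mutex" pattern in the report above.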

edenhill added a commit that referenced this issue Mar 8, 2021
As ops on broker queues may have references to the metadata cache's mutex
@edenhill edenhill closed this as completed Mar 9, 2021

edenhill commented Mar 9, 2021

Fixed on master

azat pushed a commit to ClickHouse/librdkafka that referenced this issue Aug 19, 2021
…c#3279)

As ops on broker queues may have references to the metadata cache's mutex

(cherry picked from commit 51c49f6)