
[FIXED] EventLoop: Socket now closed only after event loop done polling #815

Merged
merged 2 commits
Nov 5, 2024

Conversation

kozlovic
Member

Previously, the socket was closed by the NATS library itself, which could cause issues when the event loop thread was still polling it. We now defer to the event loop adapter: it makes sure that the event loop library is done polling before invoking a new function that takes care of closing the socket.
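
To illustrate the hazard being fixed, here is a minimal, self-contained sketch (not code from this PR or from the adapters) of what can go wrong when one thread closes a descriptor that another thread is still polling: once closed, the fd number can be reused by the process, so the poller may end up watching an unrelated descriptor.

```c
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int fds[2];

static void *pollerThread(void *arg)
{
    (void) arg;
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };

    // Simulates the event loop thread: still polling the fd...
    poll(&pfd, 1, 1000);
    printf("poll returned, revents=%d\n", pfd.revents);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pipe(fds);
    pthread_create(&t, NULL, pollerThread, NULL);

    // Simulates the old behavior: the socket is closed while the event loop
    // thread may still be polling it. With this PR, the close is deferred
    // until the event loop adapter reports that it is done polling.
    usleep(100 * 1000);
    close(fds[0]);

    pthread_join(t, NULL);
    return 0;
}
```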

I have updated the event loop test (which simulates what our adapters are doing). The mockup event loop implementation is a bit too simplistic, but should be OK for now. If we have issues, we would have to make the events a linked list.

Resolves #814

Signed-off-by: Ivan Kozlovic [email protected]

@kozlovic kozlovic requested a review from levb October 23, 2024 22:43
@kozlovic
Member Author

@levb If possible, I would like to have @dmitry-sles-novikov build from this branch and test with it to see if it resolves the issue.

@levb Assuming we get an OK, note that although this is a fix, we have to add a new API in nats.h, and the event loop adapters will now use this new API. So I am not sure if this will have to be a 3.10 release rather than a fix in 3.9.
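
For context, the sanitizer traces later in this thread show the adapter calling the new function as natsConnection_ProcessCloseEvent (from uvPollUpdate / uvAsyncCb). A purely hypothetical declaration, only to illustrate the shape of the addition; the actual prototype in nats.h in this PR may differ:

```c
// Hypothetical sketch only; check nats.h in this PR for the real prototype.
// The idea: the event loop adapter calls this once the loop has stopped
// polling the socket, and the library then performs the actual close.
NATS_EXTERN void natsConnection_ProcessCloseEvent(natsSock *socket);
```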

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.53%. Comparing base (1553d4a) to head (53145b5).
Report is 23 commits behind head on main.

Files with missing lines | Patch % | Lines
src/conn.c               | 84.61%  | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #815      +/-   ##
==========================================
+ Coverage   68.71%   70.53%   +1.82%     
==========================================
  Files          39       47       +8     
  Lines       15207    15455     +248     
  Branches     3143     3183      +40     
==========================================
+ Hits        10449    10901     +452     
+ Misses       1700     1477     -223     
- Partials     3058     3077      +19     


@@ -2146,9 +2146,21 @@ _evStopPolling(natsConnection *nc)

nc->sockCtx.useEventLoop = false;
nc->el.writeAdded = false;
s = nc->opts->evCbs.read(nc->el.data, NATS_EVENT_ACTION_REMOVE);
// The "write" event is added and removed as we write, however, we always
Collaborator


👍 Right, matching the changes in natsLibevent_Read, thx for the explanation comment!

Collaborator

@levb levb left a comment


LGTM, thx for the descriptive comments!

@levb
Collaborator

levb commented Oct 24, 2024

@dmitry-sles-novikov standing by for your confirmation

@dmitry-sles-novikov

@levb Sorry, but I found a data race and am looking for its causes.

@kozlovic
Member Author

@dmitry-sles-novikov Why don't you post the report from the sanitizer so we can all have a look?

@dmitry-sles-novikov

dmitry-sles-novikov commented Oct 28, 2024

@kozlovic I wanted to localize the problem first, since the cause could have been in my code =). I have now localized the problems:

  1. Data race after file descriptor closing
Data race (pid=2524979)
Read of size 8 at 0x72b000000e60 by thread T544 (mutexes: write M0):
0x55555df5fb47 epoll_ctl tsan_interceptors_posix.cpp:1958 
0x5555772a0674 uv__platform_invalidate_fd linux.c:730 
0x5555772998eb uv__poll_stop poll.c:108 
0x5555772998eb uv__poll_close poll.c:159 
0x555577291dca uv_close core.c:208 
0x55557070eb56 closeSchedulerCb libuv.h:266 
0x555577292831 uv__finish_close core.c:351 
0x555577292831 uv__run_closing_handles core.c:365 
0x555577292831 uv_run core.c:463 
....
Previous write of size 8 at 0x72b000000e60 by thread T581:
0x55555df5cf5a open tsan_interceptors_posix.cpp:1658 
0x555566583773 DB::WriteBufferFromFile::WriteBufferFromFile WriteBufferFromFile.cpp:48 
0x55556df02ab1 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x55556def90a1 DB::DiskLocal::writeFile DiskLocal.cpp:338 

My guess: we now close the file descriptor after the call to uv_poll_stop and before uv_close, but libuv uses the file descriptor in the uv_close function: uv__platform_invalidate_fd(handle->loop, handle->io_watcher.fd);

Although the documentation says that this is acceptable:

However the fd can be safely closed immediately after a call to uv_poll_stop() or uv_close().
  2. Data race on scheduler closing:
Data race (pid=2555740)
Write of size 8 at 0x722000181278 by thread T546 (mutexes: write M0):
0x55555df57416 free tsan_interceptors_posix.cpp:725 
0x55557070ec14 finalCloseCb libuv.h:255 
0x5555772927b1 uv__finish_close core.c:351 
0x5555772927b1 uv__run_closing_handles core.c:365 
0x5555772927b1 uv_run core.c:463 
...
Previous atomic read of size 4 at 0x722000181278 by thread T2 (mutexes: write M1, write M2):
0x5555772914cb uv_async_send async.c:73 
0x55557070c8d7 uvScheduleToEventLoop libuv.h:135 
0x55557070c8d7 natsLibuv_Detach libuv.h:496 
0x5555750f5b71 _close conn.c:2662 
0x5555750f7a92 natsConn_close conn.c:4045 
0x5555750f7a92 natsConnection_Close conn.c:4063 
0x5555707128f4 DB::NATSConnection::disconnectImpl NATSConnection.cpp:147 
0x5555707128f4 DB::NATSConnection::disconnect NATSConnection.cpp:108 
0x5555706ffdcb DB::StorageNATS::shutdown StorageNATS.cpp:450 
0x55556eac4a24 DB::IStorage::flushAndShutdown IStorage.h:597 
0x55556eac4a24 DB::InterpreterDropQuery::executeToTableImpl InterpreterDropQuery.cpp:305 
0x55556eac260c DB::InterpreterDropQuery::executeToTable InterpreterDropQuery.cpp:122 
0x55556eac1abe DB::InterpreterDropQuery::executeSingleDropQuery InterpreterDropQuery.cpp:95 
0x55556eac1616 DB::InterpreterDropQuery::execute InterpreterDropQuery.cpp:76 
0x55556f30f0a1 DB::executeQueryImpl executeQuery.cpp:1349 
0x55556f3097c2 DB::executeQuery executeQuery.cpp:1516 
...

My guess: when we schedule the detach event to the event loop, we can call free on the scheduler before uv_async_send completes:

int uv_async_send(uv_async_t* handle) {
  _Atomic int* pending;
  _Atomic int* busy;

  pending = (_Atomic int*) &handle->pending;
  busy = (_Atomic int*) &handle->u.fd;

  /* Do a cheap read first. */
  if (atomic_load_explicit(pending, memory_order_relaxed) != 0)
    return 0;

  /* Set the loop to busy. */
  atomic_fetch_add(busy, 1);

  /* Wake up the other thread's event loop. */
  if (atomic_exchange(pending, 1) == 0)
    uv__async_send(handle->loop);

  // <-- the event loop thread can call free(scheduler) right here

  /* Set the loop to not-busy. */
  atomic_fetch_add(busy, -1);  // <-- uses member u.fd of the (already freed) scheduler

  return 0;
}

@kozlovic
Member Author

@dmitry-sles-novikov So I have spent some time reviewing this, and I also compiled the libuv library in Debug mode and added some print statements to show you that case 2 is bogus. I will explain below.

But first, a note that may apply to both cases: it looks like the natsConnection line numbers do not match the code in this PR. This is crucial since the changes are in both libuv.h and the NATS library, so the two need to be in sync.

For case 1, unless you have truncated the stack for the "Previous write", I don't see any code from the NATS C library, so I would say that something else is racing with the libuv adapter, but not the NATS C client, which, by the way, owns the fd. You would have to look at what the "DB::..." code is doing; I can't help there.

For case 2, I have searched and I believe that I am doing the right thing to properly dispose of the uv_async_t object (the nle->scheduler, and nle in general). It can't be that uv_async_send would internally reference an object past its close callback execution; that would make it impossible to properly free such an object. I have modified uv_async_send() to sleep after the call to uv__async_send (notice the two underscores) and before busy is decremented.

diff --git a/src/unix/async.c b/src/unix/async.c
index 04015db0..8a717726 100644
--- a/src/unix/async.c
+++ b/src/unix/async.c
@@ -65,6 +65,7 @@ int uv_async_init(uv_loop_t* loop, uv_async_t* handle, uv_async_cb async_cb) {
 int uv_async_send(uv_async_t* handle) {
   _Atomic int* pending;
   _Atomic int* busy;
+  int doSleep = 0;
 
   pending = (_Atomic int*) &handle->pending;
   busy = (_Atomic int*) &handle->busy;
@@ -76,10 +77,19 @@ int uv_async_send(uv_async_t* handle) {
   /* Set the loop to busy. */
   atomic_fetch_add(busy, 1);
 
+  fprintf(stderr, "@@IK: in uv_async_send before uv__async_send handle=%p\n", (void*) handle);
+
   /* Wake up the other thread's event loop. */
   if (atomic_exchange(pending, 1) == 0)
     uv__async_send(handle->loop);
 
+  if (doSleep)
+  {
+    fprintf(stderr, "@@IK: in uv_async_send after uv__async_send handle=%p sleep for 2sec...\n", (void*) handle);
+    sleep(2);
+    fprintf(stderr, "@@IK: in uv_async_send after sleep, call atomic_fetch_add for handle=%p\n", (void*) handle);
+  }
+
   /* Set the loop to not-busy. */
   atomic_fetch_add(busy, -1);

I set a breakpoint in natsLibuv_Detach and then added a breakpoint in uv_async_send. When stepping, I then set doSleep to 1, which means that this function will "sleep" for 2 seconds, to see if it would be possible for the handle to be freed before uv_async_send resumes.

The finalCloseCb cannot execute, because the whole point of this atomic is to make sure that the event loop thread does not process the close callback until the "busy" flag is decremented back to 0. See what the event loop thread is doing during the sleep:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
  * frame #0: 0x000000010005e956 nats-libuv-pub`uv__async_spin(handle=0x00006000008d4000) at async.c:118:27
    frame #1: 0x000000010005e885 nats-libuv-pub`uv__async_close(handle=0x00006000008d4000) at async.c:136:3
    frame #2: 0x000000010005f179 nats-libuv-pub`uv_close(handle=0x00006000008d4000, close_cb=(nats-libuv-pub`closeSchedulerCb at libuv.h:264)) at core.c:174:5
    frame #3: 0x0000000100002b30 nats-libuv-pub`uvAsyncDetach(nle=0x00006000039d4080) at libuv.h:273:5
    frame #4: 0x00000001000025c3 nats-libuv-pub`uvAsyncCb(handle=0x00006000008d4000) at libuv.h:320:17
    frame #5: 0x000000010005eeca nats-libuv-pub`uv__async_io(loop=0x0000000100091a40, w=0x0000000100091ca8, events=1) at async.c:186:5
    frame #6: 0x000000010007b45b nats-libuv-pub`uv__io_poll(loop=0x0000000100091a40, timeout=-1) at kqueue.c:385:9
    frame #7: 0x000000010005f732 nats-libuv-pub`uv_run(loop=0x0000000100091a40, mode=UV_RUN_DEFAULT) at core.c:430:5
    frame #8: 0x0000000100002c9b nats-libuv-pub`main(argc=3, argv=0x00007ff7bfeff710) at libuv-pub.c:107:9
    frame #9: 0x00007ff815fae345 dyld`start + 1909

You can look at uv__async_spin and see that it waits for that busy flag (an atomic int) to be down to 0 before proceeding.
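
For reference, a simplified sketch of that logic (not the exact libuv source; see src/unix/async.c, and note that the busy counter lives in handle->u.fd or handle->busy depending on the libuv version, as the snippets above show):

```c
#include <sched.h>
#include <stdatomic.h>
#include <uv.h>

// Simplified sketch of libuv's uv__async_spin: wait until the "busy" counter
// incremented by uv_async_send drops back to 0, so the handle cannot be
// finalized while a sender is still inside uv_async_send.
static void async_spin_sketch(uv_async_t *handle)
{
    _Atomic int *busy = (_Atomic int*) &handle->u.fd;  // handle->busy in newer libuv

    while (atomic_load(busy) != 0)
        sched_yield();  // the real code spins for a while before yielding
}
```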

You can change it to int doSleep = 1 and run the program (say nats-libuv-pub with -count 1): we get the first invocation of uv_async_send for the initial socket write, then two more to remove the write and read events, then the last one for the detach. You will notice that finalCloseCb is invoked only after resuming from the 2-second sleep:

Current executable set to '/Users/ivan/dev/cnats/build/bin/nats-libuv-pub' (x86_64).
(lldb) run -count 1
Process 20194 launched: '/Users/ivan/dev/cnats/build/bin/nats-libuv-pub' (x86_64)
Sending 1 messages to subject 'foo'
@@IK: in uv_async_send before uv__async_send handle=0x6000037b4000
@@IK: in uv_async_send after uv__async_send handle=0x6000037b4000 sleep for 2sec...
@@IK: in uv_async_send after sleep, call atomic_fetch_add for handle=0x6000037b4000
@@IK: in uv_async_send before uv__async_send handle=0x6000037b4000
@@IK: in uv_async_send after uv__async_send handle=0x6000037b4000 sleep for 2sec...
@@IK: in uv_async_send after sleep, call atomic_fetch_add for handle=0x6000037b4000
@@IK: in uv_async_send before uv__async_send handle=0x6000037b4000
@@IK: in uv_async_send after uv__async_send handle=0x6000037b4000 sleep for 2sec...
@@IK: in uv_async_send after sleep, call atomic_fetch_add for handle=0x6000037b4000
@@IK: in uv_async_send before uv__async_send handle=0x6000037b4000
@@IK: in uv_async_send after uv__async_send handle=0x6000037b4000 sleep for 2sec...
@@IK: in uv_async_send after sleep, call atomic_fetch_add for handle=0x6000037b4000
@@IK: finalCloseCb handle=0x6000021ac000 sched=0x6000037b4000

So again, given that the NATS C library line numbers do not match this PR, it could be that you do not have the up-to-date libuv.h and NATS C client library (or your app is not linked to those), or the sanitizer is reporting a false positive (or a real issue, but because your platform does not implement the atomics correctly?). Regardless, based on reading the libuv code, the NATS C client is doing the right thing and there is nothing to fix (beyond what is done in this PR).

@dmitry-sles-novikov

@kozlovic Yes, I may be wrong in my assumptions.

@dmitry-sles-novikov

@levb LGTM

@dmitry-sles-novikov

Thank you very much for your help =)

@kozlovic
Member Author

@dmitry-sles-novikov PS: I realize that your "case 2" stack looks a bit suspicious, in that it looks like it was tearing down. The line numbers do not correspond to my version, which is the latest on master. Could you tell me which commit of the libuv code you were using?

@kozlovic
Member Author

I had a quick look at the use of NATS and libuv in ClickHouse (and your PR), and this is too complex for me to comment on fully, but I wonder if it is used properly. Some notes (they may not be relevant, but again, it is too complex for me): You can't call natsLibuv_SetThreadLocalLoop from just anywhere; you have to call it only from the single thread where this loop is run. It also looks like they are using uv_run(.., UV_RUN_NOWAIT), which is not an issue per se, but I don't really see a uv_loop_close except in Storages/UVLoop.h. So in the NATS-specific code I don't see anything related to that, but I wonder if there is a case where (because of the no-wait) we close the loop "too soon"?
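
For reference, the usual setup roughly follows the cnats libuv example (e.g. the nats-libuv-pub program used above); exact calls may differ between versions, but the key point is that natsLibuv_SetThreadLocalLoop() and uv_run() are both called from the thread that runs the loop:

```c
#include <uv.h>
#include <nats.h>
#include <adapters/libuv.h>

// Rough sketch of the expected setup (based on the cnats libuv example;
// details may vary by version). Everything below runs on the one thread
// that will drive the event loop.
static void setupAndRunLoop(natsOptions *opts)
{
    uv_loop_t *uvLoop = uv_default_loop();

    natsLibuv_Init();

    // Must be called from this thread: the same one that calls uv_run() below.
    natsLibuv_SetThreadLocalLoop(uvLoop);

    natsOptions_SetEventLoop(opts, (void*) uvLoop,
                             natsLibuv_Attach,
                             natsLibuv_Read,
                             natsLibuv_Write,
                             natsLibuv_Detach);

    // ... create the connection, publish, etc., then drive the loop here,
    // and only close the loop once all handles have finished closing.
    uv_run(uvLoop, UV_RUN_DEFAULT);
}
```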

@dmitry-sles-novikov

@kozlovic ClickHouse uses its own fork of nats.c, so I had to rebase your changes onto it (branch fix_814 in the repo). It uses its own fork of libuv too.

That's why it took me so long to check, because I couldn't be sure where exactly the problems were.

P.S. For case 1: in ClickHouse the socket is always closed from another thread, so uvAsyncDetach is called asynchronously by publishing an event to the event loop thread. Maybe this is important.

@kozlovic
Member Author

P.S. For case 1: in ClickHouse the socket is always closed from another thread, so uvAsyncDetach is called asynchronously by publishing an event to the event loop thread. Maybe this is important.

Again, the socket should be owned by the NATS library, and it is closed in some cases due to a stale connection, when an error occurs and the library starts a thread to reconnect, or when calling natsConnection_Close(). In any case, as long as the forks do similar things to the main repo, the socket close should now always happen from the event loop thread.

@dmitry-sles-novikov

dmitry-sles-novikov commented Oct 29, 2024

You can't call natsLibuv_SetThreadLocalLoop from just anywhere; you have to call it only from the single thread where this loop is run.

I fixed it in my PR; the previous version of the code called natsLibuv_SetThreadLocalLoop from all threads.

So in the NATS-specific code I don't see anything related to that, but I wonder if there is a case where (because of the no-wait) we close the loop "too soon"?

At the moment the sanitizer is triggered, the event loop is still running. I have checked this.

@dmitry-sles-novikov

I'll try to run tests on your libuv version, but it will take some time

@dmitry-sles-novikov

@kozlovic Full sanitizer output, with libuv rebased onto the ClickHouse libuv fork:

Data race in epoll_ctl(loop->backend_fd, EPOLL_CTL_DEL, fd, &dummy);
Write of size 8 at 0x72b000000a40 by thread T658 (mutexes: write M0):
0x55555df5f728 closedir tsan_interceptors_posix.cpp:1932 
0x555578404e0e std::__fs::filesystem::__dir_stream::close directory_iterator.cpp:153 
0x555578404669 std::__fs::filesystem::__dir_stream::advance directory_iterator.cpp:139 
0x555578402c45 std::__fs::filesystem::directory_iterator::__increment directory_iterator.cpp:189 
0x555571ce7f0a std::__fs::filesystem::directory_iterator::operator++[abi:v15007] directory_iterator.h:92 
0x555571ce7f0a DB::ConfigProcessor::getConfigMergeFiles ConfigProcessor.cpp:609 
0x555571cf9500 DB::ConfigReloader::getNewFileList const ConfigReloader.cpp:220 
0x555571cf615c DB::ConfigReloader::reloadIfNewer ConfigReloader.cpp:113 
0x555571cf8464 DB::ConfigReloader::run ConfigReloader.cpp:89 
0x555571cfa1ce std::__invoke[abi:v15007]<…> invoke.h:359 
0x555571cfa1ce std::__apply_tuple_impl[abi:v15007]<…> tuple:1789 
0x555571cfa1ce std::apply[abi:v15007]<…> tuple:1798 
0x555571cfa1ce <lambda>::operator() ThreadPool.h:311 
0x555571cfa0e2 std::__invoke[abi:v15007]<…> invoke.h:394 
0x555571cfa0e2 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x555571cfa0e2 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x555571cfa0e2 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Previous read of size 8 at 0x72b000000a40 by thread T546 (mutexes: write M1):
0x55555df5fb47 epoll_ctl tsan_interceptors_posix.cpp:1958 
0x5555772a0514 uv__platform_invalidate_fd linux.c:724 
0x555577299865 uv__poll_stop poll.c:102 
0x555577299865 uv__poll_close poll.c:153 
0x555577291d2a uv_close core.c:199 
0x55557070eb56 closeSchedulerCb libuv.h:266 
0x555577292731 uv__finish_close core.c:342 
0x555577292731 uv__run_closing_handles core.c:356 
0x555577292731 uv_run core.c:445 
0x55557070d4ec DB::NATSHandler::runLoop NATSHandler.cpp:72 
0x5555707039e7 $_0::operator() const StorageNATS.cpp:104 
0x5555707039e7 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555707039e7 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555707039e7 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555707039e7 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55556d69cf56 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55556d69cf56 std::function::operator() const function.h:1197 
0x55556d69cf56 DB::BackgroundSchedulePoolTaskInfo::execute BackgroundSchedulePool.cpp:106 
0x55556d6a0da4 DB::BackgroundSchedulePool::threadFunction BackgroundSchedulePool.cpp:304 
0x55556d6a1633 $_0::operator() const BackgroundSchedulePool.cpp:170 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 (clickhouse+0x1814d632) tuple:1789 
0x55556d6a1633 std::apply[abi:v15007]<…> tuple:1798 
0x55556d6a1633 <lambda>::operator() ThreadPool.h:311 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55556d6a1633 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55556d6a1633 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Location is file descriptor 82 destroyed by thread T546 at:
0x55555df5e8d3 close tsan_interceptors_posix.cpp:1828 
0x5555750fca9d _closeFd comsock.c:39 
0x5555750fca9d natsSock_Close comsock.c:45 
0x5555750f7eeb natsConnection_ProcessCloseEvent conn.c:4192 
0x55557070c623 uvPollUpdate libuv.h:191 
0x55557070c623 uvAsyncCb libuv.h:314 
0x555577291b73 uv__async_io async.c:176 
0x5555772a1278 uv__io_poll linux.c:1528 
0x55557729235f uv_run core.c:430 
0x55557070d4ec DB::NATSHandler::runLoop NATSHandler.cpp:72 
0x5555707039e7 $_0::operator() const StorageNATS.cpp:104 
0x5555707039e7 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555707039e7 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555707039e7 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555707039e7 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55556d69cf56 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55556d69cf56 std::function::operator() const function.h:1197 
0x55556d69cf56 DB::BackgroundSchedulePoolTaskInfo::execute BackgroundSchedulePool.cpp:106 
0x55556d6a0da4 DB::BackgroundSchedulePool::threadFunction BackgroundSchedulePool.cpp:304 
0x55556d6a1633 $_0::operator() const BackgroundSchedulePool.cpp:170 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 (clickhouse+0x1814d632) tuple:1789 
0x55556d6a1633 std::apply[abi:v15007]<…> tuple:1798 
0x55556d6a1633 <lambda>::operator() ThreadPool.h:311 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55556d6a1633 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55556d6a1633 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Mutex M0 (0x7248000098a0) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x555571cf614d std::lock_guard::lock_guard[abi:v15007] __mutex_base:94 
0x555571cf614d DB::ConfigReloader::reloadIfNewer ConfigReloader.cpp:111 
0x555571cf576a DB::ConfigReloader::ConfigReloader ConfigReloader.cpp:30 
0x55556d58525f std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x55556d58525f DB::UsersConfigAccessStorage::load UsersConfigAccessStorage.cpp:886 
0x55556d3a7b43 DB::AccessControl::addUsersConfigStorage AccessControl.cpp:355 
0x55556d3abcf7 DB::AccessControl::addStoragesFromUserDirectoriesConfig AccessControl.cpp:459 
0x55556d3a63f6 DB::AccessControl::addStoragesFromMainConfig AccessControl.cpp:524 
0x55556d3a4d06 DB::AccessControl::setUpFromMainConfig AccessControl.cpp:308 
0x5555665a2890 DB::Server::main Server.cpp:1958 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 

Mutex M1 (0x723400729618) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x55556d69c9ea std::lock_guard::lock_guard[abi:v15007] __mutex_base:94 
0x55556d69c9ea DB::BackgroundSchedulePoolTaskInfo::deactivate BackgroundSchedulePool.cpp:51 
0x5555706f9bdd DB::StorageNATS::StorageNATS StorageNATS.cpp:105 
0x55557070bc5c std::construct_at[abi:v15007]<…> construct_at.h:35 
0x55557070bc5c std::allocator_traits::construct[abi:v15007]<…> allocator_traits.h:298 
0x55557070bc5c std::__shared_ptr_emplace::__shared_ptr_emplace[abi:v15007]<…> shared_ptr.h:292 
0x55557070bc5c std::allocate_shared[abi:v15007]<…> shared_ptr.h:953 
0x5555707041b0 std::make_shared[abi:v15007]<…> shared_ptr.h:962 
0x5555707041b0 $_0::operator() const StorageNATS.cpp:740 
0x5555707041b0 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555707041b0 std::__invoke_void_return_wrapper::__call<…> invoke.h:470 
0x5555707041b0 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555707041b0 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555570173b38 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555570173b38 std::function::operator() const function.h:1197 
0x555570173b38 DB::StorageFactory::get const StorageFactory.cpp:231 
0x55556eaa33c5 DB::InterpreterCreateQuery::doCreateTable InterpreterCreateQuery.cpp:1859 
0x55556ea9c3ad DB::InterpreterCreateQuery::createTable InterpreterCreateQuery.cpp:1675 
0x55556eaa92ce DB::InterpreterCreateQuery::execute InterpreterCreateQuery.cpp:2219 
0x55556f30f0a1 DB::executeQueryImpl executeQuery.cpp:1349 
0x55556f3097c2 DB::executeQuery executeQuery.cpp:1516 
0x555571122792 DB::TCPHandler::runImpl TCPHandler.cpp:587 
0x5555711405c8 DB::TCPHandler::run TCPHandler.cpp:2519 
0x555575435983 Poco::Net::TCPServerConnection::start TCPServerConnection.cpp:43 
0x5555754361d2 Poco::Net::TCPServerDispatcher::run TCPServerDispatcher.cpp:115 
0x5555753b17a3 Poco::PooledThread::run ThreadPool.cpp:205 
0x5555753afc90 Poco::RunnableHolder::run Thread.cpp:45 
0x5555753ae32a Poco::ThreadImpl::runnableEntry Thread_POSIX.cpp:335 

Thread T658 'ConfigReloader' (tid=3183688, running) created by main thread at:
0x55555df5885c pthread_create tsan_interceptors_posix.cpp:1023 
0x55556648146a std::__libcpp_thread_create[abi:v15007] __threading_support:376 
0x55556648146a std::thread::thread<…> thread:311 
0x55556648014a ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566481e92 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566481e92 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x555566484a17 ThreadPoolImpl::scheduleOrThrow ThreadPool.cpp:490 
0x555566484a17 ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<…> ThreadPool.h:278 
0x5555664832fc ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566485752 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566485752 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x555566486ff7 ThreadPoolImpl::scheduleOrThrow ThreadPool.cpp:490 
0x55556dd1e2d4 DB::DatabaseOnDisk::iterateMetadataFiles const DatabaseOnDisk.cpp:691 
0x55556dd3d98a DB::DatabaseOrdinary::loadTablesMetadata DatabaseOrdinary.cpp:282 
0x55556eab86cd DB::TablesLoader::loadTablesAsync TablesLoader.cpp:59 
0x55556f33ffc5 DB::loadMetadataSystem loadMetadata.cpp:525 
0x5555665a584e DB::Server::main Server.cpp:2098 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 

Thread T546 'BgMBSchPool' (tid=3183576, running) created by main thread at:
0x55555df5885c pthread_create tsan_interceptors_posix.cpp:1023 
0x55556648146a std::__libcpp_thread_create[abi:v15007] __threading_support:376 
0x55556648146a std::thread::thread<…> thread:311 
0x55556648014a ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566481e92 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566481e92 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x555566482ab7 ThreadPoolImpl::scheduleOrThrow ThreadPool.cpp:490 
0x55556d69e442 ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<…> ThreadPool.h:278 
0x55556d69e442 DB::BackgroundSchedulePool::BackgroundSchedulePool BackgroundSchedulePool.cpp:170 
0x55556e3a965f std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x55556e3a965f $_0::operator() const Context.cpp:3475 
0x55556e3a965f std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556e3a965f (clickhouse+0x18e5565e) mutex:627 
0x55556e3a965f std::__call_once_param::operator()[abi:v15007] mutex:619 
0x55556e3a965f std::__call_once_proxy[abi:v15007]<…> mutex:655 
0x555578478ce9 std::__call_once mutex.cpp:242 
0x55556e38c68b std::call_once[abi:v15007]<…> mutex:673 
0x55556e38c68b DB::callOnce<…> callOnce.h:13 
0x55556e38c68b DB::Context::getMessageBrokerSchedulePool const Context.cpp:3474 
0x5555665c2c86 $_6::operator() const Server.cpp:1699 
0x5555665c0bca std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555665c0bca std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555665c0bca std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555665c0bca std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555571cf6fd1 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555571cf6fd1 std::function::operator() const function.h:1197 
0x555571cf6fd1 DB::ConfigReloader::reloadIfNewer ConfigReloader.cpp:164 
0x555571cf576a DB::ConfigReloader::ConfigReloader ConfigReloader.cpp:30 
0x5555665a198e std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x5555665a198e DB::Server::main Server.cpp:1542 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 
Data race in free(nle->scheduler);
Write of size 8 at 0x721800522dd8 by thread T548 (mutexes: write M0):
0x55555df57416 free tsan_interceptors_posix.cpp:725 
0x55557070ec14 finalCloseCb libuv.h:255 
0x555577292731 uv__finish_close core.c:342 
0x555577292731 uv__run_closing_handles core.c:356 
0x555577292731 uv_run core.c:445 
0x55557070d4ec DB::NATSHandler::runLoop NATSHandler.cpp:72 
0x55557079cd07 $_0::operator() const NATSProducer.cpp:37 
0x55557079cd07 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55557079cd07 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55557079cd07 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55557079cd07 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55556d69cf56 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55556d69cf56 std::function::operator() const function.h:1197 
0x55556d69cf56 DB::BackgroundSchedulePoolTaskInfo::execute BackgroundSchedulePool.cpp:106 
0x55556d6a0da4 DB::BackgroundSchedulePool::threadFunction BackgroundSchedulePool.cpp:304 
0x55556d6a1633 $_0::operator() const BackgroundSchedulePool.cpp:170 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 (clickhouse+0x1814d632) tuple:1789 
0x55556d6a1633 std::apply[abi:v15007]<…> tuple:1798 
0x55556d6a1633 <lambda>::operator() ThreadPool.h:311 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55556d6a1633 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55556d6a1633 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Previous atomic read of size 4 at 0x721800522ddc by thread T667 (mutexes: write M1):
0x5555772914ab uv_async_send async.c:73 
0x55557070c8d7 uvScheduleToEventLoop libuv.h:135 
0x55557070c8d7 natsLibuv_Detach libuv.h:496 
0x5555750f5b2f _close conn.c:2653 
0x5555750f7a32 natsConn_close conn.c:4036 
0x5555750f7a32 natsConnection_Close conn.c:4054 
0x5555707128f4 DB::NATSConnection::disconnectImpl NATSConnection.cpp:147 
0x5555707128f4 DB::NATSConnection::disconnect NATSConnection.cpp:108 
0x55557079b6d0 DB::NATSProducer::finishImpl NATSProducer.cpp:70 
0x5555700e9d0b DB::AsynchronousMessageProducer::finish IMessageProducer.cpp:55 
0x55557010d8f6 DB::MessageQueueSink::onFinish MessageQueueSink.cpp:41 
0x5555715d7991 $_3::operator() const ExceptionKeepingTransform.cpp:186 
0x5555715d7991 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555715d7991 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555715d7991 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555715d7991 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x5555715d7336 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x5555715d7336 std::function::operator() const function.h:1197 
0x5555715d7336 DB::runStep ExceptionKeepingTransform.cpp:114 
0x5555715d6d61 DB::ExceptionKeepingTransform::work ExceptionKeepingTransform.cpp:186 
0x555571240ce6 DB::executeJob ExecutionThreadContext.cpp:47 
0x555571240ce6 DB::ExecutionThreadContext::executeTask ExecutionThreadContext.cpp:96 
0x55557123035f DB::PipelineExecutor::executeStepImpl PipelineExecutor.cpp:288 
0x5555712308f0 DB::PipelineExecutor::executeSingleThread PipelineExecutor.cpp:254 
0x55557122ee3e DB::PipelineExecutor::executeImpl PipelineExecutor.cpp:430 
0x55557122e6aa DB::PipelineExecutor::execute PipelineExecutor.cpp:125 
0x55557122d3fe DB::threadFunction CompletedPipelineExecutor.cpp:49 
0x55557122d3fe $_0::operator() const CompletedPipelineExecutor.cpp:89 
0x55557122d3fe std::__invoke[abi:v15007]<…> invoke.h:394 
0x55557122d3fe (clickhouse+0x1bcd93fd) tuple:1789 
0x55557122d3fe std::apply[abi:v15007]<…> tuple:1798 
0x55557122d3fe <lambda>::operator() ThreadPool.h:311 
0x55557122d3fe std::__invoke[abi:v15007]<…> invoke.h:394 
0x55557122d3fe std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55557122d3fe std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55557122d3fe std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Mutex M0 (0x7234004e2358) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x55556d69c9ea std::lock_guard::lock_guard[abi:v15007] __mutex_base:94 
0x55556d69c9ea DB::BackgroundSchedulePoolTaskInfo::deactivate BackgroundSchedulePool.cpp:51 
0x55557079b1cd DB::NATSProducer::NATSProducer NATSProducer.cpp:38 
0x5555706fecfa std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x5555706fecfa DB::StorageNATS::write StorageNATS.cpp:391 
0x55556ee85b42 DB::InterpreterInsertQuery::buildSink InterpreterInsertQuery.cpp:353 
0x55556ee867e0 DB::InterpreterInsertQuery::buildPreAndSinkChains InterpreterInsertQuery.cpp:438 
0x55556ee89043 DB::InterpreterInsertQuery::buildInsertSelectPipeline InterpreterInsertQuery.cpp:636 
0x55556ee8c82f DB::InterpreterInsertQuery::execute InterpreterInsertQuery.cpp:805 
0x55556f30f0a1 DB::executeQueryImpl executeQuery.cpp:1349 
0x55556f3097c2 DB::executeQuery executeQuery.cpp:1516 
0x555571122792 DB::TCPHandler::runImpl TCPHandler.cpp:587 
0x5555711405c8 DB::TCPHandler::run TCPHandler.cpp:2519 
0x555575435983 Poco::Net::TCPServerConnection::start TCPServerConnection.cpp:43 
0x5555754361d2 Poco::Net::TCPServerDispatcher::run TCPServerDispatcher.cpp:115 
0x5555753b17a3 Poco::PooledThread::run ThreadPool.cpp:205 
0x5555753afc90 Poco::RunnableHolder::run Thread.cpp:45 
0x5555753ae32a Poco::ThreadImpl::runnableEntry Thread_POSIX.cpp:335 

Mutex M1 (0x724401880b68) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x555570711b45 std::lock_guard::lock_guard[abi:v15007] __mutex_base:94 
0x555570711b45 DB::NATSConnection::connect NATSConnection.cpp:100 
0x55557070eed9 $_0::operator() const NATSHandler.cpp:116 
0x55557070eed9 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55557070eed9 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55557070eed9 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55557070eed9 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55557070d49a std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55557070d49a std::function::operator() const function.h:1197 
0x55557070d49a DB::NATSHandler::runLoop NATSHandler.cpp:68 
0x55557079cd07 $_0::operator() const NATSProducer.cpp:37 
0x55557079cd07 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55557079cd07 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55557079cd07 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55557079cd07 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55556d69cf56 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55556d69cf56 std::function::operator() const function.h:1197 
0x55556d69cf56 DB::BackgroundSchedulePoolTaskInfo::execute BackgroundSchedulePool.cpp:106 
0x55556d6a0da4 DB::BackgroundSchedulePool::threadFunction BackgroundSchedulePool.cpp:304 
0x55556d6a1633 $_0::operator() const BackgroundSchedulePool.cpp:170 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 (clickhouse+0x1814d632) tuple:1789 
0x55556d6a1633 std::apply[abi:v15007]<…> tuple:1798 
0x55556d6a1633 <lambda>::operator() ThreadPool.h:311 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55556d6a1633 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55556d6a1633 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Thread T548 'BgMBSchPool' (tid=3183578, running) created by main thread at:
0x55555df5885c pthread_create tsan_interceptors_posix.cpp:1023 
0x55556648146a std::__libcpp_thread_create[abi:v15007] __threading_support:376 
0x55556648146a std::thread::thread<…> thread:311 
0x55556648014a ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566481e92 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566481e92 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x555566482ab7 ThreadPoolImpl::scheduleOrThrow ThreadPool.cpp:490 
0x55556d69e442 ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<…> ThreadPool.h:278 
0x55556d69e442 DB::BackgroundSchedulePool::BackgroundSchedulePool BackgroundSchedulePool.cpp:170 
0x55556e3a965f std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x55556e3a965f $_0::operator() const Context.cpp:3475 
0x55556e3a965f std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556e3a965f (clickhouse+0x18e5565e) mutex:627 
0x55556e3a965f std::__call_once_param::operator()[abi:v15007] mutex:619 
0x55556e3a965f std::__call_once_proxy[abi:v15007]<…> mutex:655 
0x555578478ce9 std::__call_once mutex.cpp:242 
0x55556e38c68b std::call_once[abi:v15007]<…> mutex:673 
0x55556e38c68b DB::callOnce<…> callOnce.h:13 
0x55556e38c68b DB::Context::getMessageBrokerSchedulePool const Context.cpp:3474 
0x5555665c2c86 $_6::operator() const Server.cpp:1699 
0x5555665c0bca std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555665c0bca std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555665c0bca std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555665c0bca std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555571cf6fd1 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555571cf6fd1 std::function::operator() const function.h:1197 
0x555571cf6fd1 DB::ConfigReloader::reloadIfNewer ConfigReloader.cpp:164 
0x555571cf576a DB::ConfigReloader::ConfigReloader ConfigReloader.cpp:30 
0x5555665a198e std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x5555665a198e DB::Server::main Server.cpp:1542 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 

Thread T667 'QueryCompPipeEx' (tid=3183697, running) created by main thread at:
0x55555df5885c pthread_create tsan_interceptors_posix.cpp:1023 
0x55556648146a std::__libcpp_thread_create[abi:v15007] __threading_support:376 
0x55556648146a std::thread::thread<…> thread:311 
0x55556648014a ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566481e92 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566481e92 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x555566484a17 ThreadPoolImpl::scheduleOrThrow ThreadPool.cpp:490 
0x555566484a17 ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<…> ThreadPool.h:278 
0x5555664832fc ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566485752 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566485752 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x5555664855ab ThreadPoolImpl::scheduleOrThrowOnError ThreadPool.cpp:478 
0x55556660eb3f DB::AsyncLoader::spawn AsyncLoader.cpp:879 
0x55556660b422 DB::AsyncLoader::enqueue AsyncLoader.cpp:715 
0x555566608a1d DB::AsyncLoader::schedule AsyncLoader.cpp:372 
0x555566609813 DB::AsyncLoader::schedule AsyncLoader.cpp:309 
0x5555665cae64 DB::scheduleLoad AsyncLoader.h:541 
0x5555665cae64 DB::waitLoad AsyncLoader.h:559 
0x5555665cae64 DB::waitLoad AsyncLoader.h:596 
0x55556f340085 DB::loadMetadataSystem loadMetadata.cpp:526 
0x5555665a584e DB::Server::main Server.cpp:2098 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 
Data race in free(nle);
Write of size 8 at 0x72100159aa58 by thread T546 (mutexes: write M0):
0x55555df57416 free tsan_interceptors_posix.cpp:725 
0x55557070ec42 finalCloseCb libuv.h:258 
0x555577292731 uv__finish_close core.c:342 
0x555577292731 uv__run_closing_handles core.c:356 
0x555577292731 uv_run core.c:445 
0x55557070d4ec DB::NATSHandler::runLoop NATSHandler.cpp:72 
0x5555707039e7 $_0::operator() const StorageNATS.cpp:104 
0x5555707039e7 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555707039e7 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555707039e7 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555707039e7 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55556d69cf56 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55556d69cf56 std::function::operator() const function.h:1197 
0x55556d69cf56 DB::BackgroundSchedulePoolTaskInfo::execute BackgroundSchedulePool.cpp:106 
0x55556d6a0da4 DB::BackgroundSchedulePool::threadFunction BackgroundSchedulePool.cpp:304 
0x55556d6a1633 $_0::operator() const BackgroundSchedulePool.cpp:170 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 (clickhouse+0x1814d632) tuple:1789 
0x55556d6a1633 std::apply[abi:v15007]<…> tuple:1798 
0x55556d6a1633 <lambda>::operator() ThreadPool.h:311 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55556d6a1633 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55556d6a1633 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Previous read of size 8 at 0x72100159aa58 by thread T2 (mutexes: write M1, write M2):
0x55557070c8ce uvScheduleToEventLoop libuv.h:135 
0x55557070c8ce natsLibuv_Detach libuv.h:496 
0x5555750f5b2f _close conn.c:2653 
0x5555750f7a32 natsConn_close conn.c:4036 
0x5555750f7a32 natsConnection_Close conn.c:4054 
0x5555707128f4 DB::NATSConnection::disconnectImpl NATSConnection.cpp:147 
0x5555707128f4 DB::NATSConnection::disconnect NATSConnection.cpp:108 
0x5555706ffdcb DB::StorageNATS::shutdown StorageNATS.cpp:450 
0x55556eac4a24 DB::IStorage::flushAndShutdown IStorage.h:597 
0x55556eac4a24 DB::InterpreterDropQuery::executeToTableImpl InterpreterDropQuery.cpp:305 
0x55556eac260c DB::InterpreterDropQuery::executeToTable InterpreterDropQuery.cpp:122 
0x55556eac1abe DB::InterpreterDropQuery::executeSingleDropQuery InterpreterDropQuery.cpp:95 
0x55556eac1616 DB::InterpreterDropQuery::execute InterpreterDropQuery.cpp:76 
0x55556f30f0a1 DB::executeQueryImpl executeQuery.cpp:1349 
0x55556f3097c2 DB::executeQuery executeQuery.cpp:1516 
0x555571122792 DB::TCPHandler::runImpl TCPHandler.cpp:587 
0x5555711405c8 DB::TCPHandler::run TCPHandler.cpp:2519 
0x555575435983 Poco::Net::TCPServerConnection::start TCPServerConnection.cpp:43 
0x5555754361d2 Poco::Net::TCPServerDispatcher::run TCPServerDispatcher.cpp:115 
0x5555753b17a3 Poco::PooledThread::run ThreadPool.cpp:205 
0x5555753afc90 Poco::RunnableHolder::run Thread.cpp:45 
0x5555753ae32a Poco::ThreadImpl::runnableEntry Thread_POSIX.cpp:335 
Mutex M0 (0x723400729618) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x55556d69c9ea std::lock_guard::lock_guard[abi:v15007] __mutex_base:94 
0x55556d69c9ea DB::BackgroundSchedulePoolTaskInfo::deactivate BackgroundSchedulePool.cpp:51 
0x5555706f9bdd DB::StorageNATS::StorageNATS StorageNATS.cpp:105 
0x55557070bc5c std::construct_at[abi:v15007]<…> construct_at.h:35 
0x55557070bc5c std::allocator_traits::construct[abi:v15007]<…> allocator_traits.h:298 
0x55557070bc5c std::__shared_ptr_emplace::__shared_ptr_emplace[abi:v15007]<…> shared_ptr.h:292 
0x55557070bc5c std::allocate_shared[abi:v15007]<…> shared_ptr.h:953 
0x5555707041b0 std::make_shared[abi:v15007]<…> shared_ptr.h:962 
0x5555707041b0 $_0::operator() const StorageNATS.cpp:740 
0x5555707041b0 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555707041b0 std::__invoke_void_return_wrapper::__call<…> invoke.h:470 
0x5555707041b0 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555707041b0 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555570173b38 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555570173b38 std::function::operator() const function.h:1197 
0x555570173b38 DB::StorageFactory::get const StorageFactory.cpp:231 
0x55556eaa33c5 DB::InterpreterCreateQuery::doCreateTable InterpreterCreateQuery.cpp:1859 
0x55556ea9c3ad DB::InterpreterCreateQuery::createTable InterpreterCreateQuery.cpp:1675 
0x55556eaa92ce DB::InterpreterCreateQuery::execute InterpreterCreateQuery.cpp:2219 
0x55556f30f0a1 DB::executeQueryImpl executeQuery.cpp:1349 
0x55556f3097c2 DB::executeQuery executeQuery.cpp:1516 
0x555571122792 DB::TCPHandler::runImpl TCPHandler.cpp:587 
0x5555711405c8 DB::TCPHandler::run TCPHandler.cpp:2519 
0x555575435983 Poco::Net::TCPServerConnection::start TCPServerConnection.cpp:43 
0x5555754361d2 Poco::Net::TCPServerDispatcher::run TCPServerDispatcher.cpp:115 
0x5555753b17a3 Poco::PooledThread::run ThreadPool.cpp:205 
0x5555753afc90 Poco::RunnableHolder::run Thread.cpp:45 
0x5555753ae32a Poco::ThreadImpl::runnableEntry Thread_POSIX.cpp:335 
Mutex M1 (0x720c01e78a20) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x55556e471a4a std::unique_lock::unique_lock[abi:v15007] __mutex_base:122 
0x55556e471a4a DB::DDLGuard::DDLGuard DatabaseCatalog.cpp:1858 
0x55556e45f81b std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x55556e45f81b DB::DatabaseCatalog::getDDLGuard DatabaseCatalog.cpp:916 
0x55556eac30fe DB::InterpreterDropQuery::executeToTableImpl InterpreterDropQuery.cpp:146 
0x55556eac260c DB::InterpreterDropQuery::executeToTable InterpreterDropQuery.cpp:122 
0x55556eac1abe DB::InterpreterDropQuery::executeSingleDropQuery InterpreterDropQuery.cpp:95 
0x55556eac1616 DB::InterpreterDropQuery::execute InterpreterDropQuery.cpp:76 
0x55556f30f0a1 DB::executeQueryImpl executeQuery.cpp:1349 
0x55556f3097c2 DB::executeQuery executeQuery.cpp:1516 
0x555571122792 DB::TCPHandler::runImpl TCPHandler.cpp:587 
0x5555711405c8 DB::TCPHandler::run TCPHandler.cpp:2519 
0x555575435983 Poco::Net::TCPServerConnection::start TCPServerConnection.cpp:43 
0x5555754361d2 Poco::Net::TCPServerDispatcher::run TCPServerDispatcher.cpp:115 
0x5555753b17a3 Poco::PooledThread::run ThreadPool.cpp:205 
0x5555753afc90 Poco::RunnableHolder::run Thread.cpp:45 
0x5555753ae32a Poco::ThreadImpl::runnableEntry Thread_POSIX.cpp:335 
Mutex M2 (0x724401830ca8) created at:
0x55555df5a55c pthread_mutex_lock tsan_interceptors_posix.cpp:1342 
0x55557847819a std::__libcpp_mutex_lock[abi:v15007] __threading_support:304 
0x55557847819a std::mutex::lock mutex.cpp:38 
0x555570711b45 std::lock_guard::lock_guard[abi:v15007] __mutex_base:94 
0x555570711b45 DB::NATSConnection::connect NATSConnection.cpp:100 
0x55557070eed9 $_0::operator() const NATSHandler.cpp:116 
0x55557070eed9 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55557070eed9 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55557070eed9 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55557070eed9 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55557070d49a std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55557070d49a std::function::operator() const function.h:1197 
0x55557070d49a DB::NATSHandler::runLoop NATSHandler.cpp:68 
0x5555707039e7 $_0::operator() const StorageNATS.cpp:104 
0x5555707039e7 std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555707039e7 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555707039e7 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555707039e7 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x55556d69cf56 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x55556d69cf56 std::function::operator() const function.h:1197 
0x55556d69cf56 DB::BackgroundSchedulePoolTaskInfo::execute BackgroundSchedulePool.cpp:106 
0x55556d6a0da4 DB::BackgroundSchedulePool::threadFunction BackgroundSchedulePool.cpp:304 
0x55556d6a1633 $_0::operator() const BackgroundSchedulePool.cpp:170 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 (clickhouse+0x1814d632) tuple:1789 
0x55556d6a1633 std::apply[abi:v15007]<…> tuple:1798 
0x55556d6a1633 <lambda>::operator() ThreadPool.h:311 
0x55556d6a1633 std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556d6a1633 std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x55556d6a1633 std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x55556d6a1633 std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555566480daa std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555566480daa std::function::operator() const function.h:1197 
0x555566480daa ThreadPoolImpl::ThreadFromThreadPool::worker ThreadPool.cpp:785 
0x555566488e8e std::__invoke[abi:v15007]<…> invoke.h:359 
0x555566488e8e std::__thread_execute[abi:v15007]<…> thread:284 
0x555566488e8e std::__thread_proxy[abi:v15007]<…> thread:295 

Thread T546 'BgMBSchPool' (tid=3183576, running) created by main thread at:
0x55555df5885c pthread_create tsan_interceptors_posix.cpp:1023 
0x55556648146a std::__libcpp_thread_create[abi:v15007] __threading_support:376 
0x55556648146a std::thread::thread<…> thread:311 
0x55556648014a ThreadPoolImpl::ThreadFromThreadPool::ThreadFromThreadPool ThreadPool.cpp:598 
0x555566481e92 std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x555566481e92 ThreadPoolImpl::scheduleImpl<…> ThreadPool.cpp:284 
0x555566482ab7 ThreadPoolImpl::scheduleOrThrow ThreadPool.cpp:490 
0x55556d69e442 ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<…> ThreadPool.h:278 
0x55556d69e442 DB::BackgroundSchedulePool::BackgroundSchedulePool BackgroundSchedulePool.cpp:170 
0x55556e3a965f std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x55556e3a965f $_0::operator() const Context.cpp:3475 
0x55556e3a965f std::__invoke[abi:v15007]<…> invoke.h:394 
0x55556e3a965f (clickhouse+0x18e5565e) mutex:627 
0x55556e3a965f std::__call_once_param::operator()[abi:v15007] mutex:619 
0x55556e3a965f std::__call_once_proxy[abi:v15007]<…> mutex:655 
0x555578478ce9 std::__call_once mutex.cpp:242 
0x55556e38c68b std::call_once[abi:v15007]<…> mutex:673 
0x55556e38c68b DB::callOnce<…> callOnce.h:13 
0x55556e38c68b DB::Context::getMessageBrokerSchedulePool const Context.cpp:3474 
0x5555665c2c86 $_6::operator() const Server.cpp:1699 
0x5555665c0bca std::__invoke[abi:v15007]<…> invoke.h:394 
0x5555665c0bca std::__invoke_void_return_wrapper::__call<…> invoke.h:479 
0x5555665c0bca std::__function::__default_alloc_func::operator()[abi:v15007] function.h:235 
0x5555665c0bca std::__function::__policy_invoker::__call_impl<…> function.h:716 
0x555571cf6fd1 std::__function::__policy_func::operator()[abi:v15007] const function.h:848 
0x555571cf6fd1 std::function::operator() const function.h:1197 
0x555571cf6fd1 DB::ConfigReloader::reloadIfNewer ConfigReloader.cpp:164 
0x555571cf576a DB::ConfigReloader::ConfigReloader ConfigReloader.cpp:30 
0x5555665a198e std::make_unique[abi:v15007]<…> unique_ptr.h:714 
0x5555665a198e DB::Server::main Server.cpp:1542 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 

Thread T2 'TCPHandler' (tid=3183032, running) created by main thread at:
0x55555df5885c pthread_create tsan_interceptors_posix.cpp:1023 
0x5555753adce8 Poco::ThreadImpl::startImpl Thread_POSIX.cpp:198 
0x5555753af6b0 Poco::Thread::start Thread.cpp:118 
0x5555753b1c34 Poco::PooledThread::start ThreadPool.cpp:91 
0x5555753b1c34 Poco::ThreadPool::ThreadPool ThreadPool.cpp:258 
0x555566597e8a DB::Server::main Server.cpp:920 
0x55557546959d Poco::Util::Application::run Application.cpp:315 
0x555566590a1a DB::Server::run Server.cpp:416 
0x55557548690e Poco::Util::ServerApplication::run ServerApplication.cpp:131 
0x55556658dc1b mainEntryClickHouseServer Server.cpp:222 
0x55555dfe0e18 main main.cpp:248 

@kozlovic
Member Author

@dmitry-sles-novikov For the first data race, again, this is not a race between the NATS library and libuv; I am not sure how ClickHouse has access to the file descriptor owned by the NATS C client.

For the other two, I will investigate further. I think it has to do with the fact that you run the loop with no wait. I need to run some experiments and will let you know. For the first one, please check what ClickHouse is doing with the NATS C client's file descriptor.

@dmitry-sles-novikov

dmitry-sles-novikov commented Oct 30, 2024

@kozlovic

For the first data race, again, this is not a race between the NATS library and libuv; I am not sure how ClickHouse has access to the file descriptor owned by the NATS C client.

The file descriptor does not belong to the NATS C client anymore: we closed it here and allowed it to be reused:

Location is file descriptor 82 destroyed by thread T546 at:
0x55555df5e8d3 close tsan_interceptors_posix.cpp:1828
0x5555750fca9d _closeFd comsock.c:39
0x5555750fca9d natsSock_Close comsock.c:45
0x5555750f7eeb natsConnection_ProcessCloseEvent conn.c:4192
0x55557070c623 uvPollUpdate libuv.h:191
0x55557070c623 uvAsyncCb libuv.h:314

From the documentation:

close() closes a file descriptor, so that it no longer refers to any file and may be reused.

But at this point we still have a not-yet-closed poll handle associated with this fd. We will close that poll handle only later, in the attach or detach function:
0x55555df5fb47 epoll_ctl tsan_interceptors_posix.cpp:1958
0x5555772a0514 uv__platform_invalidate_fd linux.c:724
0x555577299865 uv__poll_stop poll.c:102
0x555577299865 uv__poll_close poll.c:153
0x555577291d2a uv_close core.c:199
0x55557070eb56 closeSchedulerCb libuv.h:266

static void closeSchedulerCb(uv_handle_t* scheduler)
{
    natsLibuvEvents *nle = (natsLibuvEvents*) scheduler->data;

    uv_close((uv_handle_t*) nle->handle, finalCloseCb);
}

static natsStatus
uvAsyncAttach(natsLibuvEvents *nle)
{
    natsStatus  s = NATS_OK;

    // We are reconnecting, destroy the old handle, create a new one
    if (nle->handle != NULL)
    {
        uv_close((uv_handle_t*) nle->handle, uvHandleClosedCb);
        nle->handle = NULL;
    }
    .....
}

void uv_close(uv_handle_t* handle, uv_close_cb close_cb) {
  assert(!uv__is_closing(handle));

  handle->flags |= UV_HANDLE_CLOSING;
  handle->close_cb = close_cb;

  switch (handle->type) {
  ...
  case UV_POLL:
    uv__poll_close((uv_poll_t*)handle);
    break;
  ...
  }

  uv__make_close_pending(handle);
}

void uv__poll_close(uv_poll_t* handle) {
  uv__poll_stop(handle);
}

static void uv__poll_stop(uv_poll_t* handle) {
  uv__io_stop(handle->loop,
              &handle->io_watcher,
              POLLIN | POLLOUT | UV__POLLRDHUP | UV__POLLPRI);
  uv__handle_stop(handle);
  uv__platform_invalidate_fd(handle->loop, handle->io_watcher.fd);
}

void uv__platform_invalidate_fd(uv_loop_t* loop, int fd) {
  uv__loop_internal_fields_t* lfields;
  struct uv__invalidate* inv;
  struct epoll_event dummy;
  int i;

  lfields = uv__get_internal_fields(loop);
  inv = lfields->inv;

  /* Invalidate events with same file descriptor */
  if (inv != NULL)
    for (i = 0; i < inv->nfds; i++)
      if (inv->events[i].data.fd == fd)
        inv->events[i].data.fd = -1;

  /* Remove the file descriptor from the epoll.
   * This avoids a problem where the same file description remains open
   * in another process, causing repeated junk epoll events.
   *
   * We pass in a dummy epoll_event, to work around a bug in old kernels.
   *
   * Work around a bug in kernels 3.10 to 3.19 where passing a struct that
   * has the EPOLLWAKEUP flag set generates spurious audit syslog warnings.
   */
  memset(&dummy, 0, sizeof(dummy));

  if (inv == NULL) {
    epoll_ctl(loop->backend_fd, EPOLL_CTL_DEL, fd, &dummy);
  } else {
    uv__epoll_ctl_prep(loop->backend_fd,
                       &lfields->ctl,
                       inv->prep,
                       EPOLL_CTL_DEL,
                       fd,
                       &dummy);
  }
}

The handle->io_watcher.fd passed to uv__platform_invalidate_fd is our file descriptor 82, created by the NATS library, already closed in uvPollUpdate, and possibly reused by the OS.

The sanitizer detects that we pass an already closed (and possibly reused) file descriptor into a system call, while another thread is now the owner of file descriptor 82.
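
To make the sequence concrete, here is a minimal libuv-only sketch of that ordering. The handle and variable names are illustrative, this is not the adapter code, and the program is single-threaded, so it only shows where the stale epoll_ctl comes from:

#include <uv.h>
#include <unistd.h>
#include <sys/socket.h>

static uv_poll_t poll_handle;

static void on_readable(uv_poll_t *h, int status, int events)
{
    (void) h; (void) status; (void) events;
}

static void on_poll_closed(uv_handle_t *h)
{
    (void) h; // libuv is fully done with the poll handle only here
}

int main(void)
{
    uv_loop_t  *loop = uv_default_loop();
    int         fd   = socket(AF_INET, SOCK_STREAM, 0);

    uv_poll_init(loop, &poll_handle, fd);
    uv_poll_start(&poll_handle, UV_READABLE, on_readable);

    // The ordering described above:
    uv_poll_stop(&poll_handle); // removes the fd from the loop's epoll set
    close(fd);                  // the fd number is now free; another thread may get it

    // uv_close() -> uv__poll_close() -> uv__poll_stop() ->
    // uv__platform_invalidate_fd() issues epoll_ctl(EPOLL_CTL_DEL) with the
    // stale fd number: that is the system call reported in the race above.
    uv_close((uv_handle_t*) &poll_handle, on_poll_closed);

    return uv_run(loop, UV_RUN_DEFAULT);
}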

P.S. Maybe I failed to explain again and I give up =))))

@kozlovic
Member Author

But at this point we still have an active poll handle associated with this fd

But the point of this PR was that I was doing a uv_poll_stop when removing the write/read events and only then closing the FD. As you can see from the code in uv_poll_stop, this is already invalidating the FD. So the uv_close of the handle in detach should not be doing anything. I mean, it sounds to me more like a problem with libuv, then, or a documentation issue? Meaning, maybe I am skipping a step that I should be doing after an uv_poll_stop and before the uv_close? Or maybe I should not do the uv_poll_stop for the final close and instead let that happen in the close callback of the handle?

P.S. Maybe I failed to explain again and I give up =))))

I understand your explanation, but I can't understand how this happens with the changes in this PR. Please don't give up on me, and try to explain one more time ;-)

For cases 2 and 3, I have an idea of what it could be, so I will work on that.

@kozlovic
Member Author

I guess an approach would be to do the uv_poll_stop followed by uv_close, and then close the socket? Or maybe uv_close is enough (no need for the stop I mean). I will check the libuv doc/code again.

@dmitry-sles-novikov

dmitry-sles-novikov commented Oct 30, 2024

As you can see from the code in uv_poll_stop, this is already invalidating the FD

You are right!!! I didn't pay attention.

uv_close is enough (no need for the stop I mean)

It looks that way.

- Move the `uv_async_send` under our lock to avoid crash/race
- Replace `uv_poll_stop` with `uv_close` and deal with nle->handle
in place and not again in the final close callback.

Signed-off-by: Ivan Kozlovic <[email protected]>
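
The first change listed in the commit message above (issuing uv_async_send while still holding the adapter's lock) follows the kind of pattern sketched below. The struct and names are made up for illustration; this is not the adapter code:

#include <uv.h>

typedef struct
{
    uv_mutex_t lock;      // guards the pending state and the handles' lifetime
    uv_async_t scheduler; // wakes the event loop thread
    int        pending;   // bitmask of work for the event loop thread
} loopCtx;

// Called from the NATS connection's threads.
static int scheduleEvent(loopCtx *ctx, int what)
{
    int rc;

    uv_mutex_lock(&ctx->lock);
    ctx->pending |= what;
    // uv_async_send() is safe to call from another thread; doing it under the
    // same lock that guards teardown additionally ensures the async handle is
    // not being closed at the same time.
    rc = uv_async_send(&ctx->scheduler);
    uv_mutex_unlock(&ctx->lock);

    return rc;
}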
@kozlovic
Member Author

@dmitry-sles-novikov I just pushed a new commit that hopefully addresses the issues you have observed. I believe it will for sure for cases 2 and 3, and I hope it also solves case 1. For that one, I am not sure that what I was doing before was wrong (since, as you noted, the stop already invalidates the FD), but at least it seems more correct now: when the connection is closed (or reconnecting), we close this handle (and close this socket). Prior to this, we would do the stop (and close the socket) and then uv_close on the final close.
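
For reference, here is the earlier sketch with the corrected ordering, i.e. what this commit moves to. The names are again illustrative; in the adapter the socket itself is closed on the library side (through the new natsConnection_ProcessCloseEvent call), and the exact call site may differ from this sketch:

#include <uv.h>
#include <unistd.h>
#include <sys/socket.h>

static uv_poll_t poll_handle;

static void on_readable(uv_poll_t *h, int status, int events)
{
    (void) h; (void) status; (void) events;
}

static void on_poll_closed(uv_handle_t *h)
{
    (void) h; // safe to free or reuse the poll handle from here
}

int main(void)
{
    uv_loop_t  *loop = uv_default_loop();
    int         fd   = socket(AF_INET, SOCK_STREAM, 0);

    uv_poll_init(loop, &poll_handle, fd);
    uv_poll_start(&poll_handle, UV_READABLE, on_readable);

    // uv_close() runs uv__poll_close() synchronously, so the
    // epoll_ctl(EPOLL_CTL_DEL) happens here, while the fd number is still ours.
    uv_close((uv_handle_t*) &poll_handle, on_poll_closed);

    // Only now is the fd number released for reuse by the OS.
    close(fd);

    return uv_run(loop, UV_RUN_DEFAULT);
}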

If you could rebuild with this new commit and let me know how it looks, it would be much appreciated. Thank you for your efforts and patience!

@dmitry-sles-novikov

@kozlovic All tests passed, everything is great!

Thank you very much for your help and patience; I hope my help was useful. It was a pleasure to work with you.

@kozlovic
Member Author

All tests passed, everything is great!

This is great news!

Thank you very much for your help and patience, I hope my help was useful. It was a pleasure to work with you

The pleasure is mutual. And thank you for your persistence, since it made me take a closer look at the way the final close was happening, which was wrong, but I did not see it earlier.

@kozlovic
Member Author

@levb The PR is complete and @dmitry-sles-novikov confirmed that there are no longer any sanitizer issues. You can merge whenever you have the time.

@levb levb merged commit 297614a into main Nov 5, 2024
30 checks passed
@levb levb deleted the fix_814 branch November 5, 2024 17:49
@zenoway

zenoway commented Nov 21, 2024

Hi, @levb! Could you please share the plans for releasing this fix?

@levb
Collaborator

levb commented Nov 22, 2024

@zenoway I am on a family vacation ATM, with limited access. I will cut the release as soon as I am back in December.

@zenoway

zenoway commented Nov 22, 2024

@levb OK, thank you for the update!

PS. Have a great time on vacation! :)


Successfully merging this pull request may close these issues.

File descriptor is closed before async callbacks are called from libuv adapter