File descriptor is closed before async callbacks are called from libuv adapter #814
Thanks in advance =)
@dmitry-sles-novikov The detach (and finalCloseCb) change would only solve the normal connection close, but we also have to take reconnects into account. I had a look, and it will take a bit of time, since we need to make sure not to reintroduce an FD leak that was solved with this code (see #375, #376). I did a small POC that moves some of the socket closing from the NATS library into the libuv adapter itself (which would then need to be done in the libevent adapter too), and it seems OK, but again, I need to spend more time on that one.
@kozlovic Yes, you are right, I did not take the reconnect case into account. I wanted to give feedback on the errors I found. Thank you very much, I will wait for updates.
@dmitry-sles-novikov It just occurred to me: if you say that the socket is closed twice, once in the NATS library and once in libuv, then I should not have to close the socket in the adapter. However, I noticed that if I remove the close from the library and don't touch the adapter, I see the socket leaked (at least it stays in CLOSE_WAIT) when the server is restarted. So I am a bit confused. Could it be that you or ClickHouse is closing the socket, or is it the libuv library itself? I would need to know that. I see from your other PR that you have a dedicated NATS libuv adapter (not the one from this repo), so maybe that's where you were closing the socket?
@kozlovic I created the adapter before I understood the causes of the problems, and I plan to remove it in the future. It minimizes the likelihood of test failures, but does not fix the error. As far as I understand from the libuv documentation, uv_poll_t does not own the file descriptor; it only polls it. Therefore, responsibility for opening and closing the file descriptor lies with its owner, and the descriptor may be closed only after polling has been stopped with uv_poll_stop() or uv_close(). From the libuv tests:
uv_poll_t poll_handle;
int fd;
fd = epoll_create(1);                                          /* the test owns fd */
ASSERT_NE(fd, -1);
ASSERT_OK(uv_poll_init(uv_default_loop(), &poll_handle, fd));  /* libuv only polls it */
ASSERT_OK(uv_poll_start(&poll_handle, UV_READABLE, (uv_poll_cb) abort));
ASSERT_NE(0, uv_run(uv_default_loop(), UV_RUN_NOWAIT));
uv_close((uv_handle_t*) &poll_handle, NULL);                   /* stop polling first */
ASSERT_OK(uv_run(uv_default_loop(), UV_RUN_DEFAULT));          /* let the close complete */
ASSERT_OK(close(fd));                                          /* only then close fd */
P.S. I will still edit the PR in ClickHouse and check it, including for double socket closing.
The socket was closed by the NATS library itself, which could cause issues when the event loop library (specifically libuv) was still polling it. We now defer to the event loop adapter to make sure that the event loop library is done polling before invoking a new function that takes care of closing the socket.

Resolves #814

Signed-off-by: Ivan Kozlovic
The socket was closed by the NATS library itself, which could cause issues when the event loop thread was still polling it. We now defer to the event loop adapter to make sure that the event loop library is done polling before invoking a new function that takes care of closing the socket.

I have updated the event loop test (which simulates what our adapters are doing). The mockup event loop implementation is a bit too simplistic, but should be OK for now. If we run into issues, we would have to make the events a linked list.

Resolves #814

Signed-off-by: Ivan Kozlovic
@dmitry-sles-novikov Could you have a look at PR #815? I would really appreciate it if you could build from that branch (and use the new libuv adapter file) and see if it addresses the issue you observed. Thanks!
…ng (nats-io#815)

* [FIXED] EventLoop: Socket now closed only after event loop done polling

The socket was closed by the NATS library itself, which could cause issues when the event loop thread was still polling it. We now defer to the event loop adapter to make sure that the event loop library is done polling before invoking a new function that takes care of closing the socket.

I have updated the event loop test (which simulates what our adapters are doing). The mockup event loop implementation is a bit too simplistic, but should be OK for now. If we run into issues, we would have to make the events a linked list.

Resolves nats-io#814

Signed-off-by: Ivan Kozlovic

* Updates based on PR feedback

- Move the `uv_async_send` under our lock to avoid crash/race
- Replace `uv_poll_stop` with `uv_close` and deal with nle->handle in place and not again in the final close callback.

Signed-off-by: Ivan Kozlovic

---------

Signed-off-by: Ivan Kozlovic
Observed behavior
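The nats.c library uses uv_poll_init_socket in its libuv adapter.
From the libuv documentation: the user should not close a file descriptor while it is being polled by an active poll handle, because the handle may then report an error or even start polling another socket; the descriptor can be closed safely only after a call to uv_poll_stop() or uv_close().
However, the nats.c library closes nc->sockCtx.fd before calling the detach callback.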
As a result, in the "good" case there is a race condition if the descriptor number is reassigned to another file; in the bad case, the application terminates inside the uv__epoll_ctl_flush function of libuv because cqe->res == -EBADF.
Expected behavior
The file descriptor is closed only after calling
uv_close((uv_handle_t*) nle->handle, finalCloseCb);
for example, in finalCloseCb.
Server and client version
client version: 3.9.1 and earlier
Host environment
No response
Steps to reproduce
No response