Fix Leaking Listener When Closing NodeClient #55676
If a node client (or rather its underlying node) is closed, any executions on it will just quietly fail, as happens in elastic#55660, where the nodes are closed on the test thread while a node client is used asynchronously. Closes elastic#55660
Pinging @elastic/es-distributed (:Distributed/Network)
I'm undecided whether we should make this change. The shutdown behavior of ES / the transport service makes sense as it is, but the integration tests (with the node client) reach directly into the node's innards to execute stuff, and requests in particular are not routed through the transport service (see also #55494).
@ywelsch annoyingly enough, the problem here is precisely the fact that this request goes through the We had this issue before (service does cleanup/exception-handling on the generic pool) in #46178 where we fixed it by waiting for all actions to finish when closing the affected service. This isn't really a fix here though because we simply have to do something about local requests to a shut-down transport service IMO. |
I was wondering if we should instead add something to the test infra in InternalTestCluster, so that we track all pending requests and throw a NodeClosedException when the node is closed.
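A minimal sketch of what such tracking could look like, using only JDK classes. The class name `PendingRequestTrackerSketch`, its methods, and the use of `IllegalStateException` as a stand-in for NodeClosedException are all hypothetical and not taken from InternalTestCluster:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper: remembers the failure callback of every in-flight
// request and fails whatever is still pending once the node is closed.
class PendingRequestTrackerSketch {
    private final Set<Consumer<Exception>> pending = ConcurrentHashMap.newKeySet();
    private volatile boolean closed;

    /** Tracks a request; returns false and fails it immediately if already closed. */
    boolean track(Consumer<Exception> onFailure) {
        pending.add(onFailure);
        // Re-check after adding so a concurrent close() cannot miss this entry.
        if (closed && pending.remove(onFailure)) {
            onFailure.accept(new IllegalStateException("node closed"));
            return false;
        }
        return true;
    }

    /** Called when a request finishes normally, so it is no longer failed on close. */
    void done(Consumer<Exception> onFailure) {
        pending.remove(onFailure);
    }

    /** Fails every request still in flight and returns how many were failed. */
    int close() {
        closed = true;
        int failed = 0;
        for (Consumer<Exception> onFailure : pending) {
            // remove() returns true for exactly one caller, so each callback fires once.
            if (pending.remove(onFailure)) {
                onFailure.accept(new IllegalStateException("node closed"));
                failed++;
            }
        }
        return failed;
    }
}
```

The point of the design is that a callback is either removed by `done` or failed by `close`, never silently dropped.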
We could do that via a
Let's see what @tbrooks8 thinks.
It's not that a closed transport service becomes a black hole for listeners when it is shut down (we handle that correctly AFAICS), but that when the thread pools are shut down, there is no guarantee that all tasks on the thread pool (queued up or not) will have their listeners invoked. That's a guarantee we don't have today, and it has nothing to do with the transport.
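The missing guarantee can be reproduced with a plain JDK executor. The sketch below is an illustration under that assumption only; it does not use the ES thread pool, and the class and method names are hypothetical:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo: a task that is still queued when the pool is shut down
// is never run, so the "listener" inside it is never invoked.
class DroppedListenerSketch {
    /** Returns whether the queued task's listener was invoked after shutdownNow(). */
    static boolean listenerInvokedAfterShutdownNow() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean listenerInvoked = new AtomicBoolean(false);
        try {
            // Occupy the only worker so the next task has to wait in the queue.
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();

            // The queued "request": its listener only fires if the task runs.
            pool.execute(() -> listenerInvoked.set(true));

            // Interrupts the worker and drains the queue without running it.
            pool.shutdownNow();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return listenerInvoked.get();
    }
}
```

Nothing in the executor notifies the caller about the drained task; whoever shuts the pool down has to fail those listeners explicitly.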
Not for local node requests. A stopped transport will simply always get to To be honest, the fact that we handle some exceptions
LGTM
If a node client (or rather its underlying node) is closed, any executions on it will just quietly fail, as happens in #55660, where the nodes are closed on the test thread while a node client is used asynchronously.
I also removed the onRejection implementation on the error handling runnable since it's dead code (generic pool never rejects).
IMO, it's reasonable to handle the callback on the current thread when shutting down, if only to fix tests and remove a dirty spot that loses listeners.
The only way you can really trigger a stack overflow (SO) here is by rerunning requests in a hot loop, which you shouldn't do in the first place. ES internally doesn't do that in its usages of the node client as far as I can see, and we shouldn't do that in tests either (and don't, as far as I can see).
Closes #55660
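As a rough illustration of handling the callback on the current thread during shutdown, the sketch below uses a plain JDK executor. The `Listener` interface, the class name, and the `IllegalStateException` standing in for a node-closed error are hypothetical and are not the code changed in this PR:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: if the executor no longer accepts work, notify the
// listener on the calling thread instead of losing it.
class InlineFailureSketch {
    interface Listener {
        void onResponse(String result);
        void onFailure(Exception e);
    }

    static void execute(ExecutorService pool, Listener listener) {
        try {
            pool.execute(() -> listener.onResponse("ok"));
        } catch (RejectedExecutionException e) {
            // The pool is shut down: fail the listener right here, on the caller's thread.
            listener.onFailure(new IllegalStateException("node closed", e));
        }
    }

    /** Runs a request against an already closed pool and reports what the listener saw. */
    static String runAgainstClosedPool() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown();
        AtomicReference<String> outcome = new AtomicReference<>("listener never called");
        execute(pool, new Listener() {
            @Override
            public void onResponse(String result) {
                outcome.set("response: " + result);
            }

            @Override
            public void onFailure(Exception e) {
                outcome.set("failure: " + e.getMessage());
            }
        });
        return outcome.get();
    }
}
```

Because the failure is delivered synchronously, a caller that retries from inside `onFailure` would recurse on the same stack, which is the hot-loop case described above.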