Restructure Worker.close #8062
Comments
An alternative to this would be to remove the worker from the scheduler ASAP, mimicking the case where a worker dies. Any necessary cluster-wide coordination would then be left to […]
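For illustration, a rough sketch of what that scheduler-side removal could look like. `Scheduler.remove_worker` does exist in distributed, but the exact keyword arguments vary between versions, and the `stimulus_id` value and the wrapper function here are assumptions:

```python
# Hypothetical sketch: remove the worker from the scheduler first, as if
# the worker had died, and leave cluster-wide coordination to the
# scheduler's existing "worker died" code path.
async def close_via_scheduler(scheduler, worker_address: str) -> None:
    # The scheduler reschedules the worker's tasks and informs the rest
    # of the cluster, just as it would after an unexpected worker death.
    await scheduler.remove_worker(
        address=worker_address,
        stimulus_id="restructured-worker-close",  # illustrative id
    )
```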
Fun fact: By stopping […]
+1
This is done by setting the status; I don't think we should do anything else at this stage. Overall, the proposed ordering makes sense to me. I think the PCs have to be stopped between 2 and 3: the PCs do need comms, so we should stop the PCs before closing the comms.
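A minimal sketch of the ordering argued for here, assuming `Worker.periodic_callbacks` is a mapping of names to tornado `PeriodicCallback` instances (as it is in distributed); the helper function itself is illustrative:

```python
# Periodic callbacks (heartbeats etc.) may use comms, so stop them
# while the comms are still open, i.e. before any comm shutdown.
def stop_periodic_callbacks(worker) -> None:
    # PeriodicCallback.stop() is synchronous and safe to call twice.
    for pc in worker.periodic_callbacks.values():
        pc.stop()
```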
At the moment, the order of operations in `Worker.close` is somewhat random and does not appear intentionally structured. It reads roughly as follows:

1. Set `self.status` to `Status.closing`
2. Emit a `PauseEvent` to prevent further tasks from being executed/gathered
3. Stop `self.periodic_callbacks`
4. Call `self.stop()`
5. Call `BaseWorker.close(timeout=...)`, which winds down `self.async_instructions`
6. Unless `close_gracefully` already did so, inform the scheduler by sending `close-stream`, then closing with a timeout
7. Set the status to `Status.closed`
8. Call `ServerNode.close()`
9. Remove the `local_directory` from `sys.path`
10. Unwind the `exit_stack`
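For orientation, a condensed sketch of this flow as a single method body. This is not the real `Worker.close`: the names come from the list above, while the signatures, the `PauseEvent` handling, and the exact attribute names are assumptions.

```python
import sys

from distributed.core import Status
from distributed.node import ServerNode
from distributed.worker_state_machine import BaseWorker

# Illustrative only: a flattened paraphrase of the current ordering.
async def close(self, timeout: float = 30) -> None:
    self.status = Status.closing                 # 1. mark as closing
    # 2. a PauseEvent keeps further tasks from being executed/gathered
    for pc in self.periodic_callbacks.values():  # 3. stop periodic callbacks
        pc.stop()
    self.stop()                                  # 4. stop listeners/handlers
    await BaseWorker.close(self, timeout=timeout)  # 5. winds down
                                                   #    self.async_instructions
    # 6. unless close_gracefully already did, send close-stream to the
    #    scheduler, then close the comm with a timeout
    self.status = Status.closed                  # 7. mark as closed
    await ServerNode.close(self)                 # 8. close the server node
    if self.local_directory in sys.path:         # 9. drop the temp dir
        sys.path.remove(self.local_directory)
    self.exit_stack.close()                      # 10. unwind remaining resources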
There are multiple issues with this: […]
As an alternative, I suggest the following rough order of operations:

1. Set `self.status = Status.closing`
2. […]
In particular, add-ins such as plugins should be torn down while the worker is still functional to allow them to take arbitrary actions such as informing the scheduler or initiating communications with other workers.
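A minimal sketch of that idea, assuming the standard `WorkerPlugin.teardown(worker)` hook; the function name and error handling are illustrative:

```python
import inspect
import logging

logger = logging.getLogger(__name__)

# Tear down plugins while the worker can still talk to the scheduler
# and to other workers, i.e. before any comms are closed.
async def teardown_plugins_early(worker) -> None:
    for name, plugin in list(worker.plugins.items()):
        if not hasattr(plugin, "teardown"):
            continue
        try:
            result = plugin.teardown(worker=worker)
            if inspect.isawaitable(result):  # teardown may be async
                await result
        except Exception:
            # Plugin failures should not abort the rest of the shutdown.
            logger.exception("Plugin %r failed to tear down", name)
```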
Note: Timeouts are also a problem but should be tackled in a dedicated PR (see #7318, #7320).