-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor SchedulerState
from Scheduler
#4365
Conversation
d7314c8
to
62b85d1
Compare
6b7882d
to
fc807ca
Compare
Should add that after doing this, I can see a path to typing all attributes of the |
a71b4d3
to
4e5b41c
Compare
FYI, I'm slammed today and tomorrow. I'm not going to be very responsive
this week.
…On Tue, Dec 15, 2020 at 7:35 AM jakirkham ***@***.***> wrote:
Requires PR ( #4343 <#4343> )
This refactors out the SchedulerState as @cclass including info about
endpoints and tasks. Also includes the transition_* functions. This
should make it easier to more thoroughly Cythonize these components.
------------------------------
You can view, comment on, or merge this pull request online at:
#4365
Commit Summary
- Move `worker_send` into transition functions
- Refactor `_task_to_msg` from `send_task_to_worker`
- Move `report` out of `_add_to_memory`
- Refactor out `_client_releases_keys`
- Collect client recs in `_add_to_memory`
- Use `_client_releases_keys` in transitions
- Refactor out `_task_to_report_msg`
- Collect and send worker messages from transitions
- Handle `report` in `transition`
- Add method to send a message to a specific client
- Add `_task_to_client_msgs`
- Replace `report_msg` with `client_msgs`
- Create empty `SchedulerState` class
- Move `transition*` methods into `SchedulerState`
- Add attributes for `SchedulerState`
- Initialize attributes in `SchedulerState`
- Pass arguments to `super` class
- Use `SchedulerState` attributes
- Use `cast` to access parent class attributes
- Drop no longer needed `cast`s & local assignments
- Use `dict` views onto `SortedDict` where possible
- Add ***@***.***`s for `SchedulerState` attributes
File Changes
- *M* distributed/core.py
<https://github.com/dask/distributed/pull/4365/files#diff-581957c552b88dd04319efca1429c0c0827daff509a115542e70b661c4c04914>
(3)
- *M* distributed/scheduler.py
<https://github.com/dask/distributed/pull/4365/files#diff-bbcf2e505bf2f9dd0dc25de4582115ee4ed4a6e80997affc7b22122912cc6591>
(7323)
Patch Links:
- https://github.com/dask/distributed/pull/4365.patch
- https://github.com/dask/distributed/pull/4365.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#4365>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTA5GJL3PJPP5U43Q6DSU564HANCNFSM4U4QQSPA>
.
|
c395eb1
to
8d5fa02
Compare
No worries. Just wanted to make sure you are aware of this. Mostly just interested in your general high-level thoughts as opposed to a detailed review at this stage. |
Some of the transition and supporting methods take |
Hmm...seems those distributed.core - ERROR - transition_processing_memory() got an unexpected keyword argument 'status'
Traceback (most recent call last):
File "/Users/jkirkham/Developer/distributed/distributed/core.py", line 592, in handle_stream
handler(**merge(extra, msg))
File "distributed/scheduler.py", line 4707, in distributed.scheduler.Scheduler.handle_task_finished
File "distributed/scheduler.py", line 4115, in distributed.scheduler.Scheduler.stimulus_task_finished
File "distributed/scheduler.py", line 6079, in distributed.scheduler.Scheduler.transition
File "distributed/scheduler.py", line 6015, in distributed.scheduler.Scheduler.transition
File "distributed/scheduler.py", line 2022, in distributed.scheduler.SchedulerState.transition_processing_memory
TypeError: transition_processing_memory() got an unexpected keyword argument 'status' |
After some toying around, I think these are coming from here for finished tasks: distributed/distributed/worker.py Lines 1920 to 1929 in c2d8773
Also from errored tasks this appears to come from here: distributed/distributed/worker.py Lines 1931 to 1938 in c2d8773
Perhaps the |
ec70047
to
c28de04
Compare
Here's the Scheduler call graph with these changes so far. Some functions (like Edit: We are spending less time in |
a109dbb
to
fc628fe
Compare
Yeah messages including unused arguments and extra arguments being passed to |
There's other odd things like |
e08031c
to
09fc584
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the refactoring; encapsulate the state of the scheduler in a different class makes a lot of sense. Actually, I think we should go even further and encapsulate all states of the scheduler and tasks with clearly defined invariants but that is future work :)
I have read through all of the changes and as far as I can see, it looks good but because of the amount of changes I am not confident that it is bug free :)
As a side note I agree, we should get rid of all the unused **kwargs
in a future PR. It makes a harder to follow variables through the code.
@@ -230,6 +231,8 @@ def set_thread_ident(): | |||
|
|||
self.__stopped = False | |||
|
|||
super().__init__(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused, why do we need kwargs
? And call to super?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair. It's a little confusing.
The call to super
is needed to call the parent constructors of Scheduler
. These are ServerNode
and SchedulerState
. We also want to affect when these get called relative to other things in this constructor (as this constructor needs some of the attributes of the parent classes to be setup in later steps).
As Scheduler
inherits from two base classes, we need to make sure that arguments for those constructors get passed through and picked up by the right parent class. There are some more details in this SO answer that may help clarify this further.
distributed/scheduler.py
Outdated
except Exception as e: | ||
logger.exception(e) | ||
if LOG_PDB: | ||
import pdb | ||
|
||
################## | ||
# Administration # | ||
################## | ||
pdb.set_trace() | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we implement all the exception logging using a context manager as described in https://stackoverflow.com/a/28158006
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that something like that makes sense. Alternatively we might want to push this into transition
where all of these methods are called. Held off on that for now as I didn't want to disrupt the code too much, but agree this is something we should follow up on.
Thanks for looking through this Mads! 😄
Thanks Mads! 😄 Yeah for sure. I think this opens a lot of doors for us in terms of future optimizations. Just tried to find the right balance where we do enough to get some benefit without pushing everything into one PR.
Completely understand that. Have really been relying on the CI and tests to catch any issues. Also have been doing some profiling locally, which has helped as well. Plus regex has been my friend (most changes represent a common pattern).
Agreed. I tried to get rid of them in this PR, but it wound up being a little too complicated and touching too much other code. Also as we are still calling the |
Yeah, I have the same question. What do we include in SchedulerState and what do we include in Scheduler? I'm totally fine doing this in the future, but some methods, like |
Yeah that makes sense. Generally have been thinking anything that relates to the management of the task graph or transformations of it seem like good candidates for |
Refactored out a few more minor utility functions used by the |
FWIW here's the profile using Ben's shuffle benchmark with these changes. |
Just wondering, is it necessary that |
After having done the work it's actually not that clear to me that it would be that doable. |
for worker, msg in worker_msgs.items(): | ||
self.worker_send(worker, msg) | ||
for client, msg in client_msgs.items(): | ||
self.client_send(client, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not help any, but any thoughts on pulling these out of the transition
function into transitions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting suggestion and agree it's worth trying. That said, I feel like this PR has gotten quite long and am worried that has slowed down the reviewing of it. Instead would propose we focus on getting this in. That would make it easier to make smaller more focused PRs on moving communication further out and consolidating messages further as well as allowing us to more effectively Cythonize any remaining functions we still see popping up in the profiles. Thoughts? 🙂
(Should add if we want to do this after the patch release tomorrow that also makes sense to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, we're good to go here. I expect that over time we'll move more things over, but I agree with @jakirkham that what is here now is a good firm base. |
Thanks Matt! 😄 |
FWIW moving I think Looked at |
Requires PR ( #4343 )
This refactors out the
SchedulerState
as@cclass
including info about endpoints and tasks. Also includes thetransition_*
functions. This should make it easier to more thoroughly Cythonize these components.