Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transition_processing_memory optimizations, etc. #4487

Merged
merged 50 commits into from
Feb 11, 2021

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Feb 6, 2021

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed

Goes through transition_processing_memory and optimizes it more carefully. In particular simplifies the code around starts tops, durations, and occupancies. Also types variables to get more performant Cython code. Makes use of intermediate variables to group work and allow easier reuse in later expressions. Tries to simplify data generated and collected to cutdown on general overhead associated. Additionally really cuts down on the work needed to produce messages for clients and workers.

Also includes various small optimizations like using .get(...), typing variables, dropping unneeded intermediate variables, constructing values once, using typed attributes, etc.

@jakirkham jakirkham force-pushed the misc_opts branch 2 times, most recently from 3a7151d to 7df87de Compare February 6, 2021 05:52
@jakirkham jakirkham changed the title [WIP] Miscellaneous optimizations [WIP] transition_processing_memory optimizations Feb 6, 2021
@jakirkham jakirkham force-pushed the misc_opts branch 2 times, most recently from fd0991e to 087e0f0 Compare February 6, 2021 06:27
@jakirkham jakirkham changed the title [WIP] transition_processing_memory optimizations transition_processing_memory optimizations Feb 6, 2021
@jakirkham jakirkham marked this pull request as ready for review February 6, 2021 06:42
@jakirkham
Copy link
Member Author

Planning on merging tomorrow if no comments

@jakirkham jakirkham changed the title transition_processing_memory optimizations transition_processing_memory optimizations, etc. Feb 9, 2021
@jakirkham jakirkham force-pushed the misc_opts branch 2 times, most recently from bf67cd1 to 63d109e Compare February 9, 2021 03:22
distributed/scheduler.py Outdated Show resolved Hide resolved
@jakirkham jakirkham force-pushed the misc_opts branch 2 times, most recently from b7a2894 to bc3f76a Compare February 9, 2021 03:44
@jakirkham
Copy link
Member Author

jakirkham commented Feb 9, 2021

Seeing an unrelated test failure. Am able to reproduce without these changes. Filed as issue ( #4493 )

Edit: This should be resolved now that PR ( dask/dask#7194 ) is in

@jakirkham jakirkham force-pushed the misc_opts branch 3 times, most recently from c70475b to df6d8a4 Compare February 10, 2021 01:37
@jakirkham
Copy link
Member Author

jakirkham commented Feb 10, 2021

Looks like CI has a different issue unrelated to this PR. Fixing with PR ( #4499 )

Edit: Restarting CI now that that fix is in

Edit 2: Seems like GH Actions doesn't do a merge commit when testing. So was still seeing the old failure. Went ahead and rebased. Should clear out that issue

Avoids checking for the value and then retrieving it by simply trying to
get the associated value. If it is not found, it is `None`, which is
fast to check after and handle.
Allows Cython to optimize operations on this object.
This should allow faster access (particularly in Cython).
This is a holdover from before the `SchedulerState` refactor. Now this
is properly typed and can be used efficiently in Cython anyways.
Avoids duplicate retrieval. Also handles the coercion to `double` once.
Avoid collecting an intermediate list of client keys and instead use
them to produce messages directly.
This makes it easier to type this value and thus for Cython to optimize
this value throughout.
Should knock out type checking when the function is called, which should
simplify the work needed in later steps.
This allows us to only create the `set` `s` when we know we need it.
Otherwise we just get the `set` previously contained in the `dict`.
Instead of creating an empty `set` when `s` is not defined, just check
that `s` is non-trivial (either `None` or empty). This should be a bit
faster and avoid the unnecessary creation step.
As `defaultdict`s are difficult for Cython to optimize and usually a
`dict` will suffice, change `_unknown_duractions` to a `dict` to allow
Cython to use Python C API specific to `dict`s.
This makes it easier to annotate these more accurately and benefit from
Cython's optimizations around these `dict`-typed variables. Requires a
very small amount of checking for keys and setting their values if not
present. Though this remains efficient in both Python & Cython as well
as compact.
These are annotated local variables, which are holdovers from before the
`SchedulerState` refactor. We can now drop these and use
`SchedulerState` to access these directly.
@jakirkham jakirkham force-pushed the misc_opts branch 3 times, most recently from 55e4b75 to eb3e6c3 Compare February 10, 2021 06:32
Cython will turn this into a very efficient `switch...case` statement.
So this cuts down on the overhead of comparisons and checks (paying them
once when computing the length). Then focuses on just checking this C
typed integral value for which `case` to run.
@jakirkham
Copy link
Member Author

Planning on merging tomorrow if no comments

@jakirkham jakirkham merged commit de7cf0a into dask:master Feb 11, 2021
@jakirkham jakirkham deleted the misc_opts branch February 11, 2021 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants