Fix compilation warnings, `decide_worker` now a C func, stealing improvements #4375

jakirkham · 2020-12-18T02:17:34Z

- Use `cython.compiled` to skip the pure Python fallback functions in the Cython case (fixes unused symbol warnings from the C compiler)
- Move the `decide_worker` function into C (removes the overhead of a Python API, which is unneeded)
- Some optimization of the stealing math and other misc. stealing improvements

Note: There are still some warnings that predate this PR and are not handled currently. See cython/cython#3474 for details.

jakirkham · 2020-12-18T02:25:05Z

@quasiben, this should fix some of the compile warnings we were seeing earlier 🙂

Use Cython's `compiled` flag to check whether the code is being compiled. If `cython` can't even be `import`ed, set the flag to `False`. Then put all the Cython bits in the `True` branch and all the pure Python fallbacks in the `False` branch. This should let Cython optimize the fallbacks out during compilation (as opposed to leaving a bunch of unused fallback functions around), which in turn should get rid of the unused symbol warnings previously seen from the C compiler.

Unlike `ccall`, which generates both a Python and a C API for a function, `cfunc` generates only a C API. This is handy for internal functions that are only used within `scheduler.py`.

Tells Cython to generate only a C API for `decide_worker`. This should speed up calls to this function, since a Python API is not needed and would otherwise add unnecessary overhead.
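Here is a minimal sketch of the pattern above in Cython's pure Python mode. `pick_least_loaded` is a hypothetical internal helper standing in for something like `decide_worker`. The no-op fallback decorator is an assumption about how the pure Python path could look, not the exact code in `scheduler.py`. When Cython compiles the module, the `compiled` branch should be taken and the fallback optimized out.

```python
try:
    # `compiled` is True only inside a module compiled by Cython;
    # in plain (pure Python mode) execution it is False.
    from cython import compiled
except ImportError:
    # Cython is not installed at all, so we are certainly not compiled.
    compiled = False

if compiled:
    # Compiled case: Cython's real decorator, which generates only a
    # C-level entry point (no Python wrapper).
    from cython import cfunc
else:
    # Pure Python fallback (assumed): a no-op decorator so the same
    # source still runs without Cython.
    def cfunc(func):
        return func


@cfunc
def pick_least_loaded(loads):
    """Hypothetical internal helper: index of the smallest load."""
    best = 0
    for i in range(1, len(loads)):
        if loads[i] < loads[best]:
            best = i
    return best
```

Once compiled, a `cfunc` has no Python-level entry point. It therefore only suits helpers that are called from within the same module.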

jakirkham · 2020-12-18T03:52:36Z

CI failures seem to be the same as those in issue #4374.

In Python 3.3+, `math` includes `log2`, so just use it directly instead of dividing by `log(2)`. This is about twice as fast.

By default, `round` already rounds to `0` digits after the decimal point, and leaving the default is about twice as fast as passing `0` explicitly.

It's about 5x faster to just compare `level` to `1` and assign only if needed than to call `max`, so just do that.

jakirkham · 2020-12-18T06:36:31Z

distributed/stealing.py

```diff
@@ -134,8 +133,10 @@ def steal_time_ratio(self, ts):
         if cost_multiplier > 100:
             return None, None
 
-        level = int(round(log(cost_multiplier) / log_2 + 6, 0))
-        level = max(1, level)
+        level = int(round(log2(cost_multiplier) + 6))
```


`log2` was added in Python 3.3 and is faster than hand-rolling it by about 2x:

```
In [3]: %timeit log(3) / log(2)
401 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit log(3, 2)
234 ns ± 2.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
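The timings above compare `log(3) / log(2)` with the two-argument form `log(3, 2)`. Below is a small standalone sketch that first checks all three spellings agree, then also times `math.log2` directly. Actual numbers will vary with the machine and Python version.

```python
import math
import timeit

x = 3.0

# Three ways to compute the base-2 logarithm of x.
hand_rolled = math.log(x) / math.log(2)  # hand-rolled, as in In [3]
two_arg = math.log(x, 2)                 # two-argument form, as in In [4]
direct = math.log2(x)                    # dedicated function (Python 3.3+)

# All three agree up to floating-point rounding.
assert math.isclose(hand_rolled, direct)
assert math.isclose(two_arg, direct)

if __name__ == "__main__":
    setup = "from math import log, log2; x = 3.0"
    number = 100_000
    for stmt in ("log(x) / log(2)", "log(x, 2)", "log2(x)"):
        # Best of 5 repeats, reported as time per single evaluation.
        best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=5))
        print(f"{stmt:<16} {best / number * 1e9:.1f} ns")
```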

`round` already rounds to 0 digits after the decimal point by default. Using the default is about 2x faster than specifying `ndigits`:

```
In [7]: %timeit round(1.5, 0)
393 ns ± 2.58 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit round(1.5)
195 ns ± 1.77 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
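One detail is worth spelling out: `round(x)` returns an `int`, while `round(x, 0)` returns a `float`. Both use round-half-to-even, so once the result goes through `int()` (as in `level = int(round(...))`), the value is the same. Here is a quick check over a few hypothetical inputs:

```python
values = [-2.5, -1.5, -0.4, 0.5, 1.5, 2.5, 3.7]

for v in values:
    a = round(v)      # no ndigits: returns an int
    b = round(v, 0)   # ndigits=0: returns a float
    assert isinstance(a, int) and isinstance(b, float)
    # Same rounding rule (round half to even), so the integer value matches.
    assert int(a) == int(b)

print([round(v) for v in values])  # → [-2, -2, 0, 0, 2, 2, 4]
```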

jakirkham · 2020-12-18T06:38:44Z

distributed/stealing.py

```diff
+        if level < 1:
+            level = 1
```


Using `if` is roughly an order of magnitude faster than `max`:

```
In [1]: %timeit max(1, 3)
222 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]: %timeit max(1, 0)
222 ns ± 1.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %%timeit
   ...: L = 3
   ...: if L < 1:
   ...:     L = 1
   ...:
25.5 ns ± 0.165 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %%timeit
   ...: L = 0
   ...: if L < 1:
   ...:     L = 1
   ...:
29.7 ns ± 0.231 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
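The following sketch checks that the rewrite in this PR does not change behavior. It evaluates the removed expression and the new one over cost multipliers from 0.01 to 100; the upper bound mirrors the `cost_multiplier > 100` early return. The function names and the sampling grid are hypothetical.

```python
from math import log, log2

log_2 = log(2)


def level_old(cost_multiplier):
    # Expression removed by the PR.
    level = int(round(log(cost_multiplier) / log_2 + 6, 0))
    level = max(1, level)
    return level


def level_new(cost_multiplier):
    # Expression added by the PR.
    level = int(round(log2(cost_multiplier) + 6))
    if level < 1:
        level = 1
    return level


# Sample cost multipliers from 0.01 up to 100 in steps of 0.01.
samples = [i / 100 for i in range(1, 10_001)]
mismatches = [c for c in samples if level_old(c) != level_new(c)]
print(len(mismatches))
```

An empty `mismatches` list means the two versions agree on every sampled value.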

No objection here, but I do want us to look at the full picture rather than just micro-optimizing. 200ns may not matter in the grand scheme. At some point it might make more sense to keep code around that is nanoseconds slower because it is more readable.

I'm not making a judgement in this case (I haven't looked at this much). I just want to avoid us going overboard here.

Mainly came here since `put_key_in_stealable` takes as much runtime as `transition_waiting_processing` and `transition_processing_memory`. We have already put a lot of time into optimizing the latter two, but this part of the code has largely been untouched, so I am trying to tackle some of the low-hanging fruit here.

Should add that, collectively, these changes more than halve the time spent in `put_key_in_stealable`.

Glad to hear it. My comment was just general. I have no particular concern with this PR.

Understood. Just wanted to provide context :)

Besides, we don't use any of this until the third case, so just move it later.

jakirkham · 2020-12-18T06:42:53Z

distributed/stealing.py

```diff
-        nbytes = ts.get_nbytes_deps()
-
-        transfer_time = nbytes / self.scheduler.bandwidth + LATENCY
```


We don't even use these until the third `if`, so just move them later. We may already have returned before they are needed.
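Here is a simplified, hypothetical sketch of the reordering. Only the `cost_multiplier > 100` cutoff and the `level` formula come from the diffs in this PR. The first two guards, their threshold, the `LATENCY` value, and the callable-based signature are illustrative assumptions made so the example is self-contained.

```python
from math import log2

# Hypothetical values, chosen only to make the sketch self-contained.
LATENCY = 0.001           # assumed per-transfer overhead, in seconds
MIN_COMPUTE_TIME = 0.005  # assumed cutoff below which stealing is skipped


def steal_time_ratio(is_fast_task, compute_time, get_nbytes_deps, bandwidth):
    """Return (cost_multiplier, level), or (None, None) if not worth stealing.

    ``get_nbytes_deps`` is a zero-argument callable so the sketch can show
    that the dependency size is only computed once the cheap checks pass.
    """
    # 1st and 2nd checks: cheap early exits that need no dependency info.
    if is_fast_task:
        return None, None
    if compute_time < MIN_COMPUTE_TIME:
        return None, None

    # Deferred work: only reached when the task might be stolen.
    nbytes = get_nbytes_deps()
    transfer_time = nbytes / bandwidth + LATENCY
    cost_multiplier = transfer_time / compute_time

    # 3rd check: the first one that actually needs transfer_time.
    if cost_multiplier > 100:
        return None, None

    level = int(round(log2(cost_multiplier) + 6))
    if level < 1:
        level = 1
    return cost_multiplier, level
```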

This function gets called a lot, and logging from it makes it run rather slowly. So just drop the log call to remove this bottleneck.
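Here is a rough way to gauge what a log call costs in a hot path. The two `put_key_*` functions and the bounded `deque` log are hypothetical stand-ins. The real `self.log` in `stealing.py` may do more or less work per call, so treat any numbers as indicative only.

```python
import timeit
from collections import deque

# Hypothetical stand-ins for the scheduler's state and event log.
stealable = {}
event_log = deque(maxlen=100_000)


def put_key_with_log(key, worker, level):
    stealable[key] = (worker, level)
    # Extra per-call work: build a tuple and append it to the log.
    event_log.append(("add-stealable", key, worker, level))


def put_key_without_log(key, worker, level):
    stealable[key] = (worker, level)


if __name__ == "__main__":
    n = 200_000
    for fn in (put_key_with_log, put_key_without_log):
        t = timeit.timeit(lambda: fn("x", "tcp://127.0.0.1:8786", 3), number=n)
        print(f"{fn.__name__}: {t / n * 1e9:.0f} ns per call")
```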

jakirkham · 2020-12-18T22:33:11Z

Understand if this doesn't get looked at for a while. Just making sure you are aware 😉

jakirkham · 2020-12-19T00:44:41Z

Restarted CI now that issue #4374 is fixed.

jakirkham · 2021-01-04T17:12:09Z

Planning to merge tomorrow if no comments.

quasiben · 2021-01-04T20:50:57Z

Thanks @jakirkham -- looks good

jakirkham · 2021-03-19T07:23:50Z

distributed/stealing.py

```diff
@@ -85,7 +84,6 @@ def put_key_in_stealable(self, ts):
         ws = ts.processing_on
         worker = ws.address
         cost_multiplier, level = self.steal_time_ratio(ts)
-        self.log(("add-stealable", ts.key, worker, level))
```


Doing the same thing for `remove_key_from_stealable` (#4609)

jakirkham added 3 commits December 17, 2020 18:46

Add cfunc for decorating C API only functions

cbcb7c5

Decorate decide_func with @cfunc

62d865a

jakirkham force-pushed the misc_opt5 branch from 03d7cb5 to 62d865a December 18, 2020 02:46

jakirkham added 2 commits December 17, 2020 22:16

Use log2 instead of hand rolling ourselves

6e1f2df

Drop ndigits argument from round

4595948

jakirkham changed the title ~~Address unused symbol warnings & make decide_worker a C func~~ Address unused symbol warnings, make decide_worker a C func, stealing math improvements Dec 18, 2020

Just compare level to 1

9b3627c

jakirkham force-pushed the misc_opt5 branch from 1dda2ff to 9b3627c December 18, 2020 06:33

jakirkham commented Dec 18, 2020

Only compute stuff after the trivial cases

8a5ab79

jakirkham commented Dec 18, 2020

Add a little whitespace

bcb606b

jakirkham changed the title ~~Address unused symbol warnings, make decide_worker a C func, stealing math improvements~~ Fix compilation warnings, decide_worker now a C func, stealing math improvements Dec 18, 2020

Drop log message in put_key_in_stealable

26ee18c

jakirkham changed the title ~~Fix compilation warnings, decide_worker now a C func, stealing math improvements~~ Fix compilation warnings, decide_worker now a C func, stealing improvements Dec 18, 2020

This was referenced Dec 18, 2020

Disabling logging in stealing #4359 (Closed)

Profile shuffle w/pending changes quasiben/dask-scheduler-performance#62 (Open)

jakirkham requested a review from mrocklin December 18, 2020 22:32

jakirkham requested a review from jrbourbeau December 22, 2020 00:04

jakirkham merged commit 76ef459 into dask:master Jan 5, 2021

jakirkham deleted the misc_opt5 branch January 5, 2021 18:08

jakirkham mentioned this pull request Mar 19, 2021

Drop log from remove_key_from_stealable #4609 (Merged)

jakirkham commented Mar 19, 2021
