Add idle time to fine performance metrics #7938
Conversation
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
20 files ±0   20 suites ±0   12h 45m 56s ⏱️ +18m 52s
For more details on these failures, see this check.
Results for commit 092e9d7. ± Comparison against base commit 3de722a.
One nit, take it or leave it. Otherwise looks great. 👍
```python
# Custom metrics can provide any hashable as the label
activity = str(activity)
```
Know this comment was here before, but I think it's not helpful/misleading(?). `str` doesn't need its input to be hashable.
Suggested change:
```diff
-# Custom metrics can provide any hashable as the label
 activity = str(activity)
```
The sort function shortly afterwards will break if you don't wrap that arbitrary hashable into a string. That comment line is explaining that activity is not necessarily a string, and as such it's important. I'm amending the comment not to mention hashable.
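A minimal illustration of the reviewer's point (the labels here are hypothetical, not taken from the PR): sorting a mix of arbitrary hashables fails in Python 3, while coercing them to strings first makes the sort well-defined.

```python
# Custom metric labels may be arbitrary hashables, e.g. strings and tuples
labels = ["thread-cpu", ("execute", "my-task"), "disk-read"]

# Sorting mixed types raises TypeError in Python 3:
try:
    sorted(labels)
except TypeError:
    # '<' not supported between instances of 'tuple' and 'str'
    pass

# Coercing every label to str first makes the sort well-defined:
as_str = sorted(str(label) for label in labels)
```

This is why the `str(activity)` wrapping must stay even though `str` itself accepts any object, hashable or not.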
Add to fine performance metrics the delta between (end-to-end runtime * nthreads) and the time spent by workers on tasks. This delta does not increase when there are no tasks running anywhere on the cluster.
If you are observing multiple spans at once (e.g. all calls to a certain library), overlapping time is not double-counted, and time when none of the selected spans are executing is not counted.
If you are cherry-picking specific spans, this delta may be caused in part by work stolen by other tasks.
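The idle-time delta described above can be sketched as follows. This is a simplified model of the metric, not the actual dask/distributed implementation; the function name and signature are illustrative:

```python
def idle_time(end_to_end_runtime: float, nthreads: int, time_on_tasks: float) -> float:
    """Idle time: total available worker-seconds (wall-clock runtime
    multiplied by the number of worker threads) minus the seconds
    workers actually spent running tasks."""
    return end_to_end_runtime * nthreads - time_on_tasks

# e.g. a 100 s computation on 8 threads where workers spent a combined
# 600 s on tasks leaves 200 s of idle capacity
idle_time(100.0, 8, 600.0)
```

When the cluster is fully busy, `time_on_tasks` grows at the same rate as `end_to_end_runtime * nthreads`, so the delta stays flat, which is the property the PR description calls out.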
Demo
After running the ML preprocessing notebook already featured in dask/community#301:
(I added 3 lines to dask/dask to isolate the I/O time: crusaderky/dask@0f36901)
Summarized insights I obtained from the dashboard:
The workflow currently features a whopping 67% waste in runtime.
Known issues
Note
I noticed that keeping the Fine Performance Metrics dashboard open while the computation is running is very CPU-intensive for the scheduler. However, this seems to be a problem specific to Bokeh rendering; calling `FinePerformanceMetrics.update()`, which is invoked every 500 ms, costs a modest ~2.5 ms.

CC @ntabris @milesgranger