Expand spill logging #860
Comments
Also pinging @ayushdg @jnke2016 @randerzander who may have other feature requests in mind.
FYI: I'm looking into the related problem of visualizing GPU spilling.
You mean you want to visualize it but there's no way to do that, or there's a problem with the current visualizer (assuming there's one, TBH I don't know if there is)?
Keeping the conversation going. Hey @shwina, I talked with @pentschev about this issue. If I can help you with a similar issue, I'd love to.
This issue has been labeled |
The issue is still in progress. I will begin working actively on it next week.
Will start working on this issue next week. I was busy with getting the Dask Operator ready for release.
We depend on JIT unspilling in most workflows now. In trying to determine the right amount of GPU memory for a given workload, we'd like to know how often we spill and how much time is spent spilling. There's currently no good way to gather this information without manually looking at workflow profiles. Since our profiles cover a great many jobs, that becomes an inordinately time-consuming process.
It would be very useful for dask-cuda to log something like: The above field names probably imply a misunderstanding about how spilling actually works, but I hope it conveys that with such information, we can programmatically find workloads that could be optimized to avoid spilling.
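To make the request above a bit more concrete, here is a minimal sketch of the kind of aggregation we have in mind: count spill events, and total the bytes moved and time spent per direction. Everything in it (`SpillEvent`, `SpillLog`, the `record` context manager, the direction labels) is hypothetical and written only for illustration; none of it is existing dask-cuda API, and real spilling may not map onto these fields this cleanly.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class SpillEvent:
    # Hypothetical record of one spill/unspill operation.
    direction: str   # e.g. "device->host" or "host->device" (labels are assumptions)
    nbytes: int      # size of the object that was moved
    seconds: float   # wall-clock time spent moving it


class SpillLog:
    """Hypothetical in-memory spill log; NOT part of dask-cuda."""

    def __init__(self):
        self.events = []

    @contextmanager
    def record(self, direction, nbytes):
        # Time the wrapped block and store one event, even if the block raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.events.append(SpillEvent(direction, nbytes, elapsed))

    def summary(self):
        # Aggregate count, bytes and time per direction.
        totals = defaultdict(lambda: {"count": 0, "nbytes": 0, "seconds": 0.0})
        for ev in self.events:
            entry = totals[ev.direction]
            entry["count"] += 1
            entry["nbytes"] += ev.nbytes
            entry["seconds"] += ev.seconds
        return dict(totals)


# Stand-in usage: the `pass` bodies mark where the actual data movement would go.
log = SpillLog()
with log.record("device->host", 1024):
    pass
with log.record("device->host", 2048):
    pass
with log.record("host->device", 1024):
    pass

stats = log.summary()
print(stats)
```

With per-worker summaries like this collected for each job, we could flag workloads whose spill count or spill time crosses some threshold instead of reading profiles by hand.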
I have been planning to implement this for JIT unspilling for some time, but now that we are introducing spilling in cuDF, it might be sufficient to include spill logging in cuDF?
Lately there has been growing interest from users in gathering information about data spilled by Dask-CUDA. Initially, #442 added the possibility to log spilling times, which the user can query at will to get information on all spilling operations that happened. However, this is limited to the "default" spilling and is not available for on-demand/JIT-unspill. The log also contains no information other than the total time spent per operation, and there are no examples of how to use it.
I believe it would be useful to have the following added:
PeriodicCallback
cc @Matt711