Expand spill logging #860

Open
5 tasks
pentschev opened this issue Feb 16, 2022 · 11 comments

Comments

@pentschev
Member

Lately there has been growing interest from users in gathering information about data spilled by Dask-CUDA. #442 initially added the option to log spilling times, which users can query at will to get information on all spilling operations that happened. However, this is limited to the "default" spilling and is not available for on-demand/JIT-unspill. There is also no information other than the total time spent per operation, nor are there any examples of how to use it.

I believe it would be useful to have the following added:

  • Support for on-demand/JIT-unspill;
  • Information on how much data is being spilled;
  • Examples on using log spilling;
    • Bonus points for example using log spilling with PeriodicCallback;
  • Add tests.
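The kind of record keeping the list asks for (time spent plus the amount of data moved per spill operation, in both directions) could look roughly like the sketch below. It is not the Dask-CUDA API: `SpillLog`, `SpillEvent` and `record` are hypothetical names, and the actual device/host copy is only represented by a placeholder.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class SpillEvent:
    # One spill or unspill operation (hypothetical record layout).
    direction: str  # e.g. "gpu->cpu" (spill) or "cpu->gpu" (unspill)
    key: str        # key of the object being moved
    nbytes: int     # size of the object in bytes
    seconds: float  # wall-clock time spent on the operation


@dataclass
class SpillLog:
    events: List[SpillEvent] = field(default_factory=list)

    @contextmanager
    def record(self, direction: str, key: str, nbytes: int) -> Iterator[None]:
        # Time the wrapped block; the event is appended in `finally`,
        # so an operation that raises is still logged.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.events.append(SpillEvent(direction, key, nbytes, elapsed))

    def total_seconds(self, direction: Optional[str] = None) -> float:
        # Total time, optionally restricted to one direction.
        return sum(e.seconds for e in self.events
                   if direction is None or e.direction == direction)

    def total_bytes(self, direction: Optional[str] = None) -> int:
        # Total bytes moved, optionally restricted to one direction.
        return sum(e.nbytes for e in self.events
                   if direction is None or e.direction == direction)


log = SpillLog()
with log.record("gpu->cpu", "x-1", 1024):
    pass  # placeholder: the device-to-host copy would happen here
with log.record("cpu->gpu", "x-1", 1024):
    pass  # placeholder: the host-to-device copy would happen here
print(log.total_bytes("gpu->cpu"))  # 1024
```

Keeping per-event records rather than a single running total is what allows answering both "how much" and "how long" afterwards, and it would apply equally to default spilling and to JIT-unspill if both went through the same logger.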

cc @Matt711

@pentschev
Member Author

Also pinging @ayushdg @jnke2016 @randerzander who may have other feature requests in mind.

@shwina
Contributor

shwina commented Feb 16, 2022

FYI: I'm looking into the related problem of visualizing GPU spilling.

@pentschev
Member Author

> FYI: I'm looking into the related problem of visualizing GPU spilling.

Do you mean that you want to visualize it but there is currently no way to do that, or that there's a problem with the current visualizer (assuming there is one; TBH I don't know if there is)?

@Matt711

Matt711 commented Feb 22, 2022

Keeping the conversation going. Hey @shwina, I talked with @pentschev about this issue. If I can help you with the related problem of visualizing GPU spilling, I'd love to.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@Matt711

Matt711 commented Mar 24, 2022

The issue is still in progress. I will begin working actively on it next week.

@Matt711

Matt711 commented Apr 15, 2022

I will start working on this issue next week. I was busy getting the Dask Operator ready for release.

@randerzander
Contributor

We depend on JIT unspilling in most workflows now.

In trying to determine the right amount of GPU memory for a given workload, we'd like to know how often we spill, and how much time is spent spilling. There's not a good way to gather this information currently without manually looking at workflow profiles.

Since our profiles cover a great many jobs, that becomes an inordinately time-consuming process. It would be very useful for dask-cuda to log something like:
timestamp, worker_id, memory request size, spilled object size, time elapsed during spill

The above field names probably imply a misunderstanding of how spilling actually works, but I hope they convey that, with such information, we could programmatically find workloads that could be optimized to avoid spilling.
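Assuming a log with one comma-separated line per spill, using the fields listed above, the programmatic search for spill-heavy workloads could start from a per-worker aggregation like the sketch below. The column names, the `summarize`/`heavy_spillers` helpers and the sample lines are all hypothetical and made up for illustration; dask-cuda does not emit this format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names mirroring the fields proposed above.
FIELDS = ["timestamp", "worker_id", "request_bytes", "spilled_bytes", "elapsed_s"]


def summarize(lines):
    """Aggregate spill count, spilled bytes and spill time per worker."""
    summary = defaultdict(lambda: {"spills": 0, "bytes": 0, "seconds": 0.0})
    # fieldnames is given explicitly, so the log is assumed to have no header row.
    for row in csv.DictReader(lines, fieldnames=FIELDS):
        s = summary[row["worker_id"]]
        s["spills"] += 1
        s["bytes"] += int(row["spilled_bytes"])
        s["seconds"] += float(row["elapsed_s"])
    return dict(summary)


def heavy_spillers(summary, max_seconds):
    """Workers whose total spill time exceeds max_seconds, sorted by name."""
    return sorted(w for w, s in summary.items() if s["seconds"] > max_seconds)


# Made-up sample log, for illustration only.
sample = io.StringIO(
    "1660000000.0,worker-0,4096,2048,0.5\n"
    "1660000001.0,worker-0,8192,8192,1.5\n"
    "1660000002.0,worker-1,1024,1024,0.1\n"
)
summary = summarize(sample)
print(summary["worker-0"])  # {'spills': 2, 'bytes': 10240, 'seconds': 2.0}
print(heavy_spillers(summary, max_seconds=1.0))  # ['worker-0']
```

A flat, one-line-per-event format like this keeps the analysis independent of how spilling is implemented internally, which seems to be the point of the request.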

@madsbk
Member

madsbk commented Aug 1, 2022

I have been planning to implement this for JIT unspilling for some time, but now that we are introducing spilling in cuDF, might it be sufficient to include spill logging in cuDF?

@pentschev pentschev changed the title Expand log spilling Expand spill logging Oct 31, 2022
@caryr35 caryr35 added this to dask-cuda Dec 8, 2022
Projects
Status: No status
Development

No branches or pull requests

5 participants