[FEA] Metric for maximum GPU memory per task #6745
Comments
Thought about this issue a bit more. What I think we want is a version of the tracking_resource_adaptor, but rather than keeping a single map for all threads, it would track the maximum outstanding GPU footprint per thread. The main motivation here is to figure out whether our estimate of memory usage for some GPU code is higher than anticipated, to help us debug waste or inform heuristics that control which tasks we allow on the GPU.
If one of our allocations fails and we handle it via a spill, it shouldn't matter: the spill code should be careful to disable the tracking for those spills. I hope/believe this could be a pretty low-overhead system (a rough sketch follows below). Note that I don't think this helps with tracking when an expensive kernel is loaded; as far as I understand that can be a one-time penalty when we open the shared library, and I know we have seen this with some of the regular expression kernels in the past. Pinging @jlowe on this overall for comments.
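A minimal sketch of that per-thread idea, written here in Scala at the plugin level rather than as an actual RMM resource adaptor; every name below (PerThreadFootprint, withTrackingDisabled, the allocation hooks) is hypothetical:

```scala
// Hypothetical sketch of per-thread footprint tracking: each thread keeps the
// bytes it currently has outstanding and the peak it has ever reached, and the
// spill path disables tracking so spill traffic does not count against the task.
object PerThreadFootprint {
  private class State {
    var outstanding: Long = 0L
    var maxOutstanding: Long = 0L
    var trackingEnabled: Boolean = true
  }

  private val state = new ThreadLocal[State] {
    override def initialValue(): State = new State
  }

  // Called from the allocation hook on the current thread.
  def onAllocated(bytes: Long): Unit = {
    val s = state.get()
    if (s.trackingEnabled) {
      s.outstanding += bytes
      s.maxOutstanding = math.max(s.maxOutstanding, s.outstanding)
    }
  }

  // Called from the free hook on the current thread.
  def onFreed(bytes: Long): Unit = {
    val s = state.get()
    if (s.trackingEnabled) {
      s.outstanding -= bytes
    }
  }

  // Run `body` (e.g. spill code) without counting its allocations and frees.
  def withTrackingDisabled[T](body: => T): T = {
    val s = state.get()
    val previous = s.trackingEnabled
    s.trackingEnabled = false
    try body finally s.trackingEnabled = previous
  }

  // Maximum outstanding GPU bytes observed on the current thread so far.
  def maxOutstandingBytes: Long = state.get().maxOutstanding
}
```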
I think one approach here is to have a stack of simple memory tracking info in RmmJni (a rough sketch follows below). We also need to keep a set of the addresses we allocated in this thread, unfortunately: given spill, the current thread may need to spill to satisfy an allocation, and it seems we could ignore frees of buffers we didn't allocate while tracking. The hope is that these tracking structures stay small and cheap to maintain.
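To make that concrete, here is a rough Scala sketch of the stack plus the owned-address set; the real thing would live in RmmJni/C++, and all of the names below are made up:

```scala
import scala.collection.mutable

// Hypothetical sketch of the stack-of-tracking-info idea: each tracked scope on
// a thread pushes a frame, allocations update every open frame plus a set of
// addresses this thread owns, and frees of addresses we never allocated are ignored.
object ThreadScopedTracker {
  private class Frame {
    var outstanding: Long = 0L
    var maxOutstanding: Long = 0L
  }

  private val frames = new ThreadLocal[mutable.ArrayBuffer[Frame]] {
    override def initialValue(): mutable.ArrayBuffer[Frame] = mutable.ArrayBuffer.empty
  }

  // address -> size for buffers this thread allocated while tracking.
  private val owned = new ThreadLocal[mutable.HashMap[Long, Long]] {
    override def initialValue(): mutable.HashMap[Long, Long] = mutable.HashMap.empty
  }

  def pushScope(): Unit = frames.get() += new Frame

  // Pop the innermost scope and return its peak outstanding bytes.
  def popScope(): Long = {
    val fs = frames.get()
    fs.remove(fs.length - 1).maxOutstanding
  }

  def onAllocated(address: Long, bytes: Long): Unit = {
    owned.get().put(address, bytes)
    frames.get().foreach { f =>
      f.outstanding += bytes
      f.maxOutstanding = math.max(f.maxOutstanding, f.outstanding)
    }
  }

  def onFreed(address: Long): Unit = {
    // A free of a buffer some other thread allocated (e.g. a spill performed on
    // our behalf) is not in `owned` and is ignored.
    owned.get().remove(address).foreach { bytes =>
      frames.get().foreach(_.outstanding -= bytes)
    }
  }
}
```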
Nsys has added memory tracking capabilities recently, and we believe we can use the correlationId plus NVTX ranges to accomplish this as a post-processing step over a given NVTX range. We should investigate whether this solution does what we need.
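If the nsys route works out, the plugin side mostly needs to emit an NVTX range per task so the post-processing step can attribute memory events to that task. A rough sketch, assuming the ai.rapids.cudf.NvtxRange wrapper; the range naming convention here is made up:

```scala
import ai.rapids.cudf.{NvtxColor, NvtxRange}

object TaskNvtx {
  // Wrap the GPU work for a task in an NVTX range so a post-processing pass over
  // the nsys trace can attribute memory events inside the range to that task.
  def withTaskNvtxRange[T](taskAttemptId: Long)(body: => T): T = {
    val range = new NvtxRange(s"task $taskAttemptId", NvtxColor.GREEN)
    try {
      body
    } finally {
      range.close()
    }
  }
}
```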
Hi @abellina, I am trying to profile the GPU memory usage during a query run. I profiled with nsys, but didn't find the memory metrics I was looking for.
I haven't used this feature; the main question I'd have is whether it works with a pool, especially the async pools. It most definitely does not work with ARENA, because that is all CPU-managed, but with cudaAsync I'd hope it shows up.
The profile result above is from a run with ASYNC pool. |
The maximum amount of GPU memory each task uses is a very helpful metric for knowing whether an application is getting close to needing to spill.
Tracking the memory currently on the GPU, spilled to host memory, etc. is also really interesting.
The problem is how to gather this metric in an efficient way. The Retry framework could keep track of the amount of memory that is allocated on a given thread, and the amount that is also deallocated/freed by that thread, but it would not take into account the memory that is later freed by other threads (as in the case of spill, or UCX shuffle). Instead we would almost want to associate each allocation with a given thread, but that can be very memory intensive on the host, especially because we are likely to see thousands of buffers active (a rough sketch of that approach follows below).
We should experiment to see how expensive this is in practice and, if it is not too bad, implement it.
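For the "associate each allocation with a given thread/task" alternative, here is a rough Scala sketch of the shape it could take (all names are hypothetical). The per-buffer map is exactly where the host-memory concern comes from, since it holds one entry for every live buffer:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: every live allocation is associated with the task that
// made it, so a free performed by any thread (spill, UCX shuffle, ...) is
// charged back to the owning task.
object PerTaskGpuUsage {
  private final class Usage {
    var outstanding: Long = 0L
    var peak: Long = 0L
  }

  // address -> (taskId, bytes) for every buffer currently allocated. This can
  // get expensive on the host when thousands of buffers are live at once.
  private val bufferOwner = new ConcurrentHashMap[Long, (Long, Long)]()
  private val usageByTask = new ConcurrentHashMap[Long, Usage]()

  def onAllocated(taskId: Long, address: Long, bytes: Long): Unit = {
    bufferOwner.put(address, (taskId, bytes))
    usageByTask.putIfAbsent(taskId, new Usage)
    val usage = usageByTask.get(taskId)
    usage.synchronized {
      usage.outstanding += bytes
      usage.peak = math.max(usage.peak, usage.outstanding)
    }
  }

  def onFreed(address: Long): Unit = {
    Option(bufferOwner.remove(address)).foreach { case (taskId, bytes) =>
      Option(usageByTask.get(taskId)).foreach { usage =>
        usage.synchronized { usage.outstanding -= bytes }
      }
    }
  }

  // Peak outstanding GPU bytes observed for a task, if it was tracked.
  def peakBytes(taskId: Long): Option[Long] =
    Option(usageByTask.get(taskId)).map(u => u.synchronized(u.peak))
}
```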