[FEA] Should we synchronize and then spill with the ASYNC allocator #6769
Labels: feature request, performance, reliability
As mentioned in #6768, I am noticing that synchronizing on OOM can help us handle allocation failures that would otherwise be fatal. Additionally, some quick local prototyping suggests there may be a performance gain here.
Specifically, if we first call `Cuda.deviceSynchronize` rather than spilling right away, and fall back to spilling only once we know we have already synchronized, we save time on a quick query I tried in our performance cluster. I ran a query that spills constantly, and it took 265 seconds with this change vs 304 seconds without it.

That said, the query also ran OOM on a second trial. I think the reason is that we are able to pack the GPU much more tightly: I see that the async pool gets closer to its maximum size (40GB in this case). So we have less fudge memory for those tasks that run above their ~1/concurrentGpuTasks chunk of memory.
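To make the proposed ordering concrete, here is a minimal sketch of "synchronize first, spill only after we know we have synchronized". It is not the plugin's actual retry code: `SyncThenSpill`, `Allocator`, `Spiller`, and `allocWithRetry` are hypothetical names for illustration, and the `deviceSync` argument stands in for a call such as `Cuda.deviceSynchronize`.

```java
// Hypothetical sketch: none of these types exist in the plugin or in cudf.
final class SyncThenSpill {
    /** Stand-in for one allocation attempt; returns true on success. */
    interface Allocator { boolean tryAlloc(long bytes); }

    /** Stand-in for the spill framework; returns the number of bytes freed. */
    interface Spiller { long spill(long bytesNeeded); }

    /**
     * Tries to allocate, synchronizing once before ever spilling.
     * deviceSync is assumed to wrap something like Cuda.deviceSynchronize.
     * Assumes the spiller eventually reports 0 bytes when nothing is left.
     */
    static boolean allocWithRetry(long bytes, Allocator alloc,
                                  Runnable deviceSync, Spiller spiller) {
        boolean synced = false;
        while (true) {
            if (alloc.tryAlloc(bytes)) {
                return true;
            }
            if (!synced) {
                // First failure: wait for outstanding work so that frees
                // queued on the async pool can actually be reclaimed.
                deviceSync.run();
                synced = true;
                continue;
            }
            // We already synchronized and still failed: fall back to spill.
            if (spiller.spill(bytes) <= 0) {
                return false; // nothing left to spill, report the OOM
            }
        }
    }
}
```

In this sketch the synchronize happens at most once per allocation request, so a request that keeps failing pays for one sync and then behaves like the existing spill-and-retry path.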