[FEA] Should we synchronize and then spill with the ASYNC allocator #6769

Open
abellina opened this issue Oct 12, 2022 · 0 comments
Labels
feature request New feature or request performance A performance related task/issue reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@abellina
Collaborator

As mentioned in #6768, I am noticing that synchronizing on OOM can help us handle allocation failures that would otherwise be fatal. Additionally, some quick local prototyping suggests there may be a performance gain here.

Specifically, if we first call Cuda.deviceSynchronize rather than spilling right away, and fall back to spilling only when we know we have already synchronized, we save time on a quick query I tried in our performance cluster. I ran a query that spills constantly: it took 265 seconds with this change vs. 304 seconds without it.
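To make the ordering concrete, here is a minimal sketch of the "synchronize first, spill only if already synchronized" retry loop. It is a toy simulation, not the plugin's code: `ToyAsyncPool`, `OutOfMemory`, and `allocate_with_retry` are hypothetical names, and the model assumes that asynchronously freed memory becomes reusable only after a synchronize and that spilling frees memory immediately.

```python
class OutOfMemory(Exception):
    """Raised by the toy pool when a request cannot be satisfied."""


class ToyAsyncPool:
    """Hypothetical stand-in for an async pool (assumption: frees are
    only reclaimed once the device has been synchronized)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0        # bytes held by live buffers
        self.pending_free = 0  # bytes freed asynchronously, not yet reclaimed
        self.spillable = 0     # portion of in_use that could be spilled

    def allocate(self, size, spillable=False):
        if self.in_use + self.pending_free + size > self.capacity:
            raise OutOfMemory(size)
        self.in_use += size
        if spillable:
            self.spillable += size

    def free_async(self, size):
        # The memory is released by the caller but not yet reusable.
        self.in_use -= size
        self.pending_free += size

    def synchronize(self):
        # Stand-in for a device synchronize: pending frees are reclaimed.
        self.pending_free = 0

    def spill(self, target):
        # Spill up to `target` bytes; returns how many bytes were freed.
        spilled = min(target, self.spillable)
        self.spillable -= spilled
        self.in_use -= spilled
        return spilled


def allocate_with_retry(pool, size):
    """Try to allocate; on OOM synchronize once, then fall back to spilling.

    Returns the list of recovery actions taken, for illustration.
    """
    actions = []
    synchronized = False
    while True:
        try:
            pool.allocate(size)
            return actions
        except OutOfMemory:
            if not synchronized:
                pool.synchronize()       # cheap attempt first
                synchronized = True
                actions.append("sync")
            elif pool.spill(size) > 0:   # already synchronized: spill
                actions.append("spill")
            else:
                raise                    # nothing left to reclaim
```

In this model, an allocation that fails only because of not-yet-reclaimed frees succeeds after the synchronize alone, and spilling happens only when the synchronize did not free enough. Whether the synchronized flag should be reset after a spill in the real allocator is left open here.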

That said, the query also ran out of memory on a second trial. I think the reason is that we are now able to pack the GPU much more tightly: I see the async pool getting closer to its maximum size (40GB in this case). So we have less slack memory for tasks that run above their ~1/concurrentGpuTasks share of memory.
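A rough worked example of the slack argument, as a sketch only: the 40 GiB pool size is from the run above, but the task count of 4 and the per-task usage figures are made-up assumptions, and `fair_share` and `slack_for_task` are hypothetical helpers.

```python
GIB = 1024 ** 3


def fair_share(pool_bytes, concurrent_gpu_tasks):
    """Approximate per-task share: pool size / concurrentGpuTasks."""
    return pool_bytes // concurrent_gpu_tasks


def slack_for_task(pool_bytes, other_tasks_bytes):
    """Memory left for one task given what the other tasks currently hold."""
    return pool_bytes - sum(other_tasks_bytes)


pool = 40 * GIB
share = fair_share(pool, 4)  # assumed 4 concurrent tasks

# Loosely packed pool: the other three tasks each sit below their share.
loose = slack_for_task(pool, [8 * GIB, 8 * GIB, 8 * GIB])

# Tightly packed pool: the other three tasks each use their full share.
tight = slack_for_task(pool, [10 * GIB, 10 * GIB, 10 * GIB])

print(share // GIB, loose // GIB, tight // GIB)
```

Under these assumed numbers the fourth task can grow past its share in the loosely packed case but has no room above its share once the pool is tightly packed, which is the situation described above.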

@abellina abellina added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Oct 12, 2022
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Oct 18, 2022