
[FEA] Set RMM async allocator as default #4515

Closed
rongou opened this issue Jan 12, 2022 · 1 comment · Fixed by #4606
Assignees
Labels
feature request New feature or request

Comments

rongou (Collaborator) commented Jan 12, 2022

Is your feature request related to a problem? Please describe.
Currently the RMM arena allocator is the default, but in some circumstances it can still run into OOM errors due to memory fragmentation. The async allocator, which relies on cudaMallocAsync and cudaFreeAsync, can remap physical pages when running out of GPU memory, and is thus more resistant to memory fragmentation.

Describe the solution you'd like
Set the async allocator as the default.
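For context, users can already opt in to the async allocator today without waiting for the default to change. The snippet below is a sketch assuming the RAPIDS Accelerator's `spark.rapids.memory.gpu.pool` setting, which this issue appears to concern; the exact config name and invocation are assumptions, not taken from this issue:

```shell
# Hypothetical opt-in: select the RMM async allocator explicitly.
# ASYNC is backed by cudaMallocAsync/cudaFreeAsync (CUDA 11.2+).
spark-submit \
  --conf spark.rapids.memory.gpu.pool=ASYNC \
  my-app.jar
```

Setting the default to ASYNC would make this the out-of-the-box behavior instead of an opt-in.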

Describe alternatives you've considered
Continuing to improve the arena allocator is possible, but since it doesn't remap memory, it probably won't ever be as fragmentation-resistant as the async allocator.

Additional context
Set this at the beginning of 22.04 for more testing.

@rongou rongou added feature request New feature or request ? - Needs Triage Need team to review and classify labels Jan 12, 2022
@rongou rongou self-assigned this Jan 12, 2022
rongou (Collaborator, Author) commented Jan 14, 2022

Since cudaMallocAsync requires CUDA 11.2, we'll do a version check and only switch to the async allocator if the CUDA driver version is 11.2 or above.
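The gate reduces to a comparison against the integer that cudaDriverGetVersion reports, which encodes CUDA 11.2 as 11020 (major*1000 + minor*10). A minimal sketch of that check; the function names here are illustrative, not taken from the eventual fix:

```python
def async_allocator_supported(driver_version: int) -> bool:
    """cudaDriverGetVersion encodes CUDA 11.2 as 11020 (major*1000 + minor*10)."""
    return driver_version >= 11020

def choose_allocator(driver_version: int) -> str:
    """Hypothetical wiring: fall back to arena on drivers older than 11.2."""
    return "async" if async_allocator_supported(driver_version) else "arena"
```

With this gate, an 11.1 driver (reported as 11010) keeps the arena allocator, while 11.2 and newer (11020 and up) switch to async.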
