
Spill OOM Protection #16737

Closed


@madsbk (Member) commented Sep 4, 2024

Depends on rapidsai/rmm#1665

Introduces the spill_oom_protection option, which falls back to managed memory when spilling on demand would otherwise fail with an out-of-memory (OOM) error.
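To illustrate the intended fallback order, here is a GPU-free toy simulation: try a device allocation, spill on demand if it does not fit, and only if it still does not fit either raise an OOM error or (with protection enabled) serve the request from managed memory. This is a sketch under stated assumptions. ToyDevice, allocate and the largest-first spill policy are hypothetical illustrations, not cuDF or RMM APIs and not the implementation in this PR.

```python
# Toy, GPU-free simulation of the spill -> managed-memory fallback order.
# ToyDevice and allocate() are hypothetical; they are NOT cuDF/RMM APIs.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDevice:
    capacity: int                  # bytes of simulated device memory
    used: int = 0                  # bytes currently resident on the device
    # Sizes of resident buffers that could be spilled to host (a subset of `used`).
    spillable: List[int] = field(default_factory=list)
    managed_used: int = 0          # bytes served from simulated managed memory

    def try_device(self, nbytes: int) -> bool:
        """Allocate on the device if the request fits; report success."""
        if self.used + nbytes <= self.capacity:
            self.used += nbytes
            return True
        return False

    def spill_until_fits(self, nbytes: int) -> None:
        """Spill on demand: evict spillable buffers, largest first, until the request fits."""
        self.spillable.sort()
        while self.spillable and self.used + nbytes > self.capacity:
            self.used -= self.spillable.pop()  # buffer moves to host, freeing device bytes


def allocate(dev: ToyDevice, nbytes: int, oom_protection: bool) -> str:
    """Return where the allocation landed: 'device' or 'managed'."""
    if dev.try_device(nbytes):
        return "device"
    dev.spill_until_fits(nbytes)
    if dev.try_device(nbytes):
        return "device"            # spilling was enough; protection is never consulted
    if oom_protection:
        dev.managed_used += nbytes # hotspot: fall back to managed memory instead of crashing
        return "managed"
    raise MemoryError(f"out of device memory allocating {nbytes} bytes")
```

When spilling alone frees enough memory, the managed path is never taken, which mirrors the description's point that the option does nothing for workloads without such hotspots.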

This targets CUDF_SPILL users whose workflows cuDF spilling can generally handle, but which may hit memory hotspots that occasionally trigger an OOM crash. With CUDF_SPILL_OOM_PROTECTION enabled, allocations in these hotspots fall back to managed memory. If a workflow has no hotspots that CUDF_SPILL cannot handle, the option has no effect.
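A minimal sketch of how the two environment variables might be combined, assuming that OOM protection is only meaningful when spilling itself is enabled, and assuming common boolean spellings ("on"/"off", "true"/"false", "1"/"0", "yes"/"no"). The helpers env_flag and oom_protection_enabled are hypothetical and not part of cuDF.

```python
# Hypothetical helpers for reading CUDF_SPILL / CUDF_SPILL_OOM_PROTECTION.
# The accepted spellings and the "protection requires spilling" rule are assumptions.
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "on", "yes"}
_FALSE = {"0", "false", "off", "no", ""}


def env_flag(name: str, default: bool = False,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a boolean-like environment variable, falling back to `default` if unset."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")


def oom_protection_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Assumption: the managed-memory fallback only applies when spilling is on."""
    return (env_flag("CUDF_SPILL", environ=environ)
            and env_flag("CUDF_SPILL_OOM_PROTECTION", environ=environ))
```

For example, oom_protection_enabled({"CUDF_SPILL": "on", "CUDF_SPILL_OOM_PROTECTION": "on"}) returns True, while leaving CUDF_SPILL off disables the fallback regardless of the second variable.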

The option is not intended for workflows that heavily oversubscribe device memory; in such cases, using managed memory with prefetching is preferable.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • Update docstrings
  • The documentation is up to date with these changes.

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 4, 2024
@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 4, 2024
@madsbk madsbk closed this Oct 10, 2024