-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Superseded] Spilling to host memory seamlessly #11553
Conversation
Codecov ReportBase: 88.11% // Head: 88.25% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-22.12 #11553 +/- ##
================================================
+ Coverage 88.11% 88.25% +0.14%
================================================
Files 133 137 +4
Lines 21982 22487 +505
================================================
+ Hits 19369 19846 +477
- Misses 2613 2641 +28
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
This is a hack we have to remove before merging this PR.
Locking already locked buffers when handling spill-on-demand may result in deadlock.
This reverts commit 9da6b5b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to get close to a place where we can move it out of draft and open it up for review. Aside from the inline comments, here are the major remaining roadblocks that I see:
- The question about how to make Dask and friends
get_ptr
-aware is an important one. I am not sure we can reasonably expect that from other packages, although maybe I'm wrong. I think that needs to be discussed further. - A lot of the classes/methods/functions need docstrings. Please add them. The code is quite large and difficult to review, especially when there are many pieces that aren't strictly necessary, are used only for debugging, or are "niche" features associated with specific edge cases or debugging.
- Improving docstrings will help, but even so the scale of this PR has me worried about review. I wonder if it might be worth making the DeviceBufferLike changes first to reduce the diff of this PR, and then maybe to see if there are other components of this PR that can be broken up and reviewed/merged in pieces.
@shwina what do you think about the above?
without having to handle non-slice inputs. | ||
""" | ||
return self.__class__( | ||
wrap_device_pointer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change necessary? I understand adding _getitem
so that SpillableBuffer
can override, but did the behavior of this implementation need to change? The old impl just directly called the constructor using data, size, owner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't need to call it here, we can just inline wrap_device_pointer
in the logic for ensure_buffer_like
since that's the only other place that it's used.
self._last_accessed = time.monotonic() | ||
|
||
# First, we extract the memory pointer, size, and owner. | ||
# If it points to host memory we either: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I interpreting correctly that creating a SpillableDeviceBuffer from host memory and specifying exposed=True
does not mean that the data is already exposed, but rather that the intended use cases will expose it so we are just telling the buffer to treat itself as exposed from the start? Calling host memory "exposed" isn't meaningful in this context unless I'm completely misunderstanding the intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, exposed=True
for host memory doesn't make sense. I have removed this branch: 64ac420
for o in obj.values(): | ||
_get_columns(o) | ||
|
||
_get_columns(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the rmm resource stack function, I would probably recommend using a generator to avoid needing to explicitly recurse yourself. How complex the generator needs to be will mostly depend on the answer to my above question about the branches of _get_columns
.
continue # TODO: support masks | ||
|
||
if col.base_data is None: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we also don't support slices yet at this stage, is that correct? Or is it just unnecessary because those are just views and we assume we only need to work with the originals, not the views?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow, does slices of columns not have a .base_data
?
Co-authored-by: Vyas Ramasubramani <[email protected]>
This PR replaces `DeviceBufferLike` with `Buffer` and clear the way for a spillable sub-class of `Buffer`. #### Context The introduction of the [`DeviceBufferLike`](#11447) protocol was motivated by [the spilling work](#11553), which we initially thought would have to be implemented in Cython. However, it can be done in pure Python, which makes `DeviceBufferLike` an unneeded complexity. #### Review notes - In order to introduce a spillable-buffer in the future, we still use a factory function, `as_buffer()`, to create Buffers. - `buffer.py` is moved into the submodule `core.buffer` to ease organization when adding the spillable-buffer and spilling manager. #### Breaking This PR breaks external use of `Buffer` e.g. `Buffer.__init__` raise an exception now and the `"constructor-kwargs"` header from #4164 has been removed. Submitted a PR to fix this in cuml: rapidsai/cuml#4965 ## Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Ashwin Srinath (https://github.com/shwina) URL: #12009
Should we close this if it is superseded by #12106 ? |
I don't mind either way. This branch is definitely still relevant. There is code for logging spilling statistics in this PR that didn't make it into #12106. As long as we keep the branch around I'm indifferent to what we do with this PR in the short term and would defer to whatever @madsbk finds most convenient. |
FWIW it is possible to checkout a PR locally even after it is closed and the branch that sent it is deleted. |
This PR implementing spilling of device to host memory, which is based on #11553. Spilling can be enabled in two ways (it is disabled by default): - setting the environment variable `CUDF_SPILL=on`, or - setting the `spill` option in `cudf` by doing `cudf.set_option("spill", True)`. Additionally, parameters are: - `CUDF_SPILL_ON_DEMAND=ON` / `cudf.set_option("spill_on_demand", True)`, which registers an RMM out-of-memory error handler that spills buffers in order to free up memory. - `CUDF_SPILL_DEVICE_LIMIT=...` / `cudf.set_option("spill_device_limit", ...)`, which sets a device memory limit in bytes. I have limited the scope of this PR. In a follow-up PR, I will port the statistics, logging, and partial unspill from #11553. ### Design Spilling consists of two components: - A new buffer sub-class, `SpillableBuffer`, that implements moving of its data from host to device memory in-place. - A spill manager that tracks all instances of `SpillableBuffer` and spills them on demand. A global spill manager is used throughout cudf when spilling is enabled, which makes `as_buffer()` return `SpillableBuffer` instead of the default `Buffer` instances. #### Challenges Accessing `Buffer.ptr`, we get the device memory pointer of the buffer. This is unproblematic in the case of `Buffer` but what happens when accessing `SpillableBuffer.ptr`, which might have spilled its device memory? In this case, `SpillableBuffer` needs to unspill the memory before returning its device memory pointer. Furthermore, while this device memory pointer is being used (or could be used), `SpillableBuffer` cannot spill its memory back to host memory because doing so would invalidate the device pointer. To address this, we mark the `SpillableBuffer` as unspillable, we say that the buffer has been _exposed_. This can be either permanent if the device pointer is exposed to external projects or temporary while `libcudf` accesses the device memory. The `SpillableBuffer.get_ptr()` returns the device pointer of the buffer memory just like `.ptr` but if given an instance of `SpillLock`, the buffer is only unspillable as long as the instance of `SpillLock` is alive. For convenience, one can use the decorator/context `with_spill_lock` to associate a `SpillLock` with a lifetime bound to the context automatically. ### Overhead When spilling is disabled, the overhead of this PR comes from the decorator `with_spill_lock`. However, this is small https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3: ``` Micro benchmark on my local workstation: spilling off: raw: 0.06371338899771217 us with-spill-lock: 1.0796624180002254 us spilling on: raw: 0.05873749500096892 us with-spill-lock: 1.2184517139976379 us ``` ## Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) - AJ Schmidt (https://github.com/ajschmidt8) URL: #12106
Superseded by #12106
We introduce a new
Buffer
class,SpillableBuffer
, that will spill its device memory to host memory if RMM is running out of unused memory.In order to enable this new spilling feature, set the environment variable
CUDF_SPILL=on
, which make cuDF useSpillableBuffer
buffers for most of its allocations.Overhead
When spilling is disabled, the overhead of this PR comes from the decorator
with_spill_lock
. However, this is small https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3:Checklist
libcudf