[Superseded] Spilling to host memory seamlessly #11553

madsbk · 2022-08-17T11:40:39Z

Superseded by #12106

We introduce a new `Buffer` subclass, `SpillableBuffer`, that spills its device memory to host memory when RMM is running out of unused memory.

To enable this new spilling feature, set the environment variable `CUDF_SPILL=on`, which makes cuDF use `SpillableBuffer` for most of its allocations.
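The core idea (spill device buffers to host when device memory runs short) can be illustrated with a pure-Python toy. This is not cuDF code: `ToyBuffer`, `ToySpillManager`, the byte-count `device_limit`, and the least-recently-accessed spill order are all hypothetical choices made for illustration, and `bytes` objects merely pretend to live in device or host memory.

```python
import itertools

# Logical clock: a deterministic stand-in for time.monotonic() in this toy.
_clock = itertools.count()


class ToyBuffer:
    """Hypothetical stand-in for a spillable buffer (not cuDF's SpillableBuffer)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.location = "device"  # pretend the bytes start out in device memory
        self.last_accessed = next(_clock)

    @property
    def nbytes(self) -> int:
        return len(self.data)

    def spill(self) -> None:
        # A real buffer would copy device -> host here and free the device memory.
        self.location = "host"


class ToySpillManager:
    """Hypothetical manager that keeps simulated device usage under a byte limit."""

    def __init__(self, device_limit: int) -> None:
        self.device_limit = device_limit
        self.buffers = []  # every registered ToyBuffer

    def device_usage(self) -> int:
        return sum(b.nbytes for b in self.buffers if b.location == "device")

    def spill_to_fit(self, extra: int) -> None:
        # Spill least recently accessed device buffers until `extra` bytes fit.
        candidates = sorted(
            (b for b in self.buffers if b.location == "device"),
            key=lambda b: b.last_accessed,
        )
        for buf in candidates:
            if self.device_usage() + extra <= self.device_limit:
                break
            buf.spill()

    def register(self, buf: ToyBuffer) -> ToyBuffer:
        self.spill_to_fit(buf.nbytes)
        self.buffers.append(buf)
        return buf


mgr = ToySpillManager(device_limit=10)
a = mgr.register(ToyBuffer(b"x" * 6))
b = mgr.register(ToyBuffer(b"y" * 6))  # does not fit, so `a` (least recent) spills
print(a.location, b.location)  # host device
```

Spilling by recency is only one possible policy; the toy uses it because it is easy to follow.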

Overhead

When spilling is disabled, the overhead of this PR comes from the `with_spill_lock` decorator. However, this overhead is small (https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3):

I ran a micro benchmark on my local workstation: the overhead is ~0.2 µs when spilling is disabled and ~0.7 µs when it is enabled. When spilling is enabled and `read_only_columns=True`, the overhead is 127 µs.
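The figures above come from the linked gist. A comparable measurement can be sketched with the standard `timeit` module. `toy_spill_lock` and `SPILLING_ENABLED` are hypothetical: the decorator only mimics the "quick return when spilling is disabled" shape of `with_spill_lock`, so its timings will not match cuDF's.

```python
import contextlib
import functools
import timeit

SPILLING_ENABLED = False  # hypothetical global switch (think CUDF_SPILL=off)


def toy_spill_lock(func):
    """Hypothetical stand-in for `with_spill_lock`, with a quick return when off."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not SPILLING_ENABLED:
            return func(*args, **kwargs)  # fast path: just one flag check
        with contextlib.nullcontext():  # placeholder for acquiring a spill lock
            return func(*args, **kwargs)

    return wrapper


def raw():
    return None


decorated = toy_spill_lock(raw)


def per_call_us(fn, number=100_000):
    """Best-of-5 wall time per call, in microseconds."""
    return min(timeit.repeat(fn, number=number, repeat=5)) / number * 1e6


print(f"raw:       {per_call_us(raw):.3f} us")
print(f"decorated: {per_call_us(decorated):.3f} us")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes in such micro benchmarks.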

Checklist

New or existing tests cover these changes.
Avoid changes to libcudf
The documentation is up to date with these changes.

codecov · 2022-08-18T09:08:25Z

Codecov Report

Base: 88.11% // Head: 88.25% // Increases project coverage by +0.14% 🎉

Coverage data is based on head (a3ad5a6) compared to base (5c2150e).
Patch coverage: 88.55% of modified lines in pull request are covered.

❗ Current head a3ad5a6 differs from pull request most recent head e31cc6a. Consider uploading reports for the commit e31cc6a to get more accurate results

Additional details and impacted files

```diff
@@               Coverage Diff                @@
##           branch-22.12   #11553      +/-   ##
================================================
+ Coverage         88.11%   88.25%   +0.14%
================================================
  Files               133      137       +4
  Lines             21982    22487     +505
================================================
+ Hits              19369    19846     +477
- Misses             2613     2641      +28
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| `python/cudf/cudf/core/column/decimal.py` | `90.60% <ø> (ø)` | |
| `python/cudf/cudf/core/column/numerical.py` | `95.49% <ø> (ø)` | |
| `python/cudf/cudf/core/buffer/spill_manager.py` | `76.59% <76.59%> (ø)` | |
| `python/cudf/cudf/core/column/column.py` | `88.46% <80.00%> (ø)` | |
| `python/cudf/cudf/core/buffer/spillable_buffer.py` | `93.45% <93.45%> (ø)` | |
| `python/cudf/cudf/core/dtypes.py` | `96.64% <93.75%> (+0.11%)` | ⬆️ |
| `python/cudf/cudf/core/buffer/utils.py` | `96.87% <96.87%> (ø)` | |
| `python/cudf/cudf/core/buffer/buffer.py` | `91.40% <96.96%> (ø)` | |
| `python/cudf/cudf/core/abc.py` | `94.44% <100.00%> (+8.08%)` | ⬆️ |
| `python/cudf/cudf/core/buffer/__init__.py` | `100.00% <100.00%> (ø)` | |

... and 11 more



vyasr

This is starting to get close to a place where we can move it out of draft and open it up for review. Aside from the inline comments, here are the major remaining roadblocks that I see:

- The question about how to make Dask and friends `get_ptr`-aware is an important one. I am not sure we can reasonably expect that from other packages, although maybe I'm wrong. I think that needs to be discussed further.
- A lot of the classes/methods/functions need docstrings. Please add them. The code is quite large and difficult to review, especially when there are many pieces that aren't strictly necessary, are used only for debugging, or are "niche" features associated with specific edge cases or debugging.
- Improving docstrings will help, but even so the scale of this PR has me worried about review. I wonder if it might be worth making the `DeviceBufferLike` changes first to reduce the diff of this PR, and then maybe to see if there are other components of this PR that can be broken up and reviewed/merged in pieces.

@shwina what do you think about the above?

docs/cudf/source/developer_guide/spill_manager.md

vyasr · 2022-10-24T20:47:28Z

python/cudf/cudf/core/buffer/buffer.py

```diff
+        without having to handle non-slice inputs.
+        """
+        return self.__class__(
+            wrap_device_pointer(
```


Why was this change necessary? I understand adding _getitem so that SpillableBuffer can override, but did the behavior of this implementation need to change? The old impl just directly called the constructor using data, size, owner.

If we don't need to call it here, we can just inline wrap_device_pointer in the logic for ensure_buffer_like since that's the only other place that it's used.
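The hook pattern under discussion (a public `__getitem__` that normalizes its input and delegates to an overridable `_getitem`) can be sketched without any GPU code. `ToyBuffer` and `ToySpillableBuffer` are hypothetical stand-ins; the default `_getitem` here simply calls the constructor with owner/offset/size, i.e. the "old" behaviour the comment refers to.

```python
from typing import Optional


class ToyBuffer:
    """Hypothetical buffer: an (offset, size) window onto a bytes-like owner."""

    def __init__(self, owner: bytes, offset: int = 0, size: Optional[int] = None) -> None:
        self.owner = owner
        self.offset = offset
        self.size = len(owner) - offset if size is None else size

    def __getitem__(self, key: slice) -> "ToyBuffer":
        # Normalize the key once here, so subclasses never see non-slice inputs.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported")
        start, stop, step = key.indices(self.size)
        if step != 1:
            raise ValueError("only contiguous slices (step 1) are supported")
        return self._getitem(start, max(stop - start, 0))

    def _getitem(self, offset: int, size: int) -> "ToyBuffer":
        # Default hook: call the constructor directly with owner/offset/size.
        return self.__class__(self.owner, self.offset + offset, size)

    def tobytes(self) -> bytes:
        return bytes(self.owner[self.offset : self.offset + self.size])


class ToySpillableBuffer(ToyBuffer):
    def _getitem(self, offset: int, size: int) -> "ToyBuffer":
        # Override only the hook; here the view just remembers its parent
        # (hypothetically, so a spillable view could keep its parent pinned).
        view = super()._getitem(offset, size)
        view.parent = self
        return view


buf = ToyBuffer(b"abcdef")
print(buf[1:4].tobytes())  # b'bcd'
```

Keeping the slice normalization in a single `__getitem__` means a subclass overrides only the simple `(offset, size)` case.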

python/cudf/cudf/core/buffer/spillable_buffer.py

vyasr · 2022-10-24T21:24:21Z

python/cudf/cudf/core/buffer/spillable_buffer.py

```diff
+        self._last_accessed = time.monotonic()
+
+        # First, we extract the memory pointer, size, and owner.
+        # If it points to host memory we either:
```


Am I interpreting correctly that creating a SpillableDeviceBuffer from host memory and specifying exposed=True does not mean that the data is already exposed, but rather that the intended use cases will expose it so we are just telling the buffer to treat itself as exposed from the start? Calling host memory "exposed" isn't meaningful in this context unless I'm completely misunderstanding the intent.

madsbk · 2022-10-26

You are correct; `exposed=True` for host memory doesn't make sense. I have removed this branch: 64ac420

python/cudf/cudf/core/buffer/spillable_buffer.py

vyasr · 2022-10-24T23:25:07Z

python/cudf/cudf/core/buffer/utils.py

```diff
+            for o in obj.values():
+                _get_columns(o)
+
+    _get_columns(obj)
```


As with the rmm resource stack function, I would probably recommend using a generator to avoid needing to explicitly recurse yourself. How complex the generator needs to be will mostly depend on the answer to my above question about the branches of _get_columns.
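A generator version of the traversal might look like the sketch below. It uses an explicit stack instead of recursion; `iter_columns` and `ToyColumn` are hypothetical names, and which container types to descend into is an assumption that depends on the answer to the question about the branches of `_get_columns`.

```python
from collections.abc import Mapping
from typing import Iterator


class ToyColumn:
    """Hypothetical stand-in for a cudf column."""

    def __init__(self, name: str) -> None:
        self.name = name


def iter_columns(obj) -> Iterator[ToyColumn]:
    """Yield every ToyColumn in arbitrarily nested dicts, lists, and tuples.

    An explicit stack replaces recursion, so deep nesting cannot hit
    Python's recursion limit; items are yielded in depth-first order.
    """
    stack = [obj]
    while stack:
        item = stack.pop()
        if isinstance(item, ToyColumn):
            yield item
        elif isinstance(item, Mapping):
            # Push in reverse so the first value is popped (visited) first.
            stack.extend(reversed(list(item.values())))
        elif isinstance(item, (list, tuple)):
            stack.extend(reversed(item))
        # Anything else (numbers, strings, None, ...) is ignored.


nested = {"a": ToyColumn("x"), "b": [ToyColumn("y"), (ToyColumn("z"), 1)]}
print([col.name for col in iter_columns(nested)])  # ['x', 'y', 'z']
```

Callers can then write `for col in iter_columns(obj): ...` without managing the recursion themselves.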

python/cudf/cudf/core/buffer/utils.py

vyasr · 2022-10-24T23:26:53Z

python/cudf/cudf/core/buffer/utils.py

```diff
+            continue  # TODO: support masks
+
+        if col.base_data is None:
+            continue
```


So we also don't support slices yet at this stage, is that correct? Or is it just unnecessary because those are just views and we assume we only need to work with the originals, not the views?

madsbk · 2022-10-26

I am not sure I follow; do slices of columns not have a `.base_data`?
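To make the point behind this exchange concrete: if a slice is a view that shares its parent's `base_data` and differs only in `offset`/`size` (as the reply implies), then collecting distinct `base_data` objects already covers every slice. The sketch below is a hypothetical toy, not cuDF's column classes; `ToyColumn` and `distinct_base_buffers` are illustrative names.

```python
class ToyColumn:
    """Hypothetical column: an (offset, size) window onto a shared base buffer."""

    def __init__(self, base_data, offset: int = 0, size=None) -> None:
        self.base_data = base_data
        self.offset = offset
        self.size = len(base_data) - offset if size is None else size

    def slice(self, start: int, stop: int) -> "ToyColumn":
        # Slicing creates a view: same base_data, shifted offset, new size.
        return ToyColumn(self.base_data, self.offset + start, stop - start)

    @property
    def data(self) -> bytes:
        return bytes(self.base_data[self.offset : self.offset + self.size])


def distinct_base_buffers(columns):
    """Collect each underlying buffer once, deduplicating by identity."""
    seen = {}
    for col in columns:
        if col.base_data is not None:
            seen.setdefault(id(col.base_data), col.base_data)
    return list(seen.values())


base = bytearray(b"abcdef")
col = ToyColumn(base)
view = col.slice(2, 5)
print(view.data, view.base_data is col.base_data)  # b'cde' True
print(len(distinct_base_buffers([col, view])))  # 1
```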

python/cudf/cudf/core/buffer/utils.py


This PR replaces `DeviceBufferLike` with `Buffer` and clears the way for a spillable sub-class of `Buffer`.

#### Context

The introduction of the [`DeviceBufferLike`](#11447) protocol was motivated by [the spilling work](#11553), which we initially thought would have to be implemented in Cython. However, it can be done in pure Python, which makes `DeviceBufferLike` an unneeded complexity.

#### Review notes

- In order to introduce a spillable-buffer in the future, we still use a factory function, `as_buffer()`, to create Buffers.
- `buffer.py` is moved into the submodule `core.buffer` to ease organization when adding the spillable-buffer and spilling manager.

#### Breaking

This PR breaks external use of `Buffer`, e.g. `Buffer.__init__` now raises an exception and the `"constructor-kwargs"` header from #4164 has been removed. Submitted a PR to fix this in cuml: rapidsai/cuml#4965

## Authors:

- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:

- Vyas Ramasubramani (https://github.com/vyasr)
- Ashwin Srinath (https://github.com/shwina)

URL: #12009
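The reason for keeping a factory such as `as_buffer()` is that call sites never construct a concrete class themselves, so a spillable subclass can later be swapped in behind a global switch. A minimal sketch of that idea follows; `as_toy_buffer`, `set_toy_global_manager`, the toy classes, and the plain list standing in for a spill manager are all hypothetical, not cuDF's implementation.

```python
class ToyBuffer:
    """Hypothetical plain buffer (stands in for cudf's Buffer)."""

    def __init__(self, data: bytes) -> None:
        self.data = data


class ToySpillableBuffer(ToyBuffer):
    """Hypothetical spillable subclass (stands in for SpillableBuffer)."""

    def __init__(self, data: bytes, manager: list) -> None:
        super().__init__(data)
        self.manager = manager
        manager.append(self)  # register with the (toy) spill manager


_global_manager = None  # hypothetical: None means "spilling disabled"


def set_toy_global_manager(manager) -> None:
    global _global_manager
    _global_manager = manager


def as_toy_buffer(data: bytes) -> ToyBuffer:
    """Hypothetical factory: the only place that decides which class to build."""
    if _global_manager is None:
        return ToyBuffer(data)
    return ToySpillableBuffer(data, _global_manager)


print(type(as_toy_buffer(b"a")).__name__)  # ToyBuffer
set_toy_global_manager([])
print(type(as_toy_buffer(b"b")).__name__)  # ToySpillableBuffer
set_toy_global_manager(None)
```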

quasiben · 2022-11-16T18:23:22Z

Should we close this if it is superseded by #12106?

vyasr · 2022-11-16T19:04:18Z

> Should we close this if it is superseded by #12106?

I don't mind either way. This branch is definitely still relevant. There is code for logging spilling statistics in this PR that didn't make it into #12106. As long as we keep the branch around I'm indifferent to what we do with this PR in the short term and would defer to whatever @madsbk finds most convenient.

jakirkham · 2022-11-16T21:32:45Z

FWIW, it is possible to check out a PR locally even after it is closed and the branch that sent it is deleted.

This PR implements spilling of device memory to host memory, based on #11553. Spilling can be enabled in two ways (it is disabled by default):

- setting the environment variable `CUDF_SPILL=on`, or
- setting the `spill` option in `cudf` by doing `cudf.set_option("spill", True)`.

Additional parameters are:

- `CUDF_SPILL_ON_DEMAND=ON` / `cudf.set_option("spill_on_demand", True)`, which registers an RMM out-of-memory error handler that spills buffers in order to free up memory.
- `CUDF_SPILL_DEVICE_LIMIT=...` / `cudf.set_option("spill_device_limit", ...)`, which sets a device memory limit in bytes.

I have limited the scope of this PR. In a follow-up PR, I will port the statistics, logging, and partial unspill from #11553.

### Design

Spilling consists of two components:

- A new buffer sub-class, `SpillableBuffer`, that implements moving of its data between device and host memory in-place.
- A spill manager that tracks all instances of `SpillableBuffer` and spills them on demand.

A global spill manager is used throughout cudf when spilling is enabled, which makes `as_buffer()` return `SpillableBuffer` instead of the default `Buffer` instances.

#### Challenges

Accessing `Buffer.ptr`, we get the device memory pointer of the buffer. This is unproblematic in the case of `Buffer`, but what happens when accessing `SpillableBuffer.ptr`, which might have spilled its device memory? In this case, `SpillableBuffer` needs to unspill the memory before returning its device memory pointer. Furthermore, while this device memory pointer is being used (or could be used), `SpillableBuffer` cannot spill its memory back to host memory because doing so would invalidate the device pointer.

To address this, we mark the `SpillableBuffer` as unspillable; we say that the buffer has been _exposed_. This can be either permanent, if the device pointer is exposed to external projects, or temporary, while `libcudf` accesses the device memory.

`SpillableBuffer.get_ptr()` returns the device pointer of the buffer memory just like `.ptr`, but if given an instance of `SpillLock`, the buffer is only unspillable as long as the instance of `SpillLock` is alive. For convenience, one can use the decorator/context `with_spill_lock` to automatically associate a `SpillLock` with a lifetime bound to the context.

### Overhead

When spilling is disabled, the overhead of this PR comes from the decorator `with_spill_lock`. However, this overhead is small (https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3):

```
Micro benchmark on my local workstation:
spilling off:
  raw: 0.06371338899771217 us
  with-spill-lock: 1.0796624180002254 us
spilling on:
  raw: 0.05873749500096892 us
  with-spill-lock: 1.2184517139976379 us
```

## Authors:

- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:

- GALI PREM SAGAR (https://github.com/galipremsagar)
- Vyas Ramasubramani (https://github.com/vyasr)
- AJ Schmidt (https://github.com/ajschmidt8)

URL: #12106
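The two unspillable states described in #12106 (permanently exposed via `.ptr`, or temporarily pinned by a live spill lock via `get_ptr()`) can be modelled in a few lines of pure Python. This is a hypothetical toy (`ToySpillableBuffer`, `ToySpillLock`, `toy_spill_lock`) whose "device pointer" is just `id(self)`; it is not cuDF's implementation. Buffers hold only weak references to their locks, so dropping the last reference to a lock unpins the buffer, and the context manager additionally releases explicitly on exit.

```python
import contextlib
import weakref


class ToySpillLock:
    """Hypothetical stand-in for SpillLock."""

    def __init__(self) -> None:
        self._buffers = []  # buffers this lock currently pins

    def release(self) -> None:
        for buf in self._buffers:
            buf._spill_locks.discard(self)
        self._buffers.clear()


class ToySpillableBuffer:
    """Hypothetical buffer; its fake "device pointer" is just id(self)."""

    def __init__(self) -> None:
        self.location = "device"
        self.exposed = False  # permanent: set once the raw pointer escapes
        self._spill_locks = weakref.WeakSet()  # temporary: pinned while any lock lives

    def spillable(self) -> bool:
        return not self.exposed and len(self._spill_locks) == 0

    def spill(self) -> bool:
        if not self.spillable():
            return False
        self.location = "host"
        return True

    def _device_ptr(self) -> int:
        if self.location == "host":
            self.location = "device"  # unspill before handing out a pointer
        return id(self)

    @property
    def ptr(self) -> int:
        # The pointer may be kept by external code: never spill again.
        self.exposed = True
        return self._device_ptr()

    def get_ptr(self, spill_lock: ToySpillLock) -> int:
        # Unspillable only while `spill_lock` is alive (or until released).
        self._spill_locks.add(spill_lock)
        spill_lock._buffers.append(self)
        return self._device_ptr()


@contextlib.contextmanager
def toy_spill_lock():
    """Hypothetical analogue of the `with_spill_lock` context."""
    lock = ToySpillLock()
    try:
        yield lock
    finally:
        lock.release()  # don't depend on the caller dropping its reference


buf = ToySpillableBuffer()
with toy_spill_lock() as lock:
    buf.get_ptr(lock)
    print(buf.spill())  # False: pinned by the lock
print(buf.spill(), buf.location)  # True host
```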

madsbk added the `2 - In Progress`, `improvement`, and `non-breaking` labels Aug 17, 2022

github-actions bot added the `CMake`, `Python`, `gpuCI`, and `libcudf` labels Aug 17, 2022

madsbk force-pushed the cudf_spilling branch from e263870 to 31783d4 August 18, 2022 06:51

madsbk added 12 commits August 22, 2022 12:46

column_view: add owner

9da6b5b

This is a hack we have to remove before merging this PR.

Port of rapidsai#10746

89a9c30

TODO: DelayedPointerTuple not supported by PyTorch

90bd9f1

tracking other buffers

f774fc8

Impl. ptr_restricted() and ExposeToken

8902df5

doc

7349fe1

build.sh: fixing test paths

6ae8602

SpillableBuffer: add exposed buffers to the manager

04a69ce

spill_device_memory(): fix possible deadlock

a1a7017

Locking already locked buffers when handling spill-on-demand may result in deadlock.

Introduce SpillLock

12753f9

Revert "column_view: add owner"

f28d147

This reverts commit 9da6b5b.

ptr_raw(): fix offset when spill_lock is None

f426365
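The deadlock mentioned in the `spill_device_memory(): fix possible deadlock` commit arises when spilling is triggered on a thread that already holds a buffer's lock and then tries to take that lock again. One common way to avoid this, sketched below under the assumption that each buffer is guarded by a non-reentrant `threading.Lock`, is for the spill loop to use a non-blocking acquire and skip busy buffers. This is an illustration only, not the actual fix in that commit; `ToyBuffer` and `toy_spill_device_memory` are hypothetical.

```python
import threading


class ToyBuffer:
    """Hypothetical buffer guarded by a non-reentrant lock."""

    def __init__(self, nbytes: int) -> None:
        self.nbytes = nbytes
        self.lock = threading.Lock()
        self.spilled = False


def toy_spill_device_memory(buffers, nbytes: int) -> int:
    """Try to free at least `nbytes` without ever blocking on a buffer lock."""
    freed = 0
    for buf in buffers:
        if freed >= nbytes:
            break
        # A blocking acquire would deadlock if this very thread already holds
        # buf.lock (e.g. the OOM handler fired during an allocation it made).
        if not buf.lock.acquire(blocking=False):
            continue  # buffer is busy: skip it instead of waiting
        try:
            if not buf.spilled:
                buf.spilled = True  # a real buffer would copy device -> host here
                freed += buf.nbytes
        finally:
            buf.lock.release()
    return freed


a, b = ToyBuffer(4), ToyBuffer(4)
with a.lock:  # simulate: the current thread is in the middle of using `a`
    print(toy_spill_device_memory([a, b], nbytes=8))  # 4: `a` is skipped
```

The trade-off is that a spill pass may free less memory than requested when many buffers are busy; the caller has to handle that case.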

madsbk force-pushed the cudf_spilling branch from 2e340df to f426365 August 22, 2022 12:05

github-actions bot removed the `libcudf` label Aug 22, 2022

madsbk added 7 commits August 23, 2022 11:14

Impl. CUDF_SPILL_STAT_EXPOSE

dc9f7a5

Print expose statistics when OOM

16eaadb

table_view_from_table(): added the spill_lock argument

f5c75ea

Grabbing spill locks

477f953

test_expose_statistics(): fix ordering of the stats

d3364e1

ptr_restricted(): fixed type hint

8f0dc1d

Buffer: added a readonly argument

e966ec6

madsbk added 2 commits October 24, 2022 10:41

revert Copyright update

c9ae32d

with_spill_lock: quick return, if spilling is disabled.

0eab0d3

jakirkham requested a review from vyasr October 24, 2022 21:28

vyasr requested changes Oct 24, 2022


madsbk and others added 10 commits October 25, 2022 16:06

Typo

a172627

Co-authored-by: Vyas Ramasubramani <[email protected]>

Merge branch 'cudf_spilling' of github.com:madsbk/cudf into cudf_spilling

34db7bc

library-design: added "Spilling to host memory"

5882d4f

added test_get_spill_lock_no_manager()

769f37f

clean up

9b6b87f

host memory cannot be exposed

64ac420

spill_to_device_limit(): doc

fe3095f

renamed to [get|set]_global_manager()

508309c

typing

3a66e1d

clean up

e31cc6a

madsbk mentioned this pull request Oct 27, 2022

Rollback of DeviceBufferLike #12009

Merged


madsbk added a commit to madsbk/cudf that referenced this pull request Nov 9, 2022

Porting spillable buffer and manager from rapidsai#11553

f616499

madsbk mentioned this pull request Nov 9, 2022

Spilling to host memory #12106

Merged


madsbk changed the title ~~Spilling to host memory seamlessly~~ [Superseded] Spilling to host memory seamlessly Nov 11, 2022

madsbk closed this Nov 17, 2022

madsbk deleted the cudf_spilling branch August 3, 2023 11:32
