Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Superseded] Spilling to host memory seamlessly #11553

Closed
wants to merge 124 commits into from

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Aug 17, 2022

Superseded by #12106

We introduce a new Buffer class, SpillableBuffer, that will spill its device memory to host memory if RMM is running out of unused memory.

In order to enable this new spilling feature, set the environment variable CUDF_SPILL=on, which make cuDF use SpillableBuffer buffers for most of its allocations.

Overhead

When spilling is disabled, the overhead of this PR comes from the decorator with_spill_lock. However, this is small https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3:

I did a micro benchmark on my local workstation. 
The overhead is ~0.2us when spilling is disabled and ~0.7us when enabled. 
When spilling is enabled and `read_only_columns=True` the overhead is 127us. 

Checklist

  • New or existing tests cover these changes.
  • Avoid changes to libcudf
  • The documentation is up to date with these changes.

@madsbk madsbk added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 17, 2022
@github-actions github-actions bot added CMake CMake build issue Python Affects Python cuDF API. gpuCI libcudf Affects libcudf (C++/CUDA) code. labels Aug 17, 2022
@codecov
Copy link

codecov bot commented Aug 18, 2022

Codecov Report

Base: 88.11% // Head: 88.25% // Increases project coverage by +0.14% 🎉

Coverage data is based on head (a3ad5a6) compared to base (5c2150e).
Patch coverage: 88.55% of modified lines in pull request are covered.

❗ Current head a3ad5a6 differs from pull request most recent head e31cc6a. Consider uploading reports for the commit e31cc6a to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11553      +/-   ##
================================================
+ Coverage         88.11%   88.25%   +0.14%     
================================================
  Files               133      137       +4     
  Lines             21982    22487     +505     
================================================
+ Hits              19369    19846     +477     
- Misses             2613     2641      +28     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/decimal.py 90.60% <ø> (ø)
python/cudf/cudf/core/column/numerical.py 95.49% <ø> (ø)
python/cudf/cudf/core/buffer/spill_manager.py 76.59% <76.59%> (ø)
python/cudf/cudf/core/column/column.py 88.46% <80.00%> (ø)
python/cudf/cudf/core/buffer/spillable_buffer.py 93.45% <93.45%> (ø)
python/cudf/cudf/core/dtypes.py 96.64% <93.75%> (+0.11%) ⬆️
python/cudf/cudf/core/buffer/utils.py 96.87% <96.87%> (ø)
python/cudf/cudf/core/buffer/buffer.py 91.40% <96.96%> (ø)
python/cudf/cudf/core/abc.py 94.44% <100.00%> (+8.08%) ⬆️
python/cudf/cudf/core/buffer/__init__.py 100.00% <100.00%> (ø)
... and 11 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@github-actions github-actions bot removed the libcudf Affects libcudf (C++/CUDA) code. label Aug 22, 2022
@jakirkham jakirkham requested a review from vyasr October 24, 2022 21:28
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to get close to a place where we can move it out of draft and open it up for review. Aside from the inline comments, here are the major remaining roadblocks that I see:

  • The question about how to make Dask and friends get_ptr-aware is an important one. I am not sure we can reasonably expect that from other packages, although maybe I'm wrong. I think that needs to be discussed further.
  • A lot of the classes/methods/functions need docstrings. Please add them. The code is quite large and difficult to review, especially when there are many pieces that aren't strictly necessary, are used only for debugging, or are "niche" features associated with specific edge cases or debugging.
  • Improving docstrings will help, but even so the scale of this PR has me worried about review. I wonder if it might be worth making the DeviceBufferLike changes first to reduce the diff of this PR, and then maybe to see if there are other components of this PR that can be broken up and reviewed/merged in pieces.

@shwina what do you think about the above?

docs/cudf/source/developer_guide/spill_manager.md Outdated Show resolved Hide resolved
without having to handle non-slice inputs.
"""
return self.__class__(
wrap_device_pointer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change necessary? I understand adding _getitem so that SpillableBuffer can override, but did the behavior of this implementation need to change? The old impl just directly called the constructor using data, size, owner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need to call it here, we can just inline wrap_device_pointer in the logic for ensure_buffer_like since that's the only other place that it's used.

python/cudf/cudf/core/buffer/spillable_buffer.py Outdated Show resolved Hide resolved
self._last_accessed = time.monotonic()

# First, we extract the memory pointer, size, and owner.
# If it points to host memory we either:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I interpreting correctly that creating a SpillableDeviceBuffer from host memory and specifying exposed=True does not mean that the data is already exposed, but rather that the intended use cases will expose it so we are just telling the buffer to treat itself as exposed from the start? Calling host memory "exposed" isn't meaningful in this context unless I'm completely misunderstanding the intent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, exposed=True for host memory doesn't make sense. I have removed this branch: 64ac420

python/cudf/cudf/core/buffer/spillable_buffer.py Outdated Show resolved Hide resolved
for o in obj.values():
_get_columns(o)

_get_columns(obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the rmm resource stack function, I would probably recommend using a generator to avoid needing to explicitly recurse yourself. How complex the generator needs to be will mostly depend on the answer to my above question about the branches of _get_columns.

python/cudf/cudf/core/buffer/utils.py Outdated Show resolved Hide resolved
continue # TODO: support masks

if col.base_data is None:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we also don't support slices yet at this stage, is that correct? Or is it just unnecessary because those are just views and we assume we only need to work with the originals, not the views?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow, does slices of columns not have a .base_data?

python/cudf/cudf/core/buffer/utils.py Show resolved Hide resolved
python/cudf/cudf/core/buffer/utils.py Outdated Show resolved Hide resolved
@madsbk madsbk mentioned this pull request Oct 27, 2022
3 tasks
rapids-bot bot pushed a commit that referenced this pull request Nov 2, 2022
This PR replaces `DeviceBufferLike` with `Buffer` and clear the way for a spillable sub-class of `Buffer`.

#### Context 
The introduction of the [`DeviceBufferLike`](#11447) protocol was motivated by [the spilling work](#11553), which we initially thought would have to be implemented in Cython. However, it can be done in pure Python, which makes `DeviceBufferLike` an unneeded complexity.  

#### Review notes 

- In order to introduce a spillable-buffer in the future, we still use a factory function, `as_buffer()`, to create Buffers. 
- `buffer.py` is moved into the submodule `core.buffer` to ease organization when adding the spillable-buffer and spilling manager. 

#### Breaking
This PR breaks external use of `Buffer` e.g. `Buffer.__init__` raise an exception now and the `"constructor-kwargs"` header from #4164 has been removed. 
Submitted a PR to fix this in cuml: rapidsai/cuml#4965

##

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ashwin Srinath (https://github.com/shwina)

URL: #12009
madsbk added a commit to madsbk/cudf that referenced this pull request Nov 9, 2022
@madsbk madsbk mentioned this pull request Nov 9, 2022
3 tasks
@madsbk madsbk changed the title Spilling to host memory seamlessly [Superseded] Spilling to host memory seamlessly Nov 11, 2022
@quasiben
Copy link
Member

quasiben commented Nov 16, 2022

Should we close this if it is superseded by #12106 ?

@vyasr
Copy link
Contributor

vyasr commented Nov 16, 2022

Should we close this if it is superseded by #12106 ?

I don't mind either way. This branch is definitely still relevant. There is code for logging spilling statistics in this PR that didn't make it into #12106. As long as we keep the branch around I'm indifferent to what we do with this PR in the short term and would defer to whatever @madsbk finds most convenient.

@jakirkham
Copy link
Member

FWIW it is possible to checkout a PR locally even after it is closed and the branch that sent it is deleted.

rapids-bot bot pushed a commit that referenced this pull request Nov 16, 2022
This PR implementing spilling of device to host memory, which is based on #11553. 

Spilling can be enabled in two ways (it is disabled by default): 
  - setting the environment variable `CUDF_SPILL=on`, or 
  - setting the `spill` option in `cudf` by doing `cudf.set_option("spill", True)`. 

Additionally, parameters are: 
  - `CUDF_SPILL_ON_DEMAND=ON` / `cudf.set_option("spill_on_demand", True)`, which registers an RMM out-of-memory error handler that spills buffers in order to free up memory. 
 - `CUDF_SPILL_DEVICE_LIMIT=...` / `cudf.set_option("spill_device_limit", ...)`, which sets a device memory limit in bytes. 

I have limited the scope of this PR. In a follow-up PR, I will port the statistics, logging, and partial unspill from #11553. 

### Design 

Spilling consists of two components: 

- A new buffer sub-class, `SpillableBuffer`, that implements moving of its data from host to device memory in-place. 

- A spill manager that tracks all instances of `SpillableBuffer` and spills them on demand.  

A global spill manager is used throughout cudf when spilling is enabled, which makes `as_buffer()` return `SpillableBuffer` instead of the default `Buffer` instances.  


#### Challenges 

Accessing `Buffer.ptr`, we get the device memory pointer of the buffer. This is unproblematic in the case of `Buffer` but what happens when accessing `SpillableBuffer.ptr`, which might have spilled its device memory? In this case, `SpillableBuffer` needs to unspill the memory before returning its device memory pointer. Furthermore, while this device memory pointer is being used (or could be used), `SpillableBuffer`  cannot spill its memory back to host memory because doing so would invalidate the device pointer. 

To address this, we mark the `SpillableBuffer` as unspillable, we say that the buffer has been _exposed_. This can be either permanent if the device pointer is exposed to external projects or temporary while `libcudf` accesses the device memory.  

The `SpillableBuffer.get_ptr()` returns the device pointer of the buffer memory just like `.ptr` but if given an instance of `SpillLock`, the buffer is only unspillable as long as the instance of `SpillLock` is alive.  

For convenience, one can use the decorator/context `with_spill_lock` to associate a `SpillLock` with a lifetime bound to the context automatically. 


### Overhead 

When spilling is disabled, the overhead of this PR comes from the decorator `with_spill_lock`. However, this is small https://gist.github.com/madsbk/da6520e7583cf5d728a1b5a1b09200f3:
```
Micro benchmark on my local workstation:
  spilling off:
    raw:                    0.06371338899771217 us
    with-spill-lock:        1.0796624180002254 us
  spilling on:
    raw:                    0.05873749500096892 us
    with-spill-lock:        1.2184517139976379 us
```

##

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #12106
@madsbk madsbk closed this Nov 17, 2022
@madsbk madsbk deleted the cudf_spilling branch August 3, 2023 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants