-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Prototype] Rapids spilling manager #10746
Conversation
@@ -0,0 +1,99 @@ | |||
# Copyright (c) 2022, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is only a demonstration and should be moved or removed before release
77c9aa4
to
eeee952
Compare
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #10746 +/- ##
===============================================
Coverage ? 85.90%
===============================================
Files ? 147
Lines ? 23123
Branches ? 0
===============================================
Hits ? 19864
Misses ? 3259
Partials ? 0 Continue to review full report at Codecov.
|
Sync'd offline with Jake, but for completeness: It's not independent. We're going to have to incorporate those changes here eventually. This is not ready for review. |
95418c5
to
36e40c4
Compare
c2146eb
to
ad31306
Compare
a57f1da
to
2c0d15a
Compare
*/ | ||
column_view_base(data_type type, | ||
size_type size, | ||
void const* data, | ||
bitmask_type const* null_mask = nullptr, | ||
size_type null_count = UNKNOWN_NULL_COUNT, | ||
size_type offset = 0); | ||
size_type offset = 0, | ||
std::shared_ptr<void> owner = nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrhemstad do you think that a owner
argument could be accepted in libcudf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More likely we're going to do something like subclass column_view
and use that from the Python side (or use composition v/s inheritance). There are other alternatives, but for now, let's keep this and evaluate once we're past the POC stage.
4c2abcc
to
82e9529
Compare
df39a76
to
7c15ab6
Compare
This PR makes `Buffer.ptr` read-only and introduce `Buffer.from_buffer`: ```python @classmethod def from_buffer(cls, buffer: Buffer, size: int = None, offset: int = 0): """ Create a buffer from another buffer Parameters ---------- buffer : Buffer The base buffer, which will also be set as the owner of the memory allocation. size : int, optional Size of the memory allocation (default: `buffer.size`). offset : int, optional Start offset relative to `buffer.ptr`. """ ``` This is mainly motivated by my work on [spilling](#10746) by making it a bit easier to reason about the relationship between buffers. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Bradley Dice (https://github.com/bdice) - Ashwin Srinath (https://github.com/shwina) URL: #10872
This PR has been labeled |
This PR introduces factory functions to create `Buffer` instances, which makes it possible to change the returned buffer type based on a configuration option in a follow-up PR. Beside simplifying the code base a bit, this is motivated by the spilling work in #10746. We would like to introduce a new spillable Buffer class that requires minimal changes to the existing code and is only used when enabled explicitly. This way, we can introduce spilling in cuDF as an experimental feature with minimal risk to the existing code. @shwina and I discussed the possibility to let `Buffer.__new__` return different class type instances instead of using factory functions but we concluded that having `Buffer()` return anything other than an instance of `Buffer` is simply too surprising :) **Notice**, this is breaking because it removes unused methods such as `Buffer.copy()` and `Buffer.nbytes`. ~~However, we still support creating a buffer directly by calling `Buffer(obj)`. AFAIK, this is the only way `Buffer` is created outside of cuDF, which [a github search seems to confirm](https://github.com/search?l=&q=cudf.core.buffer+-repo%3Arapidsai%2Fcudf&type=code).~~ This PR doesn't change the signature of `Buffer.__init__()` anymore. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Ashwin Srinath (https://github.com/shwina) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - https://github.com/brandon-b-miller URL: #11447
Closed in favor of #11553 |
This is part of the effort to implement seamlessly spilling in cuDF and is just for testing for now.
The idea is to have a new column accessor,
SpillableColumnAccessor
, that can serialize and deserialize its columns in-place and a new manager that order column serializations triggered byrmm.mr.FailureCallbackResourceAdaptor
.As a demonstration, I have included
python/cudf/cudf/spilling-demo.py
that continues to allocate random dataframes until running out of device memory at which point spilling are triggered.Output of running `spilling-demo.py` on my workstation
cc. @shwina @quasiben