-
Notifications
You must be signed in to change notification settings - Fork 928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buffer: make .ptr
read-only
#10872
Buffer: make .ptr
read-only
#10872
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #10872 +/- ##
===============================================
Coverage ? 86.83%
===============================================
Files ? 144
Lines ? 23721
Branches ? 0
===============================================
Hits ? 20598
Misses ? 3123
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some high-level suggestions that may be beyond the scope of this PR. The most actionable suggestion I have for this PR is to make _size
private. Beyond that, it might be good for us to consider the design of the class's constructor and factories as a whole. I don't know if we're locked into a specific API by external users or matching other packages, but we might be able to make this entire class less awkward.
|
||
def __init__( | ||
self, data: Any = None, size: Optional[int] = None, owner: Any = None | ||
self, data: Any = None, size: int = None, owner: object = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that size
is only used when data
is an integer (representing a pointer) and is otherwise ignored. It seems like a more natural API would be to accept a tuple (pointer, size)
rather than exposing a size
argument that is ignored by a majority of the code paths.
Is there a reason to keep this API for aligning with some other class(es) outside of this package?
(Not necessary to address in this PR but we may want to open an issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is it not required to assign "Optional" types for parameters defaulting to None
? (None
is not an int
, so I'm surprised mypy permits this without complaint.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to keep this API for aligning with some other class(es) outside of this package?
I don't think so, @vyasr do you know?
Also, is it not required to assign "Optional" types for parameters defaulting to None? (None is not an int, so I'm surprised mypy permits this without complaint.)
Yes and No. This is what we do throughout the RAPIDS code base and it is allowed by PEP-484. However, I see now that PEP-484 has been changed to recommend the use of Optional
always.
Personally, I much prefer the old way. I think adding Optional
to the vast majority of optional arguments bloats the code unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, @vyasr do you know?
Nope, it should be fine. There is a remote possibility that something in dask_cudf
or dask-cuda
is relying on the current behavior, but in either case we own those and can push the fixes downstream if we need to. That said, definitely agree with putting that fix in a subsequent PR.
Personally, I much prefer the old way. I think adding Optional to the vast majority of optional arguments bloats the code unnecessary.
I prefer having a style guide telling everybody what to do ;) I think we are currently very inconsistent in how we document optional parameters, but we may as well stick to what PEP 484 currently recommends since there's no telling when that will become a mypy error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer having a style guide telling everybody what to do ;) I think we are currently very inconsistent in how we document optional parameters, but we may as well stick to what PEP 484 currently recommends since there's no telling when that will become a mypy error.
Fair enough :) Maybe we should do a type hint overhauling at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madsbk Can you also discuss why you changed Any
to object
? The difference is subtle but significant. I see it discussed on StackOverflow but I'd be interested to hear your rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using object
, we assume that owner
has the same attributes as object
and nothing else. Thus, mypy will complain if we do something like len(owner)
. This is exactly what we want since Buffer
only uses owner
to keep a reference to the underlying device memory alive.
On the other hand when using Any
we assume that owner
has any attribute thus mypy will accept anything including len(owner)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! Thanks for that explanation, that aligns with my understanding as well.
@@ -72,23 +71,48 @@ def __init__( | |||
raise TypeError("data must be Buffer, array-like or integer") | |||
self._init_from_array_like(np.asarray(data), owner) | |||
|
|||
@classmethod | |||
def from_buffer(cls, buffer: Buffer, size: int = None, offset: int = 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider a slice API instead of a classmethod factory here. It might be more Pythonic?
Before:
buf = Buffer.from_buffer(
buffer=codes_column.base_data,
size=self.size * codes_column.dtype.itemsize,
offset=self.offset * codes_column.dtype.itemsize,
)
After:
start_bytes = self.offset * codes_column.dtype.itemsize
end_bytes = start_bytes + self.size * codes_column.dtype.itemsize
buf = codes_column.base_data[start_bytes:end_bytes]
Suggested implementation:
def __getitem__(self, key):
if not isinstance(key, slice):
raise ValueError("Buffers can be sliced but individual items cannot be accessed.")
start, stop, step = key.start, key.stop, key.step
if step != 1:
raise ValueError("A Buffer can only be sliced with step = 1.")
if start is None:
start = 0
if stop is None:
stop = len(self)
if start < 0:
start += len(self)
if stop < 0:
stop += len(self)
if stop > len(self) or start < 0:
raise ValueError("The slice start and stop must be in bounds.")
if start > stop:
raise ValueError("The slice start cannot be greater than the slice stop.")
ret = cls()
ret._ptr = self._ptr + start
ret.size = stop - start
ret._owner = self
return ret
python/cudf/cudf/core/buffer.py
Outdated
|
||
ret = cls() | ||
ret._ptr = buffer._ptr + offset | ||
ret.size = buffer.size if size is None else size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider requiring size is not None
if offset
is provided (or doing bounds checks).
Buffer.from_buffer(buf, offset=32)
will create a buffer with the same size as buf
, but its end points out of bounds for the original buffer. As a result, it's slightly easier to misuse this API than a slice-based API like buf[32:]
.
There have been lots of discussions about the overall purpose, design, and appropriate home for this class. rapidsai/rmm#227 is quite relevant. As you pointed out, most of these discussions are beyond the scope of this PR, so I would recommend that we not pursue them too much further here, but I'm happy to continue more discussions offline. I think keeping this PR narrow would be helpful to let Mads' work on spilling proceed without blockers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small change requests and one API question about size
and nbytes
being synonyms. Otherwise LGTM. This is now on my radar to make a follow-up PR that can improve these APIs.
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
Thanks @bdice. |
Sounds good @madsbk. Thanks again for your patience and explanation. |
@vyasr, I think this is ready to be merged |
@madsbk I don't foresee any issues from merging this PR. But we are technically in code freeze for 22.06, and this PR touches some pretty deep internals. Can we target 22.08 instead? I'll merge once it's retargeted. |
Sure, that is probably a good idea :)
…On Wed, 25 May 2022 at 13.53, Ashwin Srinath ***@***.***> wrote:
@madsbk <https://github.com/madsbk> I don't foresee any issues from
merging this PR. But we are technically in code freeze for 22.06, and this
PR touches some pretty deep internals. Can we target 22.08 instead? I'll
merge once it's retargeted.
—
Reply to this email directly, view it on GitHub
<#10872 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAH6FQHSIOJFARGWX2IWUJTVLYICJANCNFSM5WEAOXNA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving & merging.
@gpucibot merge |
Thanks everyone! |
This PR makes
Buffer.ptr
read-only and introduceBuffer.from_buffer
:This is mainly motivated by my work on spilling by making it a bit easier to reason about the relationship between buffers.