Buffer: make `.ptr` read-only #10872

madsbk · 2022-05-17T08:07:06Z

This PR makes Buffer.ptr read-only and introduce Buffer.from_buffer:

@classmethod
def from_buffer(cls, buffer: Buffer, size: int = None, offset: int = 0):
    """
    Create a buffer from another buffer

    Parameters
    ----------
    buffer : Buffer
        The base buffer, which will also be set as the owner of
        the memory allocation.
    size : int, optional
        Size of the memory allocation (default: `buffer.size`).
    offset : int, optional
        Start offset relative to `buffer.ptr`.
    """

This is mainly motivated by my work on spilling by making it a bit easier to reason about the relationship between buffers.

codecov · 2022-05-17T10:09:35Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@6acf226). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #10872   +/-   ##
===============================================
  Coverage                ?   86.83%           
===============================================
  Files                   ?      144           
  Lines                   ?    23721           
  Branches                ?        0           
===============================================
  Hits                    ?    20598           
  Misses                  ?     3123           
  Partials                ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6acf226...ec69d75. Read the comment docs.

python/cudf/cudf/_lib/column.pyx

bdice

I have some high-level suggestions that may be beyond the scope of this PR. The most actionable suggestion I have for this PR is to make _size private. Beyond that, it might be good for us to consider the design of the class's constructor and factories as a whole. I don't know if we're locked into a specific API by external users or matching other packages, but we might be able to make this entire class less awkward.

python/cudf/cudf/core/buffer.py

bdice · 2022-05-17T17:24:38Z

python/cudf/cudf/core/buffer.py


    def __init__(
-        self, data: Any = None, size: Optional[int] = None, owner: Any = None
+        self, data: Any = None, size: int = None, owner: object = None


It seems that size is only used when data is an integer (representing a pointer) and is otherwise ignored. It seems like a more natural API would be to accept a tuple (pointer, size) rather than exposing a size argument that is ignored by a majority of the code paths.

Is there a reason to keep this API for aligning with some other class(es) outside of this package?

(Not necessary to address in this PR but we may want to open an issue.)

Also, is it not required to assign "Optional" types for parameters defaulting to None? (None is not an int, so I'm surprised mypy permits this without complaint.)

Is there a reason to keep this API for aligning with some other class(es) outside of this package?

I don't think so, @vyasr do you know?

Also, is it not required to assign "Optional" types for parameters defaulting to None? (None is not an int, so I'm surprised mypy permits this without complaint.)

Yes and No. This is what we do throughout the RAPIDS code base and it is allowed by PEP-484. However, I see now that PEP-484 has been changed to recommend the use of Optional always.

Personally, I much prefer the old way. I think adding Optional to the vast majority of optional arguments bloats the code unnecessary.

I don't think so, @vyasr do you know?

Nope, it should be fine. There is a remote possibility that something in dask_cudf or dask-cuda is relying on the current behavior, but in either case we own those and can push the fixes downstream if we need to. That said, definitely agree with putting that fix in a subsequent PR.

Personally, I much prefer the old way. I think adding Optional to the vast majority of optional arguments bloats the code unnecessary.

I prefer having a style guide telling everybody what to do ;) I think we are currently very inconsistent in how we document optional parameters, but we may as well stick to what PEP 484 currently recommends since there's no telling when that will become a mypy error.

I prefer having a style guide telling everybody what to do ;) I think we are currently very inconsistent in how we document optional parameters, but we may as well stick to what PEP 484 currently recommends since there's no telling when that will become a mypy error.

Fair enough :) Maybe we should do a type hint overhauling at some point.

@madsbk Can you also discuss why you changed Any to object? The difference is subtle but significant. I see it discussed on StackOverflow but I'd be interested to hear your rationale.

Using object, we assume that owner has the same attributes as object and nothing else. Thus, mypy will complain if we do something like len(owner). This is exactly what we want since Buffer only uses owner to keep a reference to the underlying device memory alive.
On the other hand when using Any we assume that owner has any attribute thus mypy will accept anything including len(owner).

Fantastic! Thanks for that explanation, that aligns with my understanding as well.

bdice · 2022-05-17T17:32:46Z

python/cudf/cudf/core/buffer.py

@@ -72,23 +71,48 @@ def __init__(
                raise TypeError("data must be Buffer, array-like or integer")
            self._init_from_array_like(np.asarray(data), owner)

+    @classmethod
+    def from_buffer(cls, buffer: Buffer, size: int = None, offset: int = 0):


We could consider a slice API instead of a classmethod factory here. It might be more Pythonic?

Before:

buf = Buffer.from_buffer( buffer=codes_column.base_data, size=self.size * codes_column.dtype.itemsize, offset=self.offset * codes_column.dtype.itemsize, )

After:

start_bytes = self.offset * codes_column.dtype.itemsize end_bytes = start_bytes + self.size * codes_column.dtype.itemsize buf = codes_column.base_data[start_bytes:end_bytes]

Suggested implementation:

def __getitem__(self, key): if not isinstance(key, slice): raise ValueError("Buffers can be sliced but individual items cannot be accessed.") start, stop, step = key.start, key.stop, key.step if step != 1: raise ValueError("A Buffer can only be sliced with step = 1.") if start is None: start = 0 if stop is None: stop = len(self) if start < 0: start += len(self) if stop < 0: stop += len(self) if stop > len(self) or start < 0: raise ValueError("The slice start and stop must be in bounds.") if start > stop: raise ValueError("The slice start cannot be greater than the slice stop.") ret = cls() ret._ptr = self._ptr + start ret.size = stop - start ret._owner = self return ret

bdice · 2022-05-17T17:35:28Z

python/cudf/cudf/core/buffer.py

+
+        ret = cls()
+        ret._ptr = buffer._ptr + offset
+        ret.size = buffer.size if size is None else size


I would consider requiring size is not None if offset is provided (or doing bounds checks).

Buffer.from_buffer(buf, offset=32) will create a buffer with the same size as buf, but its end points out of bounds for the original buffer. As a result, it's slightly easier to misuse this API than a slice-based API like buf[32:].

vyasr · 2022-05-17T18:14:17Z

I have some high-level suggestions that may be beyond the scope of this PR. The most actionable suggestion I have for this PR is to make _size private. Beyond that, it might be good for us to consider the design of the class's constructor and factories as a whole. I don't know if we're locked into a specific API by external users or matching other packages, but we might be able to make this entire class less awkward.

There have been lots of discussions about the overall purpose, design, and appropriate home for this class. rapidsai/rmm#227 is quite relevant. As you pointed out, most of these discussions are beyond the scope of this PR, so I would recommend that we not pursue them too much further here, but I'm happy to continue more discussions offline. I think keeping this PR narrow would be helpful to let Mads' work on spilling proceed without blockers.

…tr-read-only

bdice

Two small change requests and one API question about size and nbytes being synonyms. Otherwise LGTM. This is now on my radar to make a follow-up PR that can improve these APIs.

python/cudf/cudf/core/buffer.py

Co-authored-by: Bradley Dice <[email protected]>

madsbk · 2022-05-18T19:51:59Z

Two small change requests and one API question about size and nbytes being synonyms. Otherwise LGTM. This is now on my radar to make a follow-up PR that can improve these APIs.

Thanks @bdice.
As part of the spilling work, I have to make more changes to Buffer or alternatively introduce a new kind of Buffer so please wait a bit with a follow-up PR :)

bdice · 2022-05-18T21:13:17Z

Sounds good @madsbk. Thanks again for your patience and explanation.

…tr-read-only

madsbk · 2022-05-25T07:47:35Z

@vyasr, I think this is ready to be merged

shwina · 2022-05-25T11:52:58Z

@madsbk I don't foresee any issues from merging this PR. But we are technically in code freeze for 22.06, and this PR touches some pretty deep internals. Can we target 22.08 instead? I'll merge once it's retargeted.

madsbk · 2022-05-25T15:26:18Z

Sure, that is probably a good idea :)

…

On Wed, 25 May 2022 at 13.53, Ashwin Srinath ***@***.***> wrote: @madsbk <https://github.com/madsbk> I don't foresee any issues from merging this PR. But we are technically in code freeze for 22.06, and this PR touches some pretty deep internals. Can we target 22.08 instead? I'll merge once it's retargeted. — Reply to this email directly, view it on GitHub <#10872 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAH6FQHSIOJFARGWX2IWUJTVLYICJANCNFSM5WEAOXNA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

shwina

Approving & merging.

shwina · 2022-05-25T19:14:40Z

@gpucibot merge

madsbk · 2022-05-26T06:56:31Z

Thanks everyone!

Buffer: make .ptr read-only

86468cf

madsbk added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function breaking Breaking change labels May 17, 2022

github-actions bot added the Python Affects Python cuDF API. label May 17, 2022

madsbk marked this pull request as ready for review May 17, 2022 14:37

madsbk requested a review from a team as a code owner May 17, 2022 14:37

madsbk requested review from vyasr and bdice May 17, 2022 14:37

madsbk added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels May 17, 2022

madsbk added 2 commits May 17, 2022 17:04

clean up

728a699

more clean up

14c2de6

shwina reviewed May 17, 2022

View reviewed changes

python/cudf/cudf/_lib/column.pyx Show resolved Hide resolved

bdice reviewed May 17, 2022

View reviewed changes

madsbk added 2 commits May 18, 2022 16:43

Merge branch 'branch-22.06' of github.com:rapidsai/cudf into buffer-p…

6a7dcc7

…tr-read-only

Making Buffer.size read-only

366dac8

bdice approved these changes May 18, 2022

View reviewed changes

python/cudf/cudf/core/buffer.py Outdated Show resolved Hide resolved

python/cudf/cudf/core/buffer.py Outdated Show resolved Hide resolved

python/cudf/cudf/core/buffer.py Show resolved Hide resolved

madsbk and others added 4 commits May 18, 2022 21:00

If data spans all of base_data, we return base_data.

db93636

Fix order of assignment

a636fa5

Co-authored-by: Bradley Dice <[email protected]>

Fix order of assignment

9584037

Co-authored-by: Bradley Dice <[email protected]>

more reordering

d5256c6

Merge branch 'branch-22.06' of github.com:rapidsai/cudf into buffer-p…

8e2fb90

…tr-read-only

clean up

ec69d75

shwina changed the base branch from branch-22.06 to branch-22.08 May 25, 2022 16:41

shwina approved these changes May 25, 2022

View reviewed changes

rapids-bot bot merged commit ca4bc97 into rapidsai:branch-22.08 May 25, 2022

madsbk mentioned this pull request Aug 10, 2022

[REVIEW] Refactor the Buffer class #11447

Merged

3 tasks

madsbk deleted the buffer-ptr-read-only branch November 21, 2022 08:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Buffer: make `.ptr` read-only #10872

Buffer: make `.ptr` read-only #10872

madsbk commented May 17, 2022 •

edited

Loading

codecov bot commented May 17, 2022 •

edited

Loading

bdice left a comment

bdice May 17, 2022

bdice May 17, 2022

madsbk May 18, 2022 •

edited

Loading

vyasr May 18, 2022

madsbk May 18, 2022

bdice May 18, 2022 •

edited

Loading

madsbk May 18, 2022 •

edited

Loading

bdice May 18, 2022

bdice May 17, 2022 •

edited

Loading

bdice May 17, 2022 •

edited

Loading

vyasr commented May 17, 2022

bdice left a comment •

edited

Loading

madsbk commented May 18, 2022

bdice commented May 18, 2022

madsbk commented May 25, 2022

shwina commented May 25, 2022

madsbk commented May 25, 2022 via email

shwina left a comment

shwina commented May 25, 2022

madsbk commented May 26, 2022

Buffer: make .ptr read-only #10872

Buffer: make .ptr read-only #10872

Conversation

madsbk commented May 17, 2022 • edited Loading

codecov bot commented May 17, 2022 • edited Loading

Codecov Report

bdice left a comment

Choose a reason for hiding this comment

bdice May 17, 2022

Choose a reason for hiding this comment

bdice May 17, 2022

Choose a reason for hiding this comment

madsbk May 18, 2022 • edited Loading

Choose a reason for hiding this comment

vyasr May 18, 2022

Choose a reason for hiding this comment

madsbk May 18, 2022

Choose a reason for hiding this comment

bdice May 18, 2022 • edited Loading

Choose a reason for hiding this comment

madsbk May 18, 2022 • edited Loading

Choose a reason for hiding this comment

bdice May 18, 2022

Choose a reason for hiding this comment

bdice May 17, 2022 • edited Loading

Choose a reason for hiding this comment

bdice May 17, 2022 • edited Loading

Choose a reason for hiding this comment

vyasr commented May 17, 2022

bdice left a comment • edited Loading

Choose a reason for hiding this comment

madsbk commented May 18, 2022

bdice commented May 18, 2022

madsbk commented May 25, 2022

shwina commented May 25, 2022

madsbk commented May 25, 2022 via email

shwina left a comment

Choose a reason for hiding this comment

shwina commented May 25, 2022

madsbk commented May 26, 2022

Buffer: make `.ptr` read-only #10872

Buffer: make `.ptr` read-only #10872

madsbk commented May 17, 2022 •

edited

Loading

codecov bot commented May 17, 2022 •

edited

Loading

madsbk May 18, 2022 •

edited

Loading

bdice May 18, 2022 •

edited

Loading

madsbk May 18, 2022 •

edited

Loading

bdice May 17, 2022 •

edited

Loading

bdice May 17, 2022 •

edited

Loading

bdice left a comment •

edited

Loading