-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Update copy-on-write
with branch-23.02
changes
#12556
[REVIEW] Update copy-on-write
with branch-23.02
changes
#12556
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## copy-on-write #12556 +/- ##
================================================
Coverage ? 43.53%
================================================
Files ? 156
Lines ? 24730
Branches ? 0
================================================
Hits ? 10766
Misses ? 13964
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks like a nice improvement. As a consequence, I have a few queries about places where I think we don't need to access Column
arrays in write
mode (and some where I think we can bypass creating a column at all, since we do so just to immediately turn into a cupy array).
python/cudf/cudf/_lib/column.pyx
Outdated
value = value._get_readonly_proxy_obj | ||
if value.__cuda_array_interface__["typestr"] not in ("|i1", "|u1"): | ||
if isinstance(value, Column): | ||
value = value.data_array_view | ||
value = value.data_array_view(mode="write") | ||
value = cp.asarray(value).view('|u1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we now simplify this bit to just say:
if isinstance(value, Column):
value = value.data_array_view(mode="write")
if hasattr(value, "__cuda_array_interface__"):
value = cp.asarray(value).view("|u1")
...
TBH, looking at this now, I am confused why the dance with inspect
to avoid a copy in the CoW case, since if we are in CoW mode, we will need to trigger that copy, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into it..
python/cudf/cudf/_lib/column.pyx
Outdated
# spillable) and that its pointer is the same | ||
# as `data_ptr` _without_ exposing the buffer | ||
# permanently (calling get_ptr with a | ||
# dummy SpillLock). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment needs updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will be updated once #12564 is merged
python/cudf/cudf/_lib/column.pyx
Outdated
exposed=True, | ||
) | ||
if isinstance(data_owner, CopyOnWriteBuffer): | ||
data_owner.__cuda_array_interface__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a function call? I though __cuda_array_interface__
was a property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is there a way we can make this "exposed by accessing a property" idea more obvious by just introducing a buffer._expose()
method? WDYT?
@property | ||
def mutable_ptr(self) -> int: | ||
"""Device pointer to the start of the buffer.""" | ||
def get_ptr(self, mode: str = "read") -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer no default, or if we must have one, default to slow but safe (i.e. "write") rather than "read".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring please.
# mutable views. | ||
self._unlink_shared_buffers() | ||
return self._ptr | ||
def get_ptr(self, mode: str = "read") -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring for this function please.
@@ -519,6 +519,7 @@ def test_get_rmm_memory_resource_stack(): | |||
def test_df_transpose(manager: SpillManager): | |||
df1 = cudf.DataFrame({"a": [1, 2]}) | |||
df2 = df1.transpose() | |||
# import pdb;pdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented code.
python/cudf/cudf/utils/applyutils.py
Outdated
@@ -174,7 +177,7 @@ def run(self, df, **launch_params): | |||
) | |||
if out_mask is not None: | |||
outdf._data[k] = outdf[k]._column.set_mask( | |||
out_mask.data_array_view | |||
out_mask.data_array_view(mode="write") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So out_mask
is just created, so this is just an indication that we are taking ownership of this object, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
python/cudf/cudf/utils/applyutils.py
Outdated
else: | ||
# *chunks* is an array of chunk leading offset | ||
chunks = column.as_column(chunks) | ||
return chunks.data_array_view | ||
return chunks.data_array_view(mode="write") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we avoid going through a Column
and just directly make cupy arrays in these two cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, made the change to use cupy arrays directly.
Co-authored-by: Lawrence Mitchell <[email protected]>
Co-authored-by: Mads R. B. Kristensen <[email protected]>
Co-authored-by: Lawrence Mitchell <[email protected]>
…into ptr_refactor_1
Co-authored-by: Mads R. B. Kristensen <[email protected]>
Co-authored-by: Vyas Ramasubramani <[email protected]>
8e40546
to
dbfb8d4
Compare
copy-on-write
with branch-23.02
changes
Description
This PR updates the
copy-on-write
branch withget_ptr
&data_array_view
APIs.Checklist