
[REVIEW] Update copy-on-write with branch-23.02 changes #12556

Merged: galipremsagar merged 25 commits into copy-on-write on Jan 26, 2023


@galipremsagar (Contributor) commented Jan 17, 2023

Description

This PR updates the copy-on-write branch with the get_ptr and data_array_view APIs.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the Python Affects Python cuDF API. label Jan 17, 2023
codecov bot commented Jan 17, 2023

Codecov Report

❗ No coverage uploaded for pull request base (copy-on-write@fa094ed).
Patch has no changes to coverable lines.

❗ Current head dbfb8d4 differs from pull request most recent head 376e1a1. Consider uploading reports for the commit 376e1a1 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##             copy-on-write   #12556   +/-   ##
================================================
  Coverage                 ?   43.53%           
================================================
  Files                    ?      156           
  Lines                    ?    24730           
  Branches                 ?        0           
================================================
  Hits                     ?    10766           
  Misses                   ?    13964           
  Partials                 ?        0           


@wence- (Contributor) left a comment:

Thanks, this looks like a nice improvement. It does raise a few queries about places where I think we don't need to access Column arrays in write mode, and some where I think we can bypass creating a Column at all, since we create one only to immediately turn it into a CuPy array.

Comment on lines 214 to 218
 value = value._get_readonly_proxy_obj
 if value.__cuda_array_interface__["typestr"] not in ("|i1", "|u1"):
     if isinstance(value, Column):
-        value = value.data_array_view
+        value = value.data_array_view(mode="write")
     value = cp.asarray(value).view('|u1')

Contributor:

Could we now simplify this bit to just say:

if isinstance(value, Column):
    value = value.data_array_view(mode="write")
if hasattr(value, "__cuda_array_interface__"):
    value = cp.asarray(value).view("|u1")
...

TBH, looking at this now, I am confused about why we do the dance with inspect to avoid a copy in the CoW case: if we are in CoW mode, we will need to trigger that copy anyway, no?
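Why write access has to pay for the copy can be illustrated with a small, self-contained sketch. ToyCowBuffer, _Storage, set_byte and tobytes are hypothetical names invented for this example; get_ptr(mode=...) and _unlink_shared_buffers borrow their names from this PR, but the bodies below are a simplified assumption, not cuDF's implementation, and id() of a bytearray merely stands in for a device pointer.

```python
class _Storage:
    """Stand-in for a device allocation (hypothetical)."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        # How many ToyCowBuffer objects point at this storage.
        # Simplification: never decremented when a buffer is garbage collected.
        self.sharers = 1


class ToyCowBuffer:
    """Minimal copy-on-write buffer, for illustration only (not cuDF code)."""

    def __init__(self, data: bytes) -> None:
        self._storage = _Storage(data)

    def copy(self) -> "ToyCowBuffer":
        # Shallow copy: share the storage and bump the sharer count
        # instead of duplicating the bytes.
        new = ToyCowBuffer.__new__(ToyCowBuffer)
        new._storage = self._storage
        self._storage.sharers += 1
        return new

    def _unlink_shared_buffers(self) -> None:
        # Only pay for a copy when another buffer still shares the storage.
        if self._storage.sharers > 1:
            self._storage.sharers -= 1
            self._storage = _Storage(bytes(self._storage.data))

    def get_ptr(self, *, mode: str) -> int:
        # "read" never copies; "write" first ensures exclusive storage.
        if mode == "write":
            self._unlink_shared_buffers()
        elif mode != "read":
            raise ValueError(f"unknown mode {mode!r}")
        # id() of the backing bytearray stands in for a device pointer.
        return id(self._storage.data)

    def set_byte(self, index: int, value: int) -> None:
        # Any mutation must go through write-mode access.
        self.get_ptr(mode="write")
        self._storage.data[index] = value

    def tobytes(self) -> bytes:
        return bytes(self._storage.data)


a = ToyCowBuffer(b"\x01\x02\x03")
b = a.copy()
assert a.get_ptr(mode="read") == b.get_ptr(mode="read")  # still shared
b.set_byte(0, 9)  # write access triggers the copy
assert a.tobytes() == b"\x01\x02\x03"
assert b.tobytes() == b"\x09\x02\x03"
```

In this model, read-mode access on a shared buffer returns the shared pointer without copying, while write-mode access copies only when the storage is actually shared. A buffer that was just created and never shared (as with out_mask further down this thread) can be accessed in write mode at no extra cost.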

Contributor (author):

Looking into it.

# spillable) and that its pointer is the same
# as `data_ptr` _without_ exposing the buffer
# permanently (calling get_ptr with a
# dummy SpillLock).

Contributor:

This comment needs to be updated.

Contributor (author):

These will be updated once #12564 is merged.

exposed=True,
)
if isinstance(data_owner, CopyOnWriteBuffer):
    data_owner.__cuda_array_interface__()

Contributor:

Why is this a function call? I thought __cuda_array_interface__ was a property?
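Under the CUDA Array Interface specification, __cuda_array_interface__ is an attribute that evaluates to a dictionary, so appending () calls that dictionary. If the attribute really is a property on CopyOnWriteBuffer, the call in the diff would raise. A minimal sketch with a hypothetical FakeDeviceArray class (the dictionary values are placeholders, not a real allocation):

```python
class FakeDeviceArray:
    """Hypothetical object implementing the interface as a property."""

    @property
    def __cuda_array_interface__(self) -> dict:
        # Placeholder values; a real object reports its device pointer in "data".
        return {"shape": (3,), "typestr": "|u1", "data": (0, False), "version": 3}


arr = FakeDeviceArray()
iface = arr.__cuda_array_interface__  # property access evaluates to a dict

try:
    arr.__cuda_array_interface__()  # the getter runs, then the dict is called
except TypeError as exc:
    print(exc)  # 'dict' object is not callable
```

Note that the property getter still executes before the TypeError, so any side effect hidden in the getter happens either way.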

Contributor:

Also, could we make this "exposed by accessing a property" idea more obvious by introducing a buffer._expose() method? WDYT?
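One way to read the _expose() suggestion: keep the side effect in an explicitly named method and have the property delegate to it, so call sites that only want the side effect say so directly. ToyExposableBuffer and its exposed flag are hypothetical; what exposure actually disables in cuDF (for example spilling, per the SpillLock code comment quoted earlier in this thread) is not modeled here.

```python
class ToyExposableBuffer:
    """Hypothetical buffer that records whether its pointer has escaped."""

    def __init__(self, ptr: int) -> None:
        self._ptr = ptr
        self._exposed = False

    def _expose(self) -> None:
        # Explicit, greppable way to say "external code may now hold our pointer".
        self._exposed = True

    @property
    def exposed(self) -> bool:
        return self._exposed

    @property
    def __cuda_array_interface__(self) -> dict:
        # Handing out the raw pointer is what exposes the buffer, so the
        # property delegates to _expose() rather than hiding the side effect.
        self._expose()
        return {
            "shape": (16,),
            "typestr": "|u1",
            "data": (self._ptr, False),
            "version": 3,
        }


buf = ToyExposableBuffer(ptr=0x1000)
# The call site from the diff, rewritten to state its intent directly:
buf._expose()
assert buf.exposed
```

With this split, a reader no longer has to know that touching __cuda_array_interface__ has a side effect in order to understand the call site.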

@property
def mutable_ptr(self) -> int:
"""Device pointer to the start of the buffer."""
def get_ptr(self, mode: str = "read") -> int:

Contributor:

I would prefer no default; if we must have one, default to the slow-but-safe "write" rather than "read".
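A sketch of the signature the reviewer asks for, assuming mode is made keyword-only with no default so every call site must state its intent, and unknown modes are rejected. ToyBuffer and the docstring wording are hypothetical, and _unlink_shared_buffers is only a placeholder here.

```python
from typing import Literal


class ToyBuffer:
    """Hypothetical buffer used only to illustrate the get_ptr signature."""

    def __init__(self, ptr: int) -> None:
        self._ptr = ptr

    def _unlink_shared_buffers(self) -> None:
        # Placeholder: a copy-on-write buffer would copy shared data here.
        pass

    def get_ptr(self, *, mode: Literal["read", "write"]) -> int:
        """Device pointer to the start of the buffer.

        Parameters
        ----------
        mode : {"read", "write"}
            "read" promises the caller will not mutate memory through the
            pointer; "write" may first make a private copy of shared data.

        Returns
        -------
        int
            Integer address of the buffer's memory.
        """
        if mode == "write":
            self._unlink_shared_buffers()
        elif mode != "read":
            raise ValueError(f"mode must be 'read' or 'write', got {mode!r}")
        return self._ptr


buf = ToyBuffer(0x1000)
ptr = buf.get_ptr(mode="read")
```

Making mode keyword-only (the bare *) also rules out ambiguous positional calls such as get_ptr("write"), and omitting it raises a TypeError instead of silently choosing "read".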

Contributor:

Please add a docstring.

# mutable views.
self._unlink_shared_buffers()
return self._ptr
def get_ptr(self, mode: str = "read") -> int:

Contributor:

Please add a docstring for this function.

python/cudf/cudf/tests/test_dataframe.py (outdated)
@@ -519,6 +519,7 @@ def test_get_rmm_memory_resource_stack():
 def test_df_transpose(manager: SpillManager):
     df1 = cudf.DataFrame({"a": [1, 2]})
     df2 = df1.transpose()
+    # import pdb;pdb.set_trace()

Contributor:

Please remove the commented-out code.

python/cudf/cudf/utils/applyutils.py (outdated)
@@ -174,7 +177,7 @@ def run(self, df, **launch_params):
     )
     if out_mask is not None:
         outdf._data[k] = outdf[k]._column.set_mask(
-            out_mask.data_array_view
+            out_mask.data_array_view(mode="write")

Contributor:

Since out_mask was just created, I guess this is just an indication that we are taking ownership of this object.

Contributor (author):

Right

 else:
     # *chunks* is an array of chunk leading offset
     chunks = column.as_column(chunks)
-    return chunks.data_array_view
+    return chunks.data_array_view(mode="write")

Contributor:

Could we avoid going through a Column and just directly make cupy arrays in these two cases?

Contributor (author):

Yup, made the change to use cupy arrays directly.

@github-actions github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Jan 18, 2023
@galipremsagar galipremsagar changed the base branch from copy-on-write to branch-23.02 January 18, 2023 20:09
@galipremsagar galipremsagar changed the base branch from branch-23.02 to copy-on-write January 18, 2023 20:09
@github-actions github-actions bot removed CMake CMake build issue conda libcudf Affects libcudf (C++/CUDA) code. Java Affects Java cuDF API. labels Jan 18, 2023
@galipremsagar galipremsagar changed the title [WIP] Ptr refactor [REVIEW] Update copy-on-write with branch-23.02 changes Jan 26, 2023
@galipremsagar galipremsagar marked this pull request as ready for review January 26, 2023 15:25
@galipremsagar galipremsagar requested a review from a team as a code owner January 26, 2023 15:25
@galipremsagar galipremsagar requested review from shwina and charlesbluca and removed request for a team January 26, 2023 15:25
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 26, 2023
@galipremsagar galipremsagar merged commit a857ad9 into rapidsai:copy-on-write Jan 26, 2023
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

2 participants