[REVIEW] Update `copy-on-write` with `branch-23.02` changes #12556

galipremsagar · 2023-01-17T16:21:02Z

Description

This PR updates the copy-on-write branch with get_ptr & data_array_view APIs.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

codecov · 2023-01-17T23:11:37Z

Codecov Report

❗ No coverage uploaded for pull request base (copy-on-write@fa094ed).
Patch has no changes to coverable lines.

❗ Current head dbfb8d4 differs from pull request most recent head 376e1a1. Consider uploading reports for commit 376e1a1 to get more accurate results.

Additional details and impacted files

@@               Coverage Diff                @@
##             copy-on-write   #12556   +/-   ##
================================================
  Coverage                 ?   43.53%           
================================================
  Files                    ?      156           
  Lines                    ?    24730           
  Branches                 ?        0           
================================================
  Hits                     ?    10766           
  Misses                   ?    13964           
  Partials                 ?        0

wence-

Thanks, this looks like a nice improvement. As a consequence, I have a few queries about places where I think we don't need to access Column arrays in write mode (and some where I think we can bypass creating a column at all, since we only create it to immediately turn it into a cupy array).

wence- · 2023-01-18T10:00:18Z

python/cudf/cudf/_lib/column.pyx

                value = value._get_readonly_proxy_obj
            if value.__cuda_array_interface__["typestr"] not in ("|i1", "|u1"):
                if isinstance(value, Column):
-                    value = value.data_array_view
+                    value = value.data_array_view(mode="write")
                value = cp.asarray(value).view('|u1')


Could we now simplify this bit to just say:

    if isinstance(value, Column):
        value = value.data_array_view(mode="write")
    if hasattr(value, "__cuda_array_interface__"):
        value = cp.asarray(value).view("|u1")
    ...

TBH, looking at this now, I am confused about why we do the dance with inspect to avoid a copy in the CoW case: if we are in CoW mode, we will need to trigger that copy anyway, no?

Looking into it..
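
For reference, the point about write access triggering the copy can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCowBuffer`, `shallow_copy`, and `get_data` are hypothetical names backed by a `bytearray`, not cuDF's `CopyOnWriteBuffer` or its API. It shows that a read access leaves the data shared, while a write access on a shared buffer has to copy first.

```python
import weakref


class ToyCowBuffer:
    """Toy stand-in for a copy-on-write buffer (hypothetical, not cuDF's class)."""

    def __init__(self, data: bytearray):
        self._data = data
        # Group of buffers that currently share the same underlying bytearray.
        self._shared = weakref.WeakSet([self])

    def shallow_copy(self) -> "ToyCowBuffer":
        other = ToyCowBuffer(self._data)
        # Both objects join the same sharing group.
        self._shared.add(other)
        other._shared = self._shared
        return other

    def get_data(self, *, mode: str) -> bytearray:
        if mode not in ("read", "write"):
            raise ValueError(f"Unsupported mode: {mode!r}")
        if mode == "write" and len(self._shared) > 1:
            # Another buffer still sees this data: leave the group and copy.
            self._shared.discard(self)
            self._data = bytearray(self._data)
            self._shared = weakref.WeakSet([self])
        return self._data


a = ToyCowBuffer(bytearray(b"\x01\x02\x03"))
b = a.shallow_copy()
same_before = a.get_data(mode="read") is b.get_data(mode="read")
b.get_data(mode="write")[0] = 9  # copies first, so `a` is unaffected
```

In this model there is no way to obtain a writable view of shared data without the copy, which is why avoiding the copy in the write path looks suspicious.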

wence- · 2023-01-18T10:04:28Z

python/cudf/cudf/_lib/column.pyx

+                    # spillable) and that its pointer is the same
+                    # as `data_ptr` _without_ exposing the buffer
+                    # permanently (calling get_ptr with a
+                    # dummy SpillLock).


This comment needs to be updated.

These will be updated once #12564 is merged

wence- · 2023-01-18T10:04:33Z

python/cudf/cudf/_lib/column.pyx

+                        exposed=True,
+                    )
+                    if isinstance(data_owner, CopyOnWriteBuffer):
+                        data_owner.__cuda_array_interface__()


Why is this a function call? I thought __cuda_array_interface__ was a property?

Also, is there a way we can make this "exposed by accessing a property" idea more obvious by just introducing a buffer._expose() method? WDYT?
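
A rough sketch of what I have in mind, assuming a toy class rather than the real buffer types. `ToyExposableBuffer`, its `exposed` flag, and `_expose()` are all hypothetical names for illustration. Accessing the property (no call parentheses) marks the buffer as exposed, and `_expose()` lets a caller say so explicitly.

```python
class ToyExposableBuffer:
    """Toy buffer that tracks whether its pointer has been handed out."""

    def __init__(self, ptr: int, size: int):
        self._ptr = ptr
        self._size = size
        self.exposed = False

    def _expose(self) -> None:
        """Hypothetical helper: explicitly mark the buffer as exposed."""
        self.exposed = True

    @property
    def __cuda_array_interface__(self) -> dict:
        # Handing out the raw pointer means external code may write to it,
        # so the access itself counts as exposing the buffer.
        self._expose()
        return {
            "data": (self._ptr, False),
            "shape": (self._size,),
            "strides": None,
            "typestr": "|u1",
            "version": 3,
        }


buf = ToyExposableBuffer(ptr=0x1000, size=4)
buf._expose()  # intent is visible at the call site
```

With something like this, the call site in column.pyx would read as a statement of intent instead of a property access whose result is discarded.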

wence- · 2023-01-18T10:06:56Z

python/cudf/cudf/core/buffer/buffer.py

-    @property
-    def mutable_ptr(self) -> int:
-        """Device pointer to the start of the buffer."""
+    def get_ptr(self, mode: str = "read") -> int:


Would prefer no default, or if we must have one, default to slow but safe (i.e. "write") rather than "read".

Add docstring please.
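
As a minimal sketch of the "no default" option (which is what the later commit "make mode a required key-arg" points towards), assuming a toy class: `ToyPtrBuffer` is hypothetical and the docstring wording is only a suggestion, not the text that was merged.

```python
class ToyPtrBuffer:
    """Toy buffer used only to illustrate the signature."""

    def __init__(self, ptr: int):
        self._ptr = ptr

    def get_ptr(self, *, mode: str) -> int:
        """Return the device pointer to the start of the buffer.

        Parameters
        ----------
        mode : {"read", "write"}
            "read" is a promise by the caller not to modify the memory;
            "write" signals that the caller may modify it.
        """
        if mode not in ("read", "write"):
            raise ValueError(f"Unsupported mode: {mode!r}")
        return self._ptr


buf = ToyPtrBuffer(ptr=0x1000)
read_ptr = buf.get_ptr(mode="read")
```

Making `mode` keyword-only with no default forces every call site to state its intent, so a forgotten argument fails loudly with a `TypeError` instead of silently picking the fast path.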

wence- · 2023-01-18T10:08:04Z

python/cudf/cudf/core/buffer/cow_buffer.py

-        # mutable views.
-        self._unlink_shared_buffers()
-        return self._ptr
+    def get_ptr(self, mode: str = "read") -> int:


Docstring for this function please.

wence- · 2023-01-18T11:11:11Z

python/cudf/cudf/tests/test_spilling.py

@@ -519,6 +519,7 @@ def test_get_rmm_memory_resource_stack():
 def test_df_transpose(manager: SpillManager):
    df1 = cudf.DataFrame({"a": [1, 2]})
    df2 = df1.transpose()
+    # import pdb;pdb.set_trace()


Remove commented code.

wence- · 2023-01-18T11:13:54Z

python/cudf/cudf/utils/applyutils.py

@@ -174,7 +177,7 @@ def run(self, df, **launch_params):
            )
            if out_mask is not None:
                outdf._data[k] = outdf[k]._column.set_mask(
-                    out_mask.data_array_view
+                    out_mask.data_array_view(mode="write")


out_mask has only just been created, so I guess this is just an indication that we are taking ownership of this object.

wence- · 2023-01-18T11:14:45Z

python/cudf/cudf/utils/applyutils.py

        else:
            # *chunks* is an array of chunk leading offset
            chunks = column.as_column(chunks)
-            return chunks.data_array_view
+            return chunks.data_array_view(mode="write")


Could we avoid going through a Column and just directly make cupy arrays in these two cases?

Yup, made the change to use cupy arrays directly.

github-actions bot added the Python Affects Python cuDF API. label Jan 17, 2023

wence- reviewed Jan 18, 2023

github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Jan 18, 2023

galipremsagar changed the base branch from copy-on-write to branch-23.02 January 18, 2023 20:09

galipremsagar changed the base branch from branch-23.02 to copy-on-write January 18, 2023 20:09

github-actions bot removed CMake CMake build issue conda libcudf Affects libcudf (C++/CUDA) code. Java Affects Java cuDF API. labels Jan 18, 2023

galipremsagar and others added 17 commits January 20, 2023 06:03

get_ptr & _array_view refactor

15cbedf

Apply suggestions from code review

3697367

Co-authored-by: Lawrence Mitchell <[email protected]>

address reviews

2b1b48d

Merge branch 'branch-23.02' into ptr_refactor_1

0bba2a5

Apply suggestions from code review

9824471

Co-authored-by: Mads R. B. Kristensen <[email protected]>

Merge remote-tracking branch 'upstream/branch-23.02' into ptr_refactor_1

7868437

drop internal_write

07a34b4

add docstring

afd3588

Apply suggestions from code review

91a9b60

Co-authored-by: Lawrence Mitchell <[email protected]>

address reviews

e53a5b7

Merge branch 'ptr_refactor_1' of https://github.com/galipremsagar/cudf into ptr_refactor_1

360ce3f

address reviews

1c44bd2

add locks

8292ed7

Apply suggestions from code review

01e8883

Co-authored-by: Mads R. B. Kristensen <[email protected]>

Merge branch 'branch-23.02' into ptr_refactor_1

e399211

Apply suggestions from code review

04897ef

Co-authored-by: Vyas Ramasubramani <[email protected]>

make mode a required key-arg

8f0f017

galipremsagar and others added 4 commits January 24, 2023 08:57

rename to _readonly_proxy_cai_obj

af0e264

Merge branch 'branch-23.02' into ptr_refactor_1

f04c2a1

Update python/cudf/cudf/core/column/column.py

046025a

Co-authored-by: Lawrence Mitchell <[email protected]>

Sync get_ptr with cow

dbfb8d4

galipremsagar force-pushed the ptr_refactor branch from 8e40546 to dbfb8d4 January 24, 2023 22:26

galipremsagar mentioned this pull request Jan 25, 2023

[REVIEW] Change ways to access ptr in Buffer #12587

Merged

5 tasks

galipremsagar added 3 commits January 26, 2023 06:12

merge

672df9e

revert

6ae60ec

fix

bb51855

galipremsagar changed the title from "[WIP] Ptr refactor" to "[REVIEW] Update copy-on-write with branch-23.02 changes" Jan 26, 2023

galipremsagar marked this pull request as ready for review January 26, 2023 15:25

galipremsagar requested a review from a team as a code owner January 26, 2023 15:25

galipremsagar requested review from shwina and charlesbluca and removed request for a team January 26, 2023 15:25

galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 26, 2023

galipremsagar commented Jan 26, 2023

python/strings_udf/strings_udf/tests/test_string_udfs.py (outdated)

galipremsagar commented Jan 26, 2023

python/strings_udf/strings_udf/tests/test_string_udfs.py (outdated)

Apply suggestions from code review

376e1a1

galipremsagar merged commit a857ad9 into rapidsai:copy-on-write Jan 26, 2023
