Migrate string replace.pxd to pylibcudf #15839

lithomas1 · 2024-05-23T16:11:27Z

Description

Change replace.pxd to use pylibcudf APIs.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

lithomas1 · 2024-05-23T16:11:53Z

python/cudf/cudf/_lib/pylibcudf/strings/CMakeLists.txt

 set(linked_libraries cudf::cudf)
 rapids_cython_create_modules(
  CXX
  SOURCE_FILES "${cython_sources}"
-  LINKED_LIBRARIES "${linked_libraries}" MODULE_PREFIX pylibcudf_ ASSOCIATED_TARGETS cudf
+  LINKED_LIBRARIES "${linked_libraries}" MODULE_PREFIX pylibcudf_strings ASSOCIATED_TARGETS cudf


Renamed the module prefix because the strings replace.pyx clashes with the regular (top-level) replace.pyx.

lithomas1 · 2024-05-23T16:17:46Z

python/cudf/cudf/_lib/pylibcudf/strings/replace.pyx

+            ))
+    else:
+        # Column case
+        # TODO: maxrepl should be supported in the corresponding CUDA/C++ code


For the overload of replace in libcudf where input/target/repl are columns, there isn't a maxrepl arg.

We should probably support this in libcudf replace eventually. Otherwise pylibcudf ends up in the odd position of accepting maxrepl as an argument but having to raise whenever it is passed in the column case.

Good idea. Can you raise an issue?

In the meantime, I would recommend that we change the default value of the parameter to None, then raise a NotImplementedError in this branch of the code if we find a non-None value, while in the Scalar branch we set it to -1.
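To make the suggestion above concrete, here is a minimal pure-Python sketch of the dispatch. It is not the pylibcudf implementation: `Scalar`, `Column`, and this `replace` are hypothetical stand-ins built on plain Python strings, and the column branch applies the target/replacement pairs one after another as a simplification. The sketch also assumes that -1 means "replace every occurrence", which is how Python's `str.replace` treats a negative count.

```python
from typing import Iterable, Optional, Union


class Scalar:
    """Hypothetical stand-in for a device string scalar."""

    def __init__(self, value: str):
        self.value = value


class Column:
    """Hypothetical stand-in for a device strings column."""

    def __init__(self, values: Iterable[str]):
        self.values = list(values)


def replace(
    input_col: Column,
    target: Union[Scalar, Column],
    repl: Union[Scalar, Column],
    maxrepl: Optional[int] = None,
) -> Column:
    if isinstance(target, Scalar) and isinstance(repl, Scalar):
        # Scalar case: None falls back to -1 (assumed: replace all).
        if maxrepl is None:
            maxrepl = -1
        return Column(
            s.replace(target.value, repl.value, maxrepl)
            for s in input_col.values
        )
    if isinstance(target, Column) and isinstance(repl, Column):
        # Column case: there is no maxrepl to forward, so reject any
        # explicit value instead of silently ignoring it.
        if maxrepl is not None:
            raise NotImplementedError(
                "maxrepl is not supported when target and repl are columns"
            )
        out = []
        for s in input_col.values:
            # Simplification: apply each (target, repl) pair in order.
            for t, r in zip(target.values, repl.values):
                s = s.replace(t, r)
            out.append(s)
        return Column(out)
    raise TypeError("target and repl must both be Scalar or both be Column")
```

With a `None` default, a caller who never passes `maxrepl` gets the same behaviour in both branches, and only an explicit value in the column case triggers the error.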

lithomas1 · 2024-05-23T19:09:55Z

This should probably go in after #15503.

vyasr

Some very minor nits to pick, but overall looks great!

python/cudf/cudf/_lib/pylibcudf/strings/CMakeLists.txt

python/cudf/cudf/_lib/pylibcudf/strings/find.pxd

vyasr · 2024-05-24T00:17:32Z

python/cudf/cudf/_lib/pylibcudf/strings/replace.pyx

+            ))
+    else:
+        # Column case
+        # TODO: maxrepl should be supported in the corresponding CUDA/C++ code


Good idea. Can you raise an issue?

In the meantime, I would recommend that we change the default value of the parameter to None, then raise a NotImplementedError in this branch of the code if we find a non-None value, while in the Scalar branch we set it to -1.

python/cudf/cudf/_lib/pylibcudf/strings/replace.pyx

vyasr · 2024-05-24T00:22:32Z

python/cudf/cudf/pylibcudf_tests/test_string_replace.py

+@pytest.fixture(scope="module", params=["a", "c", "A", "Á", "aa", "ÁÁÁ"])
+def scalar_repl_target(request):
+    pa_target = pa.scalar(request.param, type=pa.string())
+    return (request.param, plc.interop.from_arrow(pa_target))


Interesting approach. This accomplishes the same thing in spirit that the other tests have done with a plc fixture that's constructed from the arrow fixture, just in a slightly different way. The more I stare at it, I'm starting to prefer this approach a little since it ties the two objects very tightly together in a way that reflects that they'll probably always be used together in tests. I suppose it's largely a matter of taste though. @mroeschke @brandon-b-miller WDYT? I would generally prefer to be consistent across tests; it's not a huge deal, but given that our test suite is still fairly small I'd be fine with a quick PR to standardize on this approach if we like it better (or change this to the separate fixture approach if we prefer that).

This is what we do in pandas.

The advantage of this approach is that the two values are paired together, so you never end up with fixtures that have drifted out of sync.

The disadvantage is that you have to add a line unpacking the tuple in every test that uses the fixture.

I'm happy with either, just did it this way since it was more familiar for me.

I do think I prefer this approach. After this PR merges, could you make a follow-up that standardizes this in other tests? Basically just removing pairs of (pyarrow,pylibcudf) fixtures in favor of a single fixture returning the pair? You can also pair up the column fixtures in this test.
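The trade-off discussed in this thread can be sketched without pytest or a GPU. In the snippet below, `FakeDeviceScalar` and `to_device` are hypothetical stand-ins for the object returned by `plc.interop.from_arrow`, and `paired_fixture` plays the role of the fixture body; in the real test suite that body would sit under `@pytest.fixture(params=...)` as in the diff above.

```python
from dataclasses import dataclass

# Same parameter values as the fixture in the diff.
PARAMS = ["a", "c", "A", "Á", "aa", "ÁÁÁ"]


@dataclass(frozen=True)
class FakeDeviceScalar:
    """Hypothetical stand-in for a pylibcudf scalar."""

    value: str


def to_device(host_value: str) -> FakeDeviceScalar:
    """Hypothetical stand-in for plc.interop.from_arrow."""
    return FakeDeviceScalar(host_value)


def paired_fixture(param: str):
    """Return the host value and its device counterpart together."""
    return (param, to_device(param))


def check_pair(pair) -> str:
    # The cost of pairing: every test needs this unpacking line.
    host, device = pair
    # The benefit: both halves come from one parameter, so they
    # cannot get out of sync.
    assert device.value == host
    return host
```

Because both objects are built from the same `param` in a single place, there is no second fixture whose parameter list has to be kept aligned by hand.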

python/cudf/cudf/pylibcudf_tests/test_string_replace.py

vyasr · 2024-05-28T17:37:09Z

I'm holding off on a second review until #15503 is merged, but in the meantime could you add a short PR description please?

lithomas1 · 2024-05-31T15:56:15Z

Let's wait for #15898.

vyasr · 2024-06-03T19:32:24Z

#15898 is merged

lithomas1 · 2024-06-03T22:05:49Z

OK, this is re-sync'ed and ready for re-review.

vyasr

Couple of small suggestions and a request to propagate the change for testing fixtures forward. Otherwise LGTM!

docs/cudf/source/user_guide/api_docs/pylibcudf/index.rst

python/cudf/cudf/_lib/pylibcudf/strings/replace.pyx

lithomas1 · 2024-06-05T15:30:37Z

/merge

Condense all pa_foo/plc_foo data fixtures into just foo, as recommended by #15839 (comment).

Authors:
- Thomas Li (https://github.com/lithomas1)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #15958

Migrate string replace.pxd to pylibcudf

d49ed89

github-actions bot added the Python (Affects Python cuDF API) and CMake (CMake build issue) labels May 23, 2024

add tests

397ba14

lithomas1 marked this pull request as ready for review May 23, 2024 19:00

lithomas1 requested a review from a team as a code owner May 23, 2024 19:00

lithomas1 requested review from galipremsagar and charlesbluca May 23, 2024 19:00

vyasr requested changes May 24, 2024

lithomas1 mentioned this pull request May 24, 2024

For the overload of replace in libcudf where input/target/repl are columns, there isn't a maxrepl arg. #15855

Closed

lithomas1 added the non-breaking (Non-breaking change) and feature request (New feature or request) labels May 24, 2024

lithomas1 and others added 3 commits May 24, 2024 10:50

update

4d3b40f

Merge branch 'branch-24.08' into pylibcudf-replace

4f20ea1

rest of feedback

0924bd2

lithomas1 mentioned this pull request May 24, 2024

[FEA] Implement all libcudf modules required by cuDF Python in pylibcudf #15162

Closed

vyasr added the pylibcudf (Issues specific to the pylibcudf package) label May 28, 2024

lithomas1 and others added 2 commits May 29, 2024 06:20

Merge branch 'branch-24.08' into pylibcudf-replace

186e408

add docstrings for replace

139a2e2

lithomas1 added the "5 - DO NOT MERGE" (Hold off on merging; see PR for details) label May 31, 2024

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

e168f58

…f-replace

lithomas1 mentioned this pull request Jun 3, 2024

Migrate strings contains operations to pylibcudf #15880

Merged

change module name in strings

cac8be1

lithomas1 removed the "5 - DO NOT MERGE" (Hold off on merging; see PR for details) label Jun 3, 2024

lithomas1 and others added 3 commits June 4, 2024 13:51

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

bc9771a

…f-replace

fix errors

4583f5c

Merge branch 'branch-24.08' into pylibcudf-replace

1f2e434

lithomas1 requested a review from vyasr June 4, 2024 20:56

vyasr approved these changes Jun 5, 2024

lithomas1 and others added 2 commits June 5, 2024 08:23

Apply suggestions from code review

b589e25

Co-authored-by: Vyas Ramasubramani <[email protected]>

small test adjustment

65ae3d8

rapids-bot bot merged commit db1b365 into rapidsai:branch-24.08 Jun 5, 2024
69 checks passed

lithomas1 deleted the pylibcudf-replace branch June 5, 2024 16:48

lithomas1 mentioned this pull request Jun 8, 2024

Condense pylibcudf data fixtures #15958

Merged
