Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cudf._lib.replace in favor of inlining pylibcudf #17428

Merged
merged 4 commits into from
Dec 2, 2024

Conversation

mroeschke
Copy link
Contributor

Description

Contributes to #17317

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 23, 2024
@mroeschke mroeschke self-assigned this Nov 23, 2024
@mroeschke mroeschke requested a review from a team as a code owner November 23, 2024 01:22
@mroeschke mroeschke requested review from wence- and Matt711 November 23, 2024 01:22
@github-actions github-actions bot added the CMake CMake build issue label Nov 23, 2024
plc_replace,
)
result = type(self).from_pylibcudf(plc_column)
return result._with_type_metadata(self.dtype) # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why we need to ignore type check here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because fillna claims to return a Self, but _with_type_metadata only returns a ColumnBase (because sometimes if you pass a different dtype in, the column type can change). Because return types are contravariant, in this scenario, Self is not subclass of ColumnBase.

Comment on lines 697 to 698
self, values_to_replace: ColumnBase, replacement_values: ColumnBase
) -> ColumnBase:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self, values_to_replace: ColumnBase, replacement_values: ColumnBase
) -> ColumnBase:
self, values_to_replace: Self, replacement_values: Self
) -> Self:

The values to replace replace and the replacements must have the same type as Self, and we get a Self back.

@@ -241,8 +241,14 @@ def find_and_replace(
) -> Self:
raise NotImplementedError

@acquire_spill_lock()
def clip(self, lo: ScalarLike, hi: ScalarLike) -> ColumnBase:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def clip(self, lo: ScalarLike, hi: ScalarLike) -> ColumnBase:
def clip(self, lo: ScalarLike, hi: ScalarLike) -> Self:

plc_replace,
)
result = type(self).from_pylibcudf(plc_column)
return result._with_type_metadata(self.dtype) # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because fillna claims to return a Self, but _with_type_metadata only returns a ColumnBase (because sometimes if you pass a different dtype in, the column type can change). Because return types are contravariant, in this scenario, Self is not subclass of ColumnBase.

return libcudf.replace.replace(
replaced, df._data["old"], df._data["new"]
)
return replaced.replace(df._data["old"], df._data["new"]) # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one can go away if we type replace correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove this, but I had to add a cast in an astype above (which is not able to always return self so it's typed as ColumnBase)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH, thanks

@@ -6044,7 +6044,7 @@ def find_and_replace(
df = df.dropna(subset=["old"])
else:
res = self
return libcudf.replace.replace(res, df._data["old"], df._data["new"])
return res.replace(df._data["old"], df._data["new"]) # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here

Suggested change
return res.replace(df._data["old"], df._data["new"]) # type: ignore[return-value]
return res.replace(df._data["old"], df._data["new"])

plc_column = plc.replace.clamp(
self.to_pylibcudf(mode="read"),
cudf.Scalar(lo, self.dtype).device_value.c_value,
cudf.Scalar(hi, self.dtype).device_value.c_value,
)
return type(self).from_pylibcudf(plc_column)
return type(self).from_pylibcudf(plc_column) # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is because Column.from_pylibcudf only advertises that it produces ColumnBase rather than Self? All this typing is kind of weird because the C++ column is type-erased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup exactly. We have this occurrences in quite a few places in cudf Python since we know what column subclass should be returned.

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 6884867 into rapidsai:branch-25.02 Dec 2, 2024
105 checks passed
@mroeschke mroeschke deleted the cudf/_lib/replace branch December 2, 2024 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants