-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove cudf._lib.replace in favor of inlining pylibcudf #17428
Conversation
plc_replace, | ||
) | ||
result = type(self).from_pylibcudf(plc_column) | ||
return result._with_type_metadata(self.dtype) # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why we need to ignore type check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because fillna
claims to return a Self
, but _with_type_metadata
only returns a ColumnBase
(because sometimes if you pass a different dtype in, the column type can change). Because return types are contravariant, in this scenario, Self
is not subclass of ColumnBase
.
self, values_to_replace: ColumnBase, replacement_values: ColumnBase | ||
) -> ColumnBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self, values_to_replace: ColumnBase, replacement_values: ColumnBase | |
) -> ColumnBase: | |
self, values_to_replace: Self, replacement_values: Self | |
) -> Self: |
The values to replace replace and the replacements must have the same type as Self, and we get a Self back.
@@ -241,8 +241,14 @@ def find_and_replace( | |||
) -> Self: | |||
raise NotImplementedError | |||
|
|||
@acquire_spill_lock() | |||
def clip(self, lo: ScalarLike, hi: ScalarLike) -> ColumnBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def clip(self, lo: ScalarLike, hi: ScalarLike) -> ColumnBase: | |
def clip(self, lo: ScalarLike, hi: ScalarLike) -> Self: |
plc_replace, | ||
) | ||
result = type(self).from_pylibcudf(plc_column) | ||
return result._with_type_metadata(self.dtype) # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because fillna
claims to return a Self
, but _with_type_metadata
only returns a ColumnBase
(because sometimes if you pass a different dtype in, the column type can change). Because return types are contravariant, in this scenario, Self
is not subclass of ColumnBase
.
return libcudf.replace.replace( | ||
replaced, df._data["old"], df._data["new"] | ||
) | ||
return replaced.replace(df._data["old"], df._data["new"]) # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one can go away if we type replace
correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to remove this, but I had to add a cast
in an astype
above (which is not able to always return self so it's typed as ColumnBase
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AH, thanks
@@ -6044,7 +6044,7 @@ def find_and_replace( | |||
df = df.dropna(subset=["old"]) | |||
else: | |||
res = self | |||
return libcudf.replace.replace(res, df._data["old"], df._data["new"]) | |||
return res.replace(df._data["old"], df._data["new"]) # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here
return res.replace(df._data["old"], df._data["new"]) # type: ignore[return-value] | |
return res.replace(df._data["old"], df._data["new"]) |
plc_column = plc.replace.clamp( | ||
self.to_pylibcudf(mode="read"), | ||
cudf.Scalar(lo, self.dtype).device_value.c_value, | ||
cudf.Scalar(hi, self.dtype).device_value.c_value, | ||
) | ||
return type(self).from_pylibcudf(plc_column) | ||
return type(self).from_pylibcudf(plc_column) # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is because Column.from_pylibcudf
only advertises that it produces ColumnBase
rather than Self
? All this typing is kind of weird because the C++ column is type-erased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup exactly. We have this occurrences in quite a few places in cudf Python since we know what column subclass should be returned.
/merge |
Description
Contributes to #17317
Checklist