Define Column.nan_as_null to return self #15923

mroeschke · 2024-06-05T00:04:57Z

Description

While cleaning up the fillna logic, I needed Column.nan_as_null to be defined so that the fillna logic is more reusable.

This allows other nan_as_null usages in cudf to avoid checking whether it's defined on the column or not.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

wence-

A question, but nice simplification!

wence- · 2024-06-05T09:40:30Z

python/cudf/cudf/core/column/column.py

@@ -1247,6 +1249,10 @@ def unary_operator(self, unaryop: str):
            f"Operation {unaryop} not supported for dtype {self.dtype}."
        )

+    def nans_to_nulls(self: Self) -> Self:
+        """Convert NaN to NA."""
+        return self


question: If the column is a floating point one, nans_to_nulls might (always?) produce a copy. Here, we would share data. Is that problematic for downstream consumers who might implicitly assume that nans_to_nulls copies?

I see you handle this explicitly in the one case it is necessary below.

Yeah I was wrestling back and forth between whether this should copy or not.

On one hand, as you mentioned, there was only 1 API where we needed to guarantee the result didn't share data.

For most other cases, we wanted this to no-op in the cases where no nan conversions would happen.

So I see it as a choice between avoiding unnecessary copies by default, and always copying while relying on the caller to guard against calling nan_as_null when there are no nans. WDYT?

For most other cases, we wanted this to no-op in the cases where no nan conversions would happen.

For this reason, I feel it's okay not to create copies and generate more memory pressure for this API.
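The trade-off discussed in this thread can be sketched in plain Python. This is a minimal illustration, not cuDF's implementation: `Column`, `FloatColumn`, the list-backed storage, the `copy` method, and `converted_without_sharing` are hypothetical stand-ins, and the sketch assumes the float override always builds a new column, which the thread itself leaves as an open question.

```python
import math
from typing import List, Optional


class Column:
    """Hypothetical stand-in for a generic column; not cuDF's actual class."""

    def __init__(self, values: List[Optional[object]]):
        self.values = values

    def nans_to_nulls(self) -> "Column":
        # Non-float data cannot hold NaN, so return self without copying.
        return self

    def copy(self) -> "Column":
        # Shallow copy of the backing list, preserving the concrete type.
        return type(self)(list(self.values))


class FloatColumn(Column):
    """Hypothetical float column; None models a null."""

    def nans_to_nulls(self) -> "FloatColumn":
        # Assumption: the float override builds a new column with NaN -> null.
        return FloatColumn(
            [None if isinstance(v, float) and math.isnan(v) else v
             for v in self.values]
        )


def converted_without_sharing(col: Column) -> Column:
    """For the one caller that must not share data with its input."""
    result = col.nans_to_nulls()
    # Copy only when the conversion was a no-op that returned the input itself.
    return result.copy() if result is col else result


strings = Column(["a", None, "b"])
floats = FloatColumn([1.0, float("nan"), None])

assert strings.nans_to_nulls() is strings            # no-op shares data
assert floats.nans_to_nulls().values == [1.0, None, None]
assert converted_without_sharing(strings) is not strings
```

With this shape, most callers pay nothing for columns that cannot contain NaN, and only the caller that needs an independent result pays for a copy.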

galipremsagar · 2024-06-05T16:45:25Z

python/cudf/cudf/core/column/column.py

@@ -702,7 +702,9 @@ def fillna(
        Returns a copy with null filled.
        """
        return libcudf.replace.replace_nulls(
-            input_col=self, replacement=fill_value, method=method
+            input_col=self.nans_to_nulls(),


I think you fixed a bug that is resulting in an increase in test pass rate:

Would you be able to add a test with a mix of NaNs and NAs that performs a to_arrow? I just want to make sure we have it captured in pytest so that we know if it breaks or someone changes this code unintentionally.

Would you be able to add a test with a mix of NaNs and NAs that performs a to_arrow?

I assume you meant fillna instead of to_arrow? If so, I added that test in 0d74275
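A test of the kind requested here might look like the following sketch. It is not the test added in 0d74275: it models a column as a plain Python list where `None` stands for a null, and the helpers `nans_to_nulls`, `replace_nulls`, and `fillna` are toy functions that only borrow their names from the diff above.

```python
import math
from typing import List, Optional


def nans_to_nulls(values: List[Optional[float]]) -> List[Optional[float]]:
    # Toy model: None represents a null; NaN is converted to None.
    return [None if v is not None and math.isnan(v) else v for v in values]


def replace_nulls(values: List[Optional[float]], fill_value: float) -> List[Optional[float]]:
    # Only nulls are replaced; a NaN is an ordinary float value here.
    return [fill_value if v is None else v for v in values]


def fillna(values: List[Optional[float]], fill_value: float) -> List[Optional[float]]:
    # Mirrors the shape of the change in the diff: convert NaN first, then fill.
    return replace_nulls(nans_to_nulls(values), fill_value)


def test_fillna_mixed_nan_and_null() -> None:
    data = [1.0, float("nan"), None, 4.0]

    # Both the NaN and the null are filled.
    assert fillna(data, 0.0) == [1.0, 0.0, 0.0, 4.0]

    # Without the conversion step, the NaN survives and only the null is filled.
    unconverted = replace_nulls(data, 0.0)
    assert math.isnan(unconverted[1])
    assert unconverted[2] == 0.0


test_fillna_mixed_nan_and_null()
```

The second half of the test documents the behavior the change is meant to prevent, so a regression that drops the conversion step would fail the first assertion.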

mroeschke · 2024-06-07T17:30:28Z

/merge

Define Column.nan_as_null to return self

9d4b08c

mroeschke added labels Jun 5, 2024: Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)

mroeschke requested a review from a team as a code owner June 5, 2024 00:04

mroeschke requested review from galipremsagar and Matt711 June 5, 2024 00:04

wence- approved these changes Jun 5, 2024

galipremsagar approved these changes Jun 5, 2024

galipremsagar reviewed Jun 5, 2024

mroeschke added 3 commits June 5, 2024 14:27

Merge remote-tracking branch 'upstream/branch-24.08' into ref/nan_to_…

4ebcdfa

…null

Add test verifying copy

be66fde

Add fillna test with nan and null

0d74275

rapids-bot bot merged commit d83d086 into rapidsai:branch-24.08 Jun 7, 2024
72 checks passed

mroeschke deleted the ref/nan_to_null branch June 7, 2024 17:30
