ENH: EA.fillna copy=True #53728

Merged: 19 commits, merged Jul 26, 2023

@jbrockmendel (Member) commented Jun 19, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @phofl for comments on the Copy-on-Write (CoW) handling in the Block code
cc @jorisvandenbossche for thoughts on the deprecation path for third-party EAs

Edit: also worth discussing the "author's discretion" question, i.e. whether an EA author may choose not to respect copy=False.
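To make the semantics under discussion concrete, here is a minimal sketch of a fillna copy contract on a hypothetical ToyMaskedArray class (an illustrative stand-in, not a pandas class) that keeps its values and a boolean missing-value mask in plain Python lists. It assumes the implementation chooses to honor copy=False; as noted above, an author might instead ignore it.

```python
class ToyMaskedArray:
    """Hypothetical, minimal stand-in for a masked ExtensionArray."""

    def __init__(self, data: list, mask: list) -> None:
        self._data = data  # backing values
        self._mask = mask  # True where the value is missing

    def copy(self) -> "ToyMaskedArray":
        # New lists, so nothing is shared with the original.
        return ToyMaskedArray(list(self._data), list(self._mask))

    def fillna(self, value, copy: bool = True) -> "ToyMaskedArray":
        # copy=True: fill a copy and leave self untouched.
        # copy=False: fill self in place and return the same object.
        target = self.copy() if copy else self
        for i, missing in enumerate(target._mask):
            if missing:
                target._data[i] = value
                target._mask[i] = False
        return target

    def to_list(self) -> list:
        # Render missing entries as None.
        return [None if m else v for v, m in zip(self._data, self._mask)]


arr = ToyMaskedArray([1, 0, 3], [False, True, False])
filled = arr.fillna(99)                  # default copy=True
print(arr.to_list(), filled.to_list())   # [1, None, 3] [1, 99, 3]
same = arr.fillna(99, copy=False)
print(same is arr, arr.to_list())        # True [1, 99, 3]
```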

@phofl (Member) left a comment:

couple of comments

```
@@ -802,12 +809,20 @@ def fillna(
npvalues = self.astype(object)
func(npvalues, limit=limit, mask=mask)
new_values = self._from_sequence(npvalues, dtype=self.dtype)
if not copy:
```
Member:

I don't like this; this is actually slower than copy=True.

Member Author:

Agreed, this is not good. Would you prefer to just ignore copy in this case?

Member:

Yep
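The objection is about wasted work, not correctness: in the method-based (ffill/bfill) branch the base class already builds the filled values in a new array, so honoring copy=False would mean an extra pass to write them back into self. The pure-Python sketch below contrasts the two options on plain lists; the helper names are hypothetical and this is not the pandas implementation. The second variant matches the resolution of ignoring copy in this case.

```python
def _ffill(values: list) -> list:
    """Forward-fill None entries into a new list (one O(n) pass)."""
    out = []
    last = None
    for v in values:
        if v is None:
            out.append(last)
        else:
            last = v
            out.append(v)
    return out


def ffill_writeback(values: list, copy: bool = True) -> list:
    # Rejected approach: compute the filled result, then, for copy=False,
    # write it back into the original list. That second pass is extra
    # work on top of everything the copy=True path already does.
    new_values = _ffill(values)
    if not copy:
        values[:] = new_values  # extra pass: copy results back
        return values
    return new_values


def ffill_ignore_copy(values: list, copy: bool = True) -> list:
    # Agreed approach: for method-based fills, ignore `copy` and always
    # return the freshly computed result; the input is never modified.
    return _ffill(values)


data = [1, None, None, 4, None]
print(ffill_ignore_copy(data, copy=False))  # [1, 1, 1, 4, 4]
print(data)                                 # [1, None, None, 4, None]
```

Both variants produce the same filled values; only the write-back variant mutates its input, and it pays for that with an additional copy.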

```
new_values[mask] = value
else:
new_values = self.copy()
if not copy:
new_values = self[:]
```
Member:

Why do we need a shallow copy in these cases?

Member Author:

I figured that on principle it should return a new object; it doesn't make much difference.

Member:

How does this compare with other EA methods?

Member Author:

I see a couple of "return self" and a couple of "return self[:]" in the base class methods, so it's not very consistent.

Member:

I don't care then, but we should make this consistent (not in this PR).

Member Author:

I'm leaning towards consistently returning self, since in the Block method we may want to determine whether a copy was made, and "result is not values" is an easy way to check that.
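The trade-off between "return self" and "return self[:]" can be illustrated with Python's built-in memoryview, whose slicing produces a new object over the same buffer, analogous to the shallow copy discussed in this thread. The two fill functions are hypothetical stand-ins for a fillna call with nothing to fill. The point is that an identity check such as "result is not values" only tells the caller whether a copy was made if the no-op path returns self.

```python
def fill_returning_self(values: memoryview) -> memoryview:
    # Nothing to fill: return the very same object.
    return values


def fill_returning_view(values: memoryview) -> memoryview:
    # Nothing to fill: return a shallow copy, i.e. a new object that
    # shares the same underlying buffer (analogous to ``self[:]``).
    return values[:]


buf = memoryview(bytearray(b"abc"))

r1 = fill_returning_self(buf)
r2 = fill_returning_view(buf)

# Identity check a caller (e.g. Block code) might use to detect a copy:
print(r1 is not buf)  # False: correctly reports "no copy made"
print(r2 is not buf)  # True: reports a copy although no data was copied

r2[0] = ord("z")      # writing through the new object...
print(bytes(buf))     # b'zbc'  ...changes the original buffer
```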

pandas/core/internals/blocks.py (outdated, resolved)
pandas/tests/copy_view/test_interp_fillna.py (outdated, resolved)
@mroeschke added the Missing-data and ExtensionArray labels Jun 20, 2023
@jbrockmendel (Member Author):

Updated to ignore "copy" in the base class fillna method in ffill/bfill cases.

Added a warning and a test for third-party EAs that don't yet have the copy keyword in their fillna signature.
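One way such a warning could be driven is a signature check. The sketch below is a hypothetical shim, not the code merged in this PR: call_fillna, OldStyleArray, and NewStyleArray are invented names, and the warning text and the choice of DeprecationWarning are assumptions. It inspects fillna with inspect.signature, passes copy only when the method accepts it, and otherwise warns and calls without it.

```python
import inspect
import warnings


def _accepts_copy(method) -> bool:
    """True if ``method`` takes a ``copy`` keyword (or arbitrary **kwargs)."""
    params = inspect.signature(method).parameters.values()
    return any(
        p.name == "copy" or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )


def call_fillna(values, value, copy: bool = True):
    """Hypothetical caller-side shim for arrays that predate ``copy``."""
    if _accepts_copy(values.fillna):
        return values.fillna(value=value, copy=copy)
    warnings.warn(
        f"{type(values).__name__}.fillna does not accept a 'copy' keyword; "
        "it is being ignored.",
        DeprecationWarning,
        stacklevel=2,
    )
    return values.fillna(value=value)


class OldStyleArray:
    """Hypothetical third-party array whose fillna has no ``copy`` keyword."""

    def __init__(self, data):
        self.data = list(data)

    def fillna(self, value=None):
        return OldStyleArray(value if v is None else v for v in self.data)


class NewStyleArray(OldStyleArray):
    """The same array, updated to accept ``copy``."""

    def fillna(self, value=None, copy=True):
        target = NewStyleArray(self.data) if copy else self
        for i, v in enumerate(target.data):
            if v is None:
                target.data[i] = value  # fills in place when copy=False
        return target


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = call_fillna(OldStyleArray([1, None]), 0)
print(old.data, caught[0].category.__name__)  # [1, 0] DeprecationWarning
```

Treating a **kwargs catch-all as accepting copy avoids warning on wrappers that forward keywords. A try/except TypeError retry is an alternative, at the cost of possibly masking unrelated TypeErrors raised inside fillna.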

@jbrockmendel (Member Author):

I think the comments have been addressed.

@phofl (Member) commented Jul 25, 2023

Sorry, I thought we had already merged this. Will merge when CI is green.

@phofl merged commit edc0870 into pandas-dev:main Jul 26, 2023
@phofl (Member) commented Jul 26, 2023

thx @jbrockmendel

@phofl added this to the 2.1 milestone Jul 26, 2023
@jbrockmendel deleted the enh-fillna-copy branch July 26, 2023 14:08