BUG: Categorical.copy deep kwarg #27024

Closed
wants to merge 2 commits into from

jbrockmendel
Member

Would close #26995 if I hadn't just updated that to reflect the fact that several other pandas-internal EAs don't handle the deep kwarg correctly.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov

codecov bot commented Jun 24, 2019

Codecov Report

Merging #27024 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #27024      +/-   ##
==========================================
+ Coverage   91.99%      92%   +<.01%     
==========================================
  Files         180      180              
  Lines       50774    50760      -14     
==========================================
- Hits        46712    46703       -9     
+ Misses       4062     4057       -5
Flag         Coverage Δ
#multiple    90.64% <100%> (+0.01%) ⬆️
#single      41.86% <56.25%> (-0.05%) ⬇️

Impacted Files                           Coverage Δ
pandas/core/arrays/categorical.py        95.94% <100%> (+0.02%) ⬆️
pandas/core/internals/blocks.py          94.62% <100%> (+0.24%) ⬆️
pandas/core/internals/construction.py    95.95% <100%> (ø) ⬆️
pandas/io/gbq.py                         88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py                     96.89% <0%> (-0.12%) ⬇️
pandas/core/ops.py                       94.66% <0%> (-0.03%) ⬇️
pandas/core/indexes/datetimelike.py      98.14% <0%> (-0.01%) ⬇️
pandas/io/formats/format.py              97.91% <0%> (ø) ⬆️
pandas/core/sorting.py                   98.35% <0%> (ø) ⬆️
pandas/core/arrays/base.py               99.43% <0%> (ø) ⬆️
... and 7 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 83fe8d7...82fcd54.

@gfyoung gfyoung added the Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) and Categorical (Categorical data type) labels Jun 26, 2019
@@ -2295,7 +2298,7 @@ def unique(self):

         # unlike np.unique, unique1d does not sort
         unique_codes = unique1d(self.codes)
-        cat = self.copy()
+        cat = self.copy()  # Don't need deep here
Member

Hmm...if you're going to state this, I might briefly explain why.

if self.is_extension:
    values = values.copy(deep=True)
else:
    values = values.copy()
Member

Why is a deep-copy needed here?

Member Author

@gfyoung Just pushed changes addressing most of your comments. For this one: without deep=True, it isn't a "real" copy, i.e. ExtensionArray.copy(deep=True) behaves the same as np.ndarray.copy().
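
The distinction can be illustrated without pandas. The following is a minimal sketch, assuming a hypothetical `ToyArray` class (not a pandas type) whose `copy` takes a `deep` flag defaulting to `False`. It only mirrors the behaviour described in the comment above, where a non-deep copy still shares its underlying data with the original.

```python
class ToyArray:
    """Hypothetical stand-in for an extension array; not part of pandas."""

    def __init__(self, values):
        self._data = values  # backing storage, possibly shared

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        self._data[i] = value

    def copy(self, deep=False):
        # Assumption: deep=False returns a new wrapper over the SAME storage,
        # while deep=True duplicates the storage first.
        data = list(self._data) if deep else self._data
        return ToyArray(data)


original = ToyArray([1, 2, 3])

shallow = original.copy()
shallow[0] = 99  # writes through to the shared storage
shallow_leaked = original[0] == 99

deep = original.copy(deep=True)
deep[1] = -1  # only the copy changes
deep_isolated = original[1] == 2
```

In this sketch a write through the shallow copy is visible in `original`, while the deep copy can be mutated freely, which is the property the block code above appears to rely on for extension values.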

other3 = data.copy()

other3[0] = other3[1]
assert data[0] == data[1]
Member

Lots of tests bundled into one function. I would consider breaking this up into at least four different tests, especially since you "reset" every time by doing a copy of some kind of data.
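
One way to act on this suggestion is sketched below, assuming a hypothetical `ToyArray` class and a `make_data` helper (both invented for illustration; the real test and its fixture are not shown in this thread). Each check gets its own function and builds fresh data, so no test depends on state left behind by another.

```python
class ToyArray:
    """Hypothetical array with a `deep` flag on copy; not a pandas class."""

    def __init__(self, values):
        self._data = values

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        self._data[i] = value

    def copy(self, deep=False):
        # Assumption: a non-deep copy shares storage with the original.
        return ToyArray(list(self._data) if deep else self._data)


def make_data():
    # Fresh input per test, playing the role of a fixture.
    return ToyArray([10, 20, 30])


def test_copy_returns_new_object():
    data = make_data()
    assert data.copy() is not data


def test_default_copy_shares_data():
    data = make_data()
    other = data.copy()
    other[0] = other[1]
    assert data[0] == data[1]


def test_shallow_copy_shares_data():
    data = make_data()
    other = data.copy(deep=False)
    other[0] = other[1]
    assert data[0] == data[1]


def test_deep_copy_is_independent():
    data = make_data()
    other = data.copy(deep=True)
    other[0] = other[1]
    assert data[0] == 10
    assert other[0] == 20
```

With the cases separated, a failure names the exact behaviour that broke, and the repeated "reset by copying" step disappears because every test starts from its own data.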

@TomAugspurger
Contributor

What was the decision in the issue? I thought we were tending toward deprecating deep in ExtensionArray.copy, but I may be wrong.

@jbrockmendel
Member Author

What was the decision in the issue? I thought we were tending toward deprecating deep in ExtensionArray.copy, but I may be wrong.

I don't think a decision has been reached there (though I agree with your assessment of the momentum). Until/unless that change is made, this is the right move. Largely motivated by wanting to clear road-blocks to #27015.

@jreback
Contributor

jreback commented Jun 27, 2019

yeah let's not add the deep kwarg on EA .copy(), it doesn't really make sense; we may actually be able to deprecate it entirely on Series/DataFrame as well (separate issue).

Contributor

@jreback jreback left a comment

comments

@jbrockmendel
Member Author

yeah let's not add the deep kwarg on EA .copy()

The issue is that EA already has the deep kwarg, but Categorical doesn't. I'm pretty sure that this will fix some latent bugs for other EAs; I will check and add tests if so.
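
The kind of mismatch described here can be sketched in plain Python. The classes `BaseArray` and `NoDeepArray` and the function `generic_copy` are hypothetical names made up for this illustration. How the mismatch actually surfaced inside pandas is not stated in this thread, so the `TypeError` below is simply what Python does when an override drops a keyword that callers pass.

```python
class BaseArray:
    """Hypothetical base class whose copy() accepts a `deep` flag."""

    def __init__(self, values):
        self._data = list(values)

    def copy(self, deep=False):
        return type(self)(self._data)


class NoDeepArray(BaseArray):
    """Hypothetical subclass that overrides copy() WITHOUT the flag."""

    def copy(self):
        return type(self)(self._data)


def generic_copy(arr):
    # Generic code written against the base-class signature.
    return arr.copy(deep=True)


generic_copy(BaseArray([1, 2]))  # accepted: signature matches

try:
    generic_copy(NoDeepArray([1, 2]))
    failed = False
except TypeError:
    # The override does not accept the `deep` keyword.
    failed = True
```

Keeping the override's signature aligned with the base class lets generic callers pass `deep` without special-casing individual array types.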

@jreback jreback added this to the No action milestone Jun 27, 2019
@jreback jreback closed this Jun 27, 2019
@jorisvandenbossche
Member

Superseded by #27083

@jbrockmendel jbrockmendel deleted the catcopy branch June 27, 2019 21:26
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Categorical Categorical Data Type
Development

Successfully merging this pull request may close these issues.

Non-Deep copy for Categorical?
5 participants