Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: define __copy__ NDFrame #8571

Closed
jreback opened this issue Oct 17, 2014 · 14 comments
Closed

API: define __copy__ NDFrame #8571

jreback opened this issue Oct 17, 2014 · 14 comments

Comments

@jreback
Copy link
Contributor

jreback commented Oct 17, 2014

from SO

this allows copy.copy(df) to work properly

by definition 'always' deep

__copy__ = copy
__deepcopy__ = copy
@immerrr
Copy link
Contributor

immerrr commented Oct 17, 2014

The fact that copy doesn't work as expected without providing a __copy__ method means that pickling is broken in the same way as copy uses it as a fallback. Maybe pickle should be fixed too?

Also, I don't really get why

by definition 'always' deep

@jreback
Copy link
Contributor Author

jreback commented Oct 17, 2014

pickling isn't broken
a class level attribute is not copied (nor should it be)

and always deep because that's the default (copy=True)

@immerrr
Copy link
Contributor

immerrr commented Oct 17, 2014

pickling isn't broken
a class level attribute is not copied (nor should it be)

It's not about copying the class-level attribute, it's about preserving object-level attribute as mandated by a modified class-level attribute. If there's some public-API-level contract about that somewhere (I admit I haven't seen any yet) that attributes mentioned in _metadata must be preserved for instances of Series and pickle doesn't obey that contract, then it sure seems broken to me.

As for __copy__ performing a deep copy, this seems a direct violation of copy.copy purpose as stated by its doc which says "Return a shallow copy of x". But, I guess, there's a precedent set by numpy in this direction (e.g. here), so that's probably ok.

@jreback
Copy link
Contributor Author

jreback commented Oct 17, 2014

@immerrr modifying _metadata and having it preserve is not documented atm nor really exposed to the user (its basically internal). That said I don't see any harm with doing so. This is a really simple issue of __copy__ just not doing what is needed (its close, but not perfect, just need to simply define it -end of issue).

it doesn't matter what copy.copy should perform a shallow copy. pandas objects don't have this as a guarantee anywhere (and it should not be done, copies are ALWAYS deep, its the indexes tht can be shallow). I am not sure I see a problem here either. Its actually what we do it ultimately take a view of the data when its not deep so that could be a copy (but generally is not).

@immerrr
Copy link
Contributor

immerrr commented Oct 18, 2014

Ok, no contract — no worries.

copies are ALWAYS deep

Isn't there a x.copy(deep=False) that shallow copies x?

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Dec 4, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
jreback added a commit to jreback/pandas that referenced this issue Oct 11, 2015
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
- closes pandas-dev#8571 by defining __copy__/__deepcopy__
@gliptak
Copy link
Contributor

gliptak commented Apr 12, 2016

@jreback Was this closed by jreback@d82196c (from above)? Thanks

@jreback
Copy link
Contributor Author

jreback commented Apr 12, 2016

no that's something else

@gliptak
Copy link
Contributor

gliptak commented Apr 12, 2016

"closes #8571 by defining copy/deepcopy"?

@jreback
Copy link
Contributor Author

jreback commented Apr 12, 2016

it was not merged

@gliptak
Copy link
Contributor

gliptak commented Apr 12, 2016

I see. So would adding __copy__/__deepcopy__ as in http://stackoverflow.com/a/15774013/304690 be appropriate?

@jreback
Copy link
Contributor Author

jreback commented Apr 12, 2016

I think you can just define them as I did at the top

@gliptak
Copy link
Contributor

gliptak commented Apr 12, 2016

import pandas as pd
import numpy as np
import copy
pd.Series._metadata.append('units')
df = pd.DataFrame(data=np.random.randn(5,2), columns=['A', 'B'])
df['A'].units = 's'
df['B'].units = 'm'
s = copy.copy(df['A'])
print(s.units)

prints

s

on

INSTALLED VERSIONS
------------------
commit: 504ad4641e5b3a87b655da5dd31840de12495e97
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0+105.g504ad46.dirty
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: None
patsy: None
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

akleeman added a commit to akleeman/pandas that referenced this issue Jun 29, 2016
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Jan 14, 2018
- closes pandas-dev#10556, add policy argument to constructors
- closes pandas-dev#9216, all passing of dict with view directly to the API
- closes pandas-dev#5902
- closes pandas-dev#8571 by defining __copy__/__deepcopy__
@jorisvandenbossche
Copy link
Member

@jreback I think this was closed by #15444?

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.20.0 Jan 15, 2018
@jreback
Copy link
Contributor Author

jreback commented Jan 15, 2018

yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants