df.loc[:, 'col'] returning a view, but df.loc[df.index, 'col'] returning a copy #15631

Open
STguerin opened this issue Mar 9, 2017 · 5 comments · Fixed by #34996
Labels
Bug Copy / view semantics Indexing Related to indexing on series/frames, not to indexes themselves Needs Discussion Requires discussion from core team before further action

Comments

STguerin commented Mar 9, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

# a 4x2 integer frame with columns 'a' and 'b'
x = pd.DataFrame(list(zip(range(4), range(4))), columns=['a', 'b'])
print(x)
   a  b
0  0  0
1  1  1
2  2  2
3  3  3

q = x.loc[:, 'a']  # full slice as the row indexer
q += 2             # in-place add on the selected column
print(x)
   a  b
0  2  0
1  3  1
2  4  2
3  5  3

x = pd.DataFrame(list(zip(range(4), range(4))), columns=['a', 'b'])
print(x)
   a  b
0  0  0
1  1  1
2  2  2
3  3  3

q = x.loc[x.index, 'a']  # row indexer equal to x.index
q += 2                   # same in-place add
print(x)
   a  b
0  0  0
1  1  1
2  2  2
3  3  3

Problem description

df.loc[:, 'col'] returns a view, but df.loc[df.index, 'col'] returns a copy. Is this intended? How can I make sure it returns a copy?

Expected Output

I thought .loc[] always returned a copy.
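Whatever `.loc` returns in a given version, calling `Series.copy()` on the result guarantees an independent object. A minimal sketch using the same data as above (built from a dict instead of `zip`, purely for readability):

```python
import pandas as pd

x = pd.DataFrame({"a": range(4), "b": range(4)})

# .copy() always allocates new data, so in-place updates on q cannot reach x
q = x.loc[:, "a"].copy()
q += 2

print(x["a"].tolist())  # [0, 1, 2, 3]
print(q.tolist())       # [2, 3, 4, 5]
```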

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

jreback commented Mar 9, 2017

This is as expected (see #6149), but see below.

df.loc[:, columns] is treated as df[columns], which may return a view.

df.loc[indexer, columns] may also return a view, but in practice it almost never does.
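To see which case applies on a given install, one option is to compare the underlying buffers with `numpy.shares_memory`. This is a sketch that assumes the column is backed by a single NumPy array; `shares_column_memory` is a hypothetical helper, and the first two results may differ between pandas versions and copy-on-write settings.

```python
import numpy as np
import pandas as pd


def shares_column_memory(selection: pd.Series, frame: pd.DataFrame, col: str) -> bool:
    """Hypothetical helper: True if `selection` uses the same buffer as frame[col]."""
    return np.shares_memory(selection.to_numpy(), frame[col].to_numpy())


x = pd.DataFrame({"a": range(4), "b": range(4)})

# These two results depend on the pandas version and copy-on-write mode
print("loc[:, 'a']        :", shares_column_memory(x.loc[:, "a"], x, "a"))
print("loc[x.index, 'a']  :", shares_column_memory(x.loc[x.index, "a"], x, "a"))
# An explicit copy never shares memory with the frame
print("loc[:, 'a'].copy() :", shares_column_memory(x.loc[:, "a"].copy(), x, "a"))
```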

Yes, if indexer is df.index we could treat this as the former situation (IOW, the indexer is exactly equal to the index of the frame). It's just an indexer.equals(df.index) type of comparison.
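The check described above could look roughly like this. It is a hypothetical helper, not the code pandas uses, and for simplicity it only handles indexers that are `pd.Index` objects:

```python
import pandas as pd


def normalize_row_indexer(frame: pd.DataFrame, indexer):
    """Hypothetical helper: replace an indexer equal to frame.index with a full slice."""
    if isinstance(indexer, pd.Index) and indexer.equals(frame.index):
        return slice(None)  # behaves like the `:` in df.loc[:, col]
    return indexer  # anything else is passed through unchanged


x = pd.DataFrame({"a": range(4), "b": range(4)})

rows = normalize_row_indexer(x, x.index)
print(rows)                                    # slice(None, None, None)
print(x.loc[rows, "a"].equals(x.loc[:, "a"]))  # True
```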

So I'll mark this as a compat issue; if you'd do a pull request, that would be great. This is actually a very small change; see https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexing.py#L503

@jreback jreback added Bug Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate Indexing Related to indexing on series/frames, not to indexes themselves labels Mar 9, 2017
@jreback jreback added this to the Next Major Release milestone Mar 9, 2017

STguerin commented Mar 9, 2017

Thanks, Jeff, I appreciate your insight on this.

@mroeschke
Member

This looks fixed in master. Could use a test.

In [29]: pd.__version__
Out[29]: '1.1.0.dev0+1216.gd4d58f960'

In [30]: x = pd.DataFrame(zip(range(4), range(4)), columns=['a', 'b'])
    ...: print(x)
   a  b
0  0  0
1  1  1
2  2  2
3  3  3

In [31]: q = x.loc[:, 'a']
    ...: q += 2
    ...: print(x)
   a  b
0  0  0
1  1  1
2  2  2
3  3  3
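Since the outcome has changed between releases, a small diagnostic like the following can report which selections leak in-place updates back into the parent frame on the installed version. It is a sketch: `modifies_parent` is a hypothetical helper, and it deliberately does not assert which behaviour is correct.

```python
import pandas as pd


def modifies_parent(select) -> bool:
    """Hypothetical helper: does an in-place update on select(df) leak into df?"""
    df = pd.DataFrame({"a": range(4), "b": range(4)})
    q = select(df)
    q += 2  # in-place add on the selected Series
    return df["a"].tolist() != [0, 1, 2, 3]


print(pd.__version__)
print("loc[:, 'a']        ->", modifies_parent(lambda df: df.loc[:, "a"]))
print("loc[df.index, 'a'] ->", modifies_parent(lambda df: df.loc[df.index, "a"]))
```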

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Compat pandas objects compatability with Numpy or Python functions Indexing Related to indexing on series/frames, not to indexes themselves labels Apr 10, 2020
@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Jun 25, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Jun 25, 2020

simonjayhawkins commented Sep 25, 2020

Quoting @mroeschke: "This looks fixed in master. Could use a test."

The fix may have been #30501, which may need to be reverted. The tests added in #34996 may need to be removed and this issue reopened.

@jorisvandenbossche jorisvandenbossche removed the Needs Tests Unit test(s) needed to prevent regressions label Oct 29, 2020
@jorisvandenbossche jorisvandenbossche modified the milestones: 1.1, Contributions Welcome Oct 29, 2020
@jorisvandenbossche
Member

As noted in the PRs for 1.1.4 that reverted the fix for this, we should actually investigate what the expected behaviour is, because it's not clear that the test that was added in #34996 is actually correct (cc @jbrockmendel).

@mroeschke mroeschke added Bug Needs Discussion Requires discussion from core team before further action labels May 8, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022