Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG?: .at not working on object indexes containing some integers #19860

Closed
c-thiel opened this issue Feb 23, 2018 · 6 comments · Fixed by #22436
Closed

BUG?: .at not working on object indexes containing some integers #19860

c-thiel opened this issue Feb 23, 2018 · 6 comments · Fixed by #22436
Labels
Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@c-thiel
Copy link

c-thiel commented Feb 23, 2018

Version 0.22.0

Problem description

Using the .at - Method on an Index which contains Integers as well as str/objects raises an Error. This used to be possible using the .get_value()-Method. As .at is the designated successor (#15269) the same behaviour should be supported.
I also noticed that .get_value is approx. twice as fast as .at. Is there a specific reason to stick with .at? (see again #15269 for a speed comparison)

Code Sample

import pandas as pd
import numpy as np

data = np.random.randn(10, 5)
df = pd.DataFrame(data, columns=['a', 'b', 'c', 1, 2])
df.at[0, 1]

Raises:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/thielc/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1868, in __getitem__
    key = self._convert_key(key)
  File "/home/thielc/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1915, in _convert_key
    raise ValueError("At based indexing on an non-integer "
ValueError: At based indexing on an non-integer index can only have non-integer indexers
@gfyoung gfyoung added the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 23, 2018
@gfyoung
Copy link
Member

gfyoung commented Feb 23, 2018

@c-thiel : Thanks for reporting this! I'm a little unclear as to what's being reported. Is this a regression from a previous version (you said this used to be possible), or is this a general API inconsistency that you're referring to?

That aside, I do agree that the behavior does look strange.

@jorisvandenbossche
Copy link
Member

I don't think it is a regression from a previous version (test with 0.16, already broken there), but it is possible with get_value. However, this is deprecated, and we say people should use .at instead. So then .at should work correctly on this as well.

The problem comes from here:

pandas/pandas/core/indexing.py

Lines 1907 to 1916 in 572476f

for ax, i in zip(self.obj.axes, key):
if ax.is_integer():
if not is_integer(i):
raise ValueError("At based indexing on an integer index "
"can only have integer indexers")
else:
if is_integer(i):
raise ValueError("At based indexing on an non-integer "
"index can only have non-integer "
"indexers")

where we do this check. But I agree the check looks to strict, as a mixed object index can indeed contain integers as well.

Welcome to try to fix this (eg try removing this check, and see if some tests fail due to that).

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Difficulty Intermediate labels Feb 23, 2018
@jreback jreback added this to the Next Major Release milestone Feb 23, 2018
@jreback
Copy link
Contributor

jreback commented Feb 23, 2018

yeah I think prob ok to remove the else check; this ultimately goes thru .loc so indexing verfication can occur there

@jreback
Copy link
Contributor

jreback commented Feb 23, 2018

@c-thiel indexing with mixed dtype indexes is simply going to be slow generally.

@c-thiel
Copy link
Author

c-thiel commented Feb 23, 2018

@jorisvandenbossche Yes, this is what I was reffering to.

@jreback : Regarding Performance, at is still much faster than loc, especially when setting values. The setting-performance for single values is the main reason for me using the set_value and get_value functions. But also the get_value, set_value functions are twice as fast as at :

import pandas as pd
import numpy as np
import time

c = ['a', 'b', 'c', 'd', 'e']
data = np.random.rand(10000, 5)
df = pd.DataFrame(data, columns=c)
rows = np.random.randint(0, 9999, (100000,))
columns = np.random.choice(c, (100000,))

t = time.time()
for row, column in zip(rows, columns):
    a = df.get_value(row, column)    
print(f'get_value: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    a = df.at[row, column]    
print(f'at: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    a = df.loc[row, column]    
print(f'loc: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    df.at[row, column] = 4
print(f'set at: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    df.loc[row, column] = 5
print(f'set loc: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    df.set_value(row, column, 4)
print(f'set_value: {time.time()-t}')
get_value: 0.257692813873291
at: 0.52744460105896
loc: 0.7349758148193359
set at: 0.687880277633667
set loc: 11.664336204528809
set_value: 0.3008086681365967

@jreback
Copy link
Contributor

jreback commented Feb 23, 2018

@c-thiel setting individual values in a loop is non-idiomatic. set_value/get_value were deprecated because they didn't properly handle any edge cases nor had any type safetly whatsoever. Correct is much much better then wrong but slightly faster.

mariuspot added a commit to mariuspot/pandas that referenced this issue Aug 21, 2018
.at incorrectly disallowed the use of integer indexes when a mixed index
was used
disabled fallback indexing when a mixed index is used
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Aug 21, 2018
mariuspot added a commit to mariuspot/pandas that referenced this issue Aug 22, 2018
.at incorrectly disallowed the use of integer indexes when a mixed index
was used
disabled fallback indexing when a mixed index is used
mariuspot added a commit to mariuspot/pandas that referenced this issue Aug 22, 2018
.at incorrectly disallowed the use of integer indexes when a mixed index
was used
disabled fallback indexing when a mixed index is used
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants