BUG?: .at not working on object indexes containing some integers #19860
Comments
@c-thiel: Thanks for reporting this! I'm a little unclear as to what's being reported. Is this a regression from a previous version (you said this used to be possible), or is this a general API inconsistency that you're referring to? That aside, I do agree that the behavior does look strange.
I don't think it is a regression from a previous version (tested with 0.16, already broken there), but it is possible with The problem comes from here: pandas/pandas/core/indexing.py Lines 1907 to 1916 in 572476f
where we do this check. But I agree the check looks too strict, as a mixed object index can indeed contain integers as well. Welcome to try to fix this (e.g. try removing this check, and see if some tests fail due to that).
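To illustrate why a check like this is too strict for mixed indexes, here is a minimal pure-Python sketch. It is not the pandas source: the functions `strict_check` and `relaxed_check` and the list-based stand-in for an index are hypothetical, made up for this example. The suggestion in the thread is to simply remove the check; the relaxed variant below is only one possible alternative.

```python
from numbers import Integral


def strict_check(index_labels, key):
    """Hypothetical stand-in: reject integer keys unless every label is an integer."""
    index_is_integer = all(isinstance(label, Integral) for label in index_labels)
    if index_is_integer:
        if not isinstance(key, Integral):
            raise ValueError("integer index can only have integer indexers")
    elif isinstance(key, Integral):
        # This branch also fires for a mixed index such as ["a", 1, "b"].
        raise ValueError("non-integer index can only have non-integer indexers")
    return key


def relaxed_check(index_labels, key):
    """Hypothetical variant: allow integer keys if the index holds any integer label."""
    index_is_integer = all(isinstance(label, Integral) for label in index_labels)
    holds_integer = any(isinstance(label, Integral) for label in index_labels)
    if index_is_integer:
        if not isinstance(key, Integral):
            raise ValueError("integer index can only have integer indexers")
    elif isinstance(key, Integral) and not holds_integer:
        raise ValueError("non-integer index can only have non-integer indexers")
    return key


mixed_index = ["a", 1, "b"]

try:
    strict_check(mixed_index, 1)
    strict_rejected = False
except ValueError:
    strict_rejected = True

relaxed_result = relaxed_check(mixed_index, 1)
```

With the mixed index above, the strict version rejects the key `1` even though `1` is a valid label, while the relaxed version lets it through.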
yeah I think prob ok to remove the else check; this ultimately goes thru
@c-thiel indexing with mixed dtype indexes is simply going to be slow generally.
@jorisvandenbossche Yes, this is what I was referring to.
@jreback: Regarding performance,
import pandas as pd
import numpy as np
import time

c = ['a', 'b', 'c', 'd', 'e']
data = np.random.rand(10000, 5)
df = pd.DataFrame(data, columns=c)
rows = np.random.randint(0, 9999, (100000,))
columns = np.random.choice(c, (100000,))

# Scalar reads: get_value vs. .at vs. .loc
t = time.time()
for row, column in zip(rows, columns):
    a = df.get_value(row, column)
print(f'get_value: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    a = df.at[row, column]
print(f'at: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    a = df.loc[row, column]
print(f'loc: {time.time()-t}')

# Scalar writes: .at vs. .loc vs. set_value
t = time.time()
for row, column in zip(rows, columns):
    df.at[row, column] = 4
print(f'set at: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    df.loc[row, column] = 5
print(f'set loc: {time.time()-t}')

t = time.time()
for row, column in zip(rows, columns):
    df.set_value(row, column, 4)
print(f'set_value: {time.time()-t}')
@c-thiel setting individual values in a loop is non-idiomatic. set_value/get_value were deprecated because they didn't properly handle any edge cases nor had any type safety whatsoever. Correct is much much better than wrong but slightly faster.
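As a rough illustration of the alternative to a per-element loop, the sketch below reads all requested cells in one vectorized step. It assumes unique labels on both axes and a single-dtype frame; the variable names `row_pos`, `col_pos` and `values` are made up for this example, and it is only one of several ways to do this.

```python
import numpy as np
import pandas as pd

c = ['a', 'b', 'c', 'd', 'e']
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10000, 5)), columns=c)
rows = rng.integers(0, 10000, 100000)
columns = rng.choice(c, 100000)

# Translate labels to integer positions once (assumes unique labels).
row_pos = df.index.get_indexer(rows)
col_pos = df.columns.get_indexer(columns)

# One fancy-indexing call on the underlying array instead of 100000 .at calls.
values = df.to_numpy()[row_pos, col_pos]
```

Each entry of `values` corresponds to what `df.at[row, column]` would return for the matching pair, without going through the scalar indexer for every element.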
.at incorrectly disallowed the use of integer indexes when a mixed index was used disabled fallback indexing when a mixed index is used
Version 0.22.0
Problem description
Using the .at method on an Index which contains integers as well as str/objects raises an error. This used to be possible using the .get_value() method. As .at is the designated successor (#15269), the same behaviour should be supported.
I also noticed that .get_value is approx. twice as fast as .at. Is there a specific reason to stick with .at? (see again #15269 for a speed comparison)
Code Sample
Raises: