Describe the bug
Pandas interprets idx in the expression sr[idx] as an absolute position in the series sr when idx's dtype is different from that of sr's index (for example, an integer idx against a string index). cuDF, however, always treats idx as a value to look up in sr's index, which leads to different behavior when the index has a non-integral dtype:
In Pandas, the series accepts both a string label and an integer position as the indexer:
>>> import pandas as pd
>>> x = pd.Series([1,2,3], index=pd.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
2
For the same example, cuDF raises a KeyError on the second access:
>>> import cudf
>>> x = cudf.Series([1,2,3], index=cudf.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
Traceback (most recent call last):
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 138, in _loc_to_iloc
    arg, closest=False
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/column/string.py", line 4938, in find_first_value
    return self._find_first_and_last(value)[0]
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/column/string.py", line 4933, in _find_first_and_last
    first = column.as_column(found_indices).find_first_value(1)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 477, in find_first_value
    raise ValueError("value not found")
ValueError: value not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 118, in __getitem__
    arg = self._loc_to_iloc(arg)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 142, in _loc_to_iloc
    raise KeyError("label scalar is out of bound")
KeyError: 'label scalar is out of bound'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/series.py", line 921, in __getitem__
    return self.loc[arg]
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 120, in __getitem__
    raise KeyError(arg)
KeyError: 1
>>>
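Until the two libraries agree on what a bare `x[1]` means, one way to sidestep the ambiguity is to state the intent explicitly with `.loc` (label) and `.iloc` (position). The sketch below uses pandas so that it runs without a GPU. The traceback above shows that cuDF's `Series` also exposes `loc`, so the same two lines may work after swapping the import, but that is an assumption and is not verified here.

```python
import pandas as pd

# Same series as in the report: integer values with a string index.
x = pd.Series([1, 2, 3], index=pd.Index(["a", "b", "c"]))

# Label-based lookup: always searches the index for the value "b".
by_label = x.loc["b"]

# Position-based lookup: always means "the element at offset 1".
by_position = x.iloc[1]

print(by_label, by_position)
```

Both lookups refer to the same element here, and neither depends on how `__getitem__` guesses between labels and positions.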
…` is non-numeric dtype (#7897)
Pandas interprets `idx` in the expression `sr[idx]` as an absolute position in the series `sr` when `idx`'s `dtype` is different from that of `sr`'s `Index`.
In Pandas, the series accepts both a string label and an integer position as the indexer:
```
>>> import pandas as pd
>>> x = pd.Series([1,2,3], index=pd.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
2
```
cuDF, by contrast, treats `idx` as a value to look up in `sr`'s `Index`, which leads to different behavior when the index has a non-integral dtype:
```
>>> import cudf
>>> x = cudf.Series([1,2,3], index=cudf.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/series.py", line 921, in __getitem__
    return self.loc[arg]
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 120, in __getitem__
    raise KeyError(arg)
KeyError: 1
```
This PR fixes the mismatched behavior in cuDF by deferring to `iloc` when a Series has a non-numerical `Index` and the indexer `idx` is an integer-like value: `int`, cudf `Scalar`, or a NumPy integer (`np.int8`, `np.uint32`, `np.int64`, ...).
Fixes: #7622
Replaces: #7775
Authors:
- Sheilah Kirui (https://github.com/skirui-source)
Approvers:
- Michael Wang (https://github.com/isVoid)
- Keith Kraus (https://github.com/kkraus14)
URL: #7897
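The dispatch rule described in the PR (fall back to positional access when the index is non-numeric and the indexer is integer-like) can be sketched in plain Python. This is a minimal illustration only: `LabeledSeries` and its methods are hypothetical names invented for this sketch and do not reflect how cuDF or pandas implement indexing internally.

```python
import numbers


class LabeledSeries:
    """Hypothetical stand-in for a Series; not cuDF's or pandas' real class."""

    def __init__(self, values, index):
        if len(values) != len(index):
            raise ValueError("values and index must have the same length")
        self.values = list(values)
        self.index = list(index)

    def _index_is_numeric(self):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return all(
            isinstance(label, numbers.Number) and not isinstance(label, bool)
            for label in self.index
        )

    def loc(self, label):
        # Label-based lookup: first position whose label equals `label`.
        try:
            return self.values[self.index.index(label)]
        except ValueError:
            raise KeyError(label) from None

    def iloc(self, position):
        # Position-based lookup: plain list indexing.
        return self.values[position]

    def __getitem__(self, key):
        # Integer-like key + non-numeric index -> treat the key as a position.
        is_int_like = isinstance(key, numbers.Integral) and not isinstance(key, bool)
        if is_int_like and not self._index_is_numeric():
            return self.iloc(key)
        # Otherwise the key is a label to search for in the index.
        return self.loc(key)


x = LabeledSeries([1, 2, 3], index=["a", "b", "c"])
by_label = x["b"]      # string key: label lookup
by_position = x[1]     # integer key, string index: positional fallback

y = LabeledSeries([10, 20, 30], index=[2, 0, 1])
numeric_lookup = y[1]  # integer key, numeric index: still a label lookup
```

With a numeric index the integer key stays a label lookup, so `y[1]` finds the label `1` rather than the element at offset 1. The fallback only applies when no integer label could exist in the index.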
Code referenced in the issue: cudf/python/cudf/cudf/core/series.py, lines 1183 to 1187 in 34cccfe