[BUG] `loc`-based indexing with slice ranges inconsistent with pandas
#12833
Labels: bug (Something isn't working), improvement (Improvement / enhancement to an existing function), Python (Affects Python cuDF API)
Describe the bug
When indexing a frame with `.loc`, one can provide a slice range whose endpoints are not contained in the index. If the index is not sorted, both pandas and cudf agree that this is an error, but they raise different exceptions; we should probably fix that distinction.
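A minimal pandas-only reproducer for the unsorted case (assumes pandas is installed; the frame, its values and the labels are made up for illustration, and cudf's exception is not reproduced here):

```python
import pandas as pd

# Hypothetical frame with an unsorted (non-monotonic) integer index.
df = pd.DataFrame({"a": [30, 10, 20]}, index=[3, 1, 2])

# The start label 0 is not in the index, and because the index is not sorted
# pandas cannot work out where the slice should begin, so it raises.
try:
    df.loc[0:2]
    error_type = None
except KeyError as err:
    error_type = type(err).__name__

print(error_type)  # KeyError
```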
However, when the index is sorted, behaviour starts to differ:
When the index is monotone increasing (repeated values are allowed) and the slice has a positive step, things work as expected.
If the slice has a negative step, then cudf produces the wrong answer.
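The sketch below uses a made-up frame to show what pandas returns for a monotone increasing index (assumes pandas is installed). These pandas results are the reference that cudf would be expected to match; cudf's own output is not reproduced here, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical frame with a monotone increasing integer index.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50]}, index=[1, 2, 3, 4, 5])

# Positive step: rows between the endpoints, both endpoints included.
pos_in = df.loc[2:4].index.tolist()       # [2, 3, 4]
# On a sorted index the endpoints need not be present in the index.
pos_out = df.loc[0:10].index.tolist()     # [1, 2, 3, 4, 5]

# Negative step: pandas walks backwards from the start label to the stop label.
neg_in = df.loc[4:2:-1].index.tolist()    # [4, 3, 2]
neg_out = df.loc[10:0:-1].index.tolist()  # [5, 4, 3, 2, 1]
```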
If the index is monotone decreasing and the slice has a positive step, the behaviour matches if both slice endpoints are in the index, and differs if at least one of the slice endpoints is not in the index. In that case, pandas always returns an empty dataframe, whereas cudf always raises a `ValueError`.
If the slice has a negative step, the behaviour also differs, but in a different way. If both slice endpoints are in the index, cudf returns an empty dataframe, whereas pandas returns the (reordered) dataframe sliced by the endpoints.
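A similar pandas-only sketch for a monotone decreasing index (again a made-up frame, assuming pandas is installed). The out-of-index endpoints are deliberately given in ascending order here; other endpoint orders are not shown.

```python
import pandas as pd

# Hypothetical frame with a monotone decreasing integer index.
df = pd.DataFrame({"a": [50, 40, 30, 20, 10]}, index=[5, 4, 3, 2, 1])

# Positive step, both endpoints in the index: positions follow index order,
# so the start label has to appear before the stop label in the index.
pos_in = df.loc[4:2].index.tolist()      # [4, 3, 2]
pos_in_rev = df.loc[2:4].index.tolist()  # [] (stop comes before start)

# Positive step, endpoints not in the index (given in ascending order).
pos_out = df.loc[0:10].index.tolist()    # []

# Negative step, both endpoints in the index: the rows between the
# endpoints, returned in reverse index order.
neg_in = df.loc[2:4:-1].index.tolist()   # [2, 3, 4]
```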
With a negative step, if at least one of the slice endpoints is not in the index, cudf raises a `ValueError`, and pandas behaves as in the monotone increasing case.
The above issues apply, mutatis mutandis, to almost all index types in almost all cases (although I have not checked exhaustively). Amusingly, the monotone increasing case with a positive-step slice whose endpoint is not in the index works for `FloatXXXIndex`.
Expected behavior
We should be consistent, both internally and, probably, with pandas where possible (raising errors otherwise).
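As a reference point for what "matching pandas" could mean, here is a rough pure-Python model of how pandas turns a label slice into row positions, loosely following the logic of pandas' `Index.slice_locs` and `Index.get_slice_bound`. The helpers `loc_positions`, `_bound`, `_is_increasing` and `_is_decreasing` are hypothetical and not part of pandas or cudf. The model works on plain Python lists, assumes labels are unique unless the index is sorted, and ignores dtype-specific casting of slice bounds.

```python
from bisect import bisect_left, bisect_right


def _is_increasing(index):
    return all(a <= b for a, b in zip(index, index[1:]))


def _is_decreasing(index):
    return all(a >= b for a, b in zip(index, index[1:]))


def _bound(index, label, side):
    """Hypothetical helper: position of one slice bound (cf. get_slice_bound)."""
    if label in index:
        # Present label: first occurrence on the left side, one past the last
        # occurrence on the right side (covers repeated labels in a sorted index).
        first = index.index(label)
        last = len(index) - 1 - index[::-1].index(label)
        return first if side == "left" else last + 1
    if _is_increasing(index):
        # Missing label, increasing index: plain binary search.
        return bisect_left(index, label) if side == "left" else bisect_right(index, label)
    if _is_decreasing(index):
        # Missing label, decreasing index: search the reversed (increasing)
        # index with the opposite side, then map back to original positions.
        rev = index[::-1]
        pos = bisect_right(rev, label) if side == "left" else bisect_left(rev, label)
        return len(index) - pos
    # Missing label, unsorted index: no sensible bound exists.
    raise KeyError(label)


def loc_positions(index, start=None, stop=None, step=None):
    """Hypothetical helper: row positions selected by .loc[start:stop:step]."""
    n = len(index)
    forward = step is None or step >= 0
    if not forward:
        start, stop = stop, start  # temporarily swap the bounds
    lo = 0 if start is None else _bound(index, start, "left")
    hi = n if stop is None else _bound(index, stop, "right")
    if not forward:
        # Swap back and shift by one; a bound of -1 would wrap around to the
        # last row, so push it past the front of the index instead.
        lo, hi = hi - 1, lo - 1
        if hi == -1:
            hi -= n
        if lo == -1:
            lo -= n
    return list(range(n))[slice(lo, hi, step)]
```

For example, `loc_positions([5, 4, 3, 2, 1], 2, 4, -1)` returns `[3, 2, 1]`, i.e. the rows labelled 2, 3 and 4, which is the pandas result described above for a decreasing index with a negative step.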