-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrayManager] TST: run (+fix/skip) pandas/tests/indexing tests #40325
[ArrayManager] TST: run (+fix/skip) pandas/tests/indexing tests #40325
Conversation
So one of the main causes for differences in behaviour between BlockManager and ArrayManager, is that ArrayManager always takes the "split" path in the indexing code. And, the split path has different behaviour for several cases compared to the single block path. Which is the case for both BlockManager and ArrayManager, but because many of the tests are using simple dataframes, the default tests with BlockManager don't catch this. For example, setting values with I still need to open issues for those several cases that fail for BlockManager as well. |
FWIW ive got a branch nearing readiness that makes setitem_with_indexer take the split_path iff it is 2D, which will hopefully make keeping these behaviors in sync somewhat easier. |
pandas/tests/indexing/test_loc.py
Outdated
expected = DataFrame({"A": ser}) | ||
tm.assert_frame_equal(df, expected) | ||
|
||
# with mixed dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate test
@pytest.mark.parametrize("box", [array, Series]) | ||
def test_iloc_setitem_ea_inplace(self, frame_or_series, box): | ||
def test_iloc_setitem_ea_inplace(self, frame_or_series, box, using_array_manager): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why both the skip and the fixture? id expect one or the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I already rewrote the test so that it could pass for ArrayManager (accessing .values for a DataFrame will never work to get a view of the data with AM, so that needs to be rewritten), it's only still failing for other reasons.
pandas/tests/indexing/test_loc.py
Outdated
df.loc[[4, 3, 5], "A"] = np.array([1, 2, 3], dtype="int64") | ||
# TODO with "split" path we still actually overwrite the column | ||
# and therefore don't take the order of the indexer into account | ||
ser = Series([1, 2, 3], index=[3, 5, 4], dtype="int64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is weird right? why wouldnt we expect ser = Series([2, 3, 1], index=[3, 5, 4])
?
(actual question: is my intuition wrong, or is this a bug?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your intuition is correct, this is clearly a bug (that wasn't tested because the original test (still above) was using a homogeneous dataframe). That's what the TODO meant to explain.
But I now opened an issue for it (#40480, it's actually a regression on master, apparently), will update the comment with a link to that issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And made it an xfail instead of testing the wrong result
one question, LGTM pending green |
xref #39146
Extracting part of the test changes of #39578. To keep the size a bit more limited, I am splitting the PRs, so this PR is doing
pandas/tests/indexing
. Thepandas/tests/frame/indexing
are handled in #40323