Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ArrayManager] TST: run (+fix/skip) pandas/tests/indexing tests #40325

Merged

Conversation

jorisvandenbossche
Copy link
Member

xref #39146

Extracting part of the test changes of #39578. To keep the size a bit more limited, I am splitting the PRs, so this PR is doing pandas/tests/indexing. The pandas/tests/frame/indexing are handled in #40323

@jorisvandenbossche jorisvandenbossche added Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation labels Mar 9, 2021
@jorisvandenbossche
Copy link
Member Author

So one of the main causes for differences in behaviour between BlockManager and ArrayManager, is that ArrayManager always takes the "split" path in the indexing code. And, the split path has different behaviour for several cases compared to the single block path. Which is the case for both BlockManager and ArrayManager, but because many of the tests are using simple dataframes, the default tests with BlockManager don't catch this.

For example, setting values with .loc[:, col] still overwrites the column instead of updating inplace when possible.

I still need to open issues for those several cases that fail for BlockManager as well.

@jbrockmendel
Copy link
Member

So one of the main causes for differences in behaviour between BlockManager and ArrayManager, is that ArrayManager always takes the "split" path in the indexing code

FWIW ive got a branch nearing readiness that makes setitem_with_indexer take the split_path iff it is 2D, which will hopefully make keeping these behaviors in sync somewhat easier.

expected = DataFrame({"A": ser})
tm.assert_frame_equal(df, expected)

# with mixed dataframe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate test

@pytest.mark.parametrize("box", [array, Series])
def test_iloc_setitem_ea_inplace(self, frame_or_series, box):
def test_iloc_setitem_ea_inplace(self, frame_or_series, box, using_array_manager):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why both the skip and the fixture? id expect one or the other

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I already rewrote the test so that it could pass for ArrayManager (accessing .values for a DataFrame will never work to get a view of the data with AM, so that needs to be rewritten), it's only still failing for other reasons.

df.loc[[4, 3, 5], "A"] = np.array([1, 2, 3], dtype="int64")
# TODO with "split" path we still actually overwrite the column
# and therefore don't take the order of the indexer into account
ser = Series([1, 2, 3], index=[3, 5, 4], dtype="int64")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is weird right? why wouldnt we expect ser = Series([2, 3, 1], index=[3, 5, 4])?

(actual question: is my intuition wrong, or is this a bug?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your intuition is correct, this is clearly a bug (that wasn't tested because the original test (still above) was using a homogeneous dataframe). That's what the TODO meant to explain.
But I now opened an issue for it (#40480, it's actually a regression on master, apparently), will update the comment with a link to that issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And made it an xfail instead of testing the wrong result

@jbrockmendel
Copy link
Member

one question, LGTM pending green

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants