Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Series constructor not respecting CoW when called with BlockManager #52017

Merged
merged 5 commits into from
Mar 29, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -379,6 +379,9 @@ def __init__(
copy: bool = False,
fastpath: bool = False,
) -> None:
if isinstance(data, SingleBlockManager) and using_copy_on_write() and not copy:
data = data.copy(deep=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done inside the isinstance(data, (SingleBlockManager, SingleArrayManager)) block below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bunch of branches where we handle BlockManagers, so we could yes but we'd have to duplicate the copy call a bunch of times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't this only be reached by internal calls where we should be specifying copy explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically yes, practically I don't know if users use this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in practice e.g. pyarrow does but i think we should get them to stop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be ok with adding this for 2.0 and deprecating for 2.1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bunch of branches where we handle BlockManagers, so we could yes but we'd have to duplicate the copy call a bunch of times.

I would duplicate it a bit, or at least for this initial block, and then maybe once after that.

The first if case here is meant to be our "fastpath" creating a series from just a manager, and to keep this as fast as possible, I would move this check once inside that block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it down


if (
isinstance(data, (SingleBlockManager, SingleArrayManager))
and index is None
Expand Down
28 changes: 28 additions & 0 deletions pandas/tests/copy_view/test_constructors.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import numpy as np
import pytest

import pandas as pd
from pandas import (
DataFrame,
Series,
Expand Down Expand Up @@ -82,6 +83,33 @@ def test_series_from_series_with_reindex(using_copy_on_write):
assert not result._mgr.blocks[0].refs.has_reference()


@pytest.mark.parametrize("fastpath", [False, True])
@pytest.mark.parametrize("dtype", [None, "int64"])
@pytest.mark.parametrize("idx", [None, pd.RangeIndex(start=0, stop=3, step=1)])
def test_series_from_block_manager(using_copy_on_write, idx, dtype, fastpath):
ser = Series([1, 2, 3], dtype="int64")
ser_orig = ser.copy()
ser2 = Series(ser._mgr, dtype=dtype, fastpath=fastpath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to pass index=idx here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, thx

assert np.shares_memory(get_array(ser), get_array(ser2))
if using_copy_on_write:
assert not ser2._mgr._has_no_reference(0)

ser2.iloc[0] = 100
if using_copy_on_write:
tm.assert_series_equal(ser, ser_orig)
else:
expected = Series([100, 2, 3])
tm.assert_series_equal(ser, expected)


def test_series_from_block_manager_different_dtype(using_copy_on_write):
ser = Series([1, 2, 3], dtype="int64")
ser2 = Series(ser._mgr, dtype="int32")
assert not np.shares_memory(get_array(ser), get_array(ser2))
if using_copy_on_write:
assert ser2._mgr._has_no_reference(0)


@pytest.mark.parametrize("func", [lambda x: x, lambda x: x._mgr])
@pytest.mark.parametrize("columns", [None, ["a"]])
def test_dataframe_constructor_mgr_or_df(using_copy_on_write, columns, func):
Expand Down