Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Series constructor not respecting CoW when called with BlockManager #52017

Merged
merged 5 commits into from
Mar 29, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Mar 16, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@phofl phofl added this to the 2.0 milestone Mar 16, 2023
@@ -379,6 +379,9 @@ def __init__(
copy: bool = False,
fastpath: bool = False,
) -> None:
if isinstance(data, SingleBlockManager) and using_copy_on_write() and not copy:
data = data.copy(deep=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done inside the isinstance(data, (SingleBlockManager, SingleArrayManager)) block below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bunch of branches where we handle BlockManagers, so we could yes but we'd have to duplicate the copy call a bunch of times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't this only be reached by internal calls where we should be specifying copy explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically yes, practically I don't know if users use this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in practice e.g. pyarrow does but i think we should get them to stop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be ok with adding this for 2.0 and deprecating for 2.1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bunch of branches where we handle BlockManagers, so we could yes but we'd have to duplicate the copy call a bunch of times.

I would duplicate it a bit, or at least for this initial block, and then maybe once after that.

The first if case here is meant to be our "fastpath" creating a series from just a manager, and to keep this as fast as possible, I would move this check once inside that block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it down

@jbrockmendel
Copy link
Member

BTW we should deprecate accepting managers in the constructor and wean 3rd party libraries off of it

def test_series_from_block_manager(using_copy_on_write, idx, dtype, fastpath):
ser = Series([1, 2, 3], dtype="int64")
ser_orig = ser.copy()
ser2 = Series(ser._mgr, dtype=dtype, fastpath=fastpath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to pass index=idx here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, thx

@jorisvandenbossche
Copy link
Member

BTW we should deprecate accepting managers in the constructor and wean 3rd party libraries off of it

If we want to pursue that, I think it would be best to start with proposing and implementing an alternative option to use, since we still need some way to do this for both advanced users (like pyarrow) and subclasses (they need to be able to customize this constructor from a manager, xref #51772)

(I personally don't have a strong opinion about this. I think it would certainly make sense to have a separate constructor if we were designing it, just not fully sure it is worth changing now)

@phofl
Copy link
Member Author

phofl commented Mar 29, 2023

Addressed all comments, will merge to get in.

@phofl phofl merged commit 1f9e8e7 into pandas-dev:main Mar 29, 2023
@phofl phofl deleted the cow_series_block_manager branch March 29, 2023 14:07
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 29, 2023
phofl added a commit that referenced this pull request Mar 29, 2023
…cting CoW when called with BlockManager) (#52275)

Backport PR #52017: BUG: Series constructor not respecting CoW when called with BlockManager

Co-authored-by: Patrick Hoefler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants