Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate the convert_dtype param in Series.Apply #52257

Merged

Conversation

topper-123
Copy link
Contributor

Depreates the convert_dtype parameter in Series.apply. Also does some minor clean-ups in lib.pyx.

Progress towards #52140.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was signed with the committer’s verified signature.
nyodas Bob
@mroeschke mroeschke added Apply Apply, Aggregate, Transform, Map Deprecate Functionality to remove in pandas labels Mar 28, 2023
pandas/core/series.py Outdated Show resolved Hide resolved
else:
warnings.warn(
"the convert_dtype parameter is deprecated and will be removed in a "
"future version.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to mention the alternative here if convert_dtype=False was specified

@topper-123
Copy link
Contributor Author

I’ve updated.

@mroeschke mroeschke added this to the 2.1 milestone Mar 30, 2023
@mroeschke mroeschke merged commit 825b5c3 into pandas-dev:main Mar 30, 2023
@mroeschke
Copy link
Member

Thanks @topper-123

@topper-123 topper-123 deleted the deprecate_Series.apply_convert_dtype branch March 30, 2023 17:08
topper-123 added a commit to topper-123/pandas that referenced this pull request Apr 1, 2023
…2257)

* DEPR: Deprecate param convert_dtype in Series.Apply

* fix StataReader

* fix issue

* Update pandas/core/series.py

Co-authored-by: Matthew Roeschke <[email protected]>

* explain more in warning

---------

Co-authored-by: Matthew Roeschke <[email protected]>
topper-123 added a commit to topper-123/pandas that referenced this pull request Apr 1, 2023
…2257)

* DEPR: Deprecate param convert_dtype in Series.Apply

* fix StataReader

* fix issue

* Update pandas/core/series.py

Co-authored-by: Matthew Roeschke <[email protected]>

* explain more in warning

---------

Co-authored-by: Matthew Roeschke <[email protected]>
@jorisvandenbossche
Copy link
Member

@topper-123 what's the rationale of this deprecation? I suppose just to get rid of a probably-not-used-much keyword, and it's something that doesn't exist anyway in map when we prefer users to use that method instead of apply for element-wise UDFs.

But, if that's the case, I think the message about the alternative is wrong? Unless you plan to actually make the future behaviour to be convert_dtype=False? (but I assume not, because that would be a bigger breaking change that would need to be discussed)

Currently the text says:

Do ser.astype(object).apply() instead if you want convert_dtype=False.

But also right now, we do infer the dtype of the result, even if the starting dtype was object. So with our current behaviour, first casting to object dtype before calling apply doesn't make any difference for the result dtype inference.
So is your idea to change that in the future? Or was that just an oversight in the message and we can update it (although that would mean there is no easy alternative to preserve the behaviour of convert_dtype=False)

@topper-123
Copy link
Contributor Author

Hi @jorisvandenbossche

Thinking about it now again, removing convert_dtype in Series.apply made things lot simpler in the code base. It was very inconsistently implemented, so typically for ExtensionArrays, convert_dtype wasn't implemented. I value a consistent interface, so IMO either implement convert_dtype in Extensionarray.map or remove it. I didn't think convert_dtype adds much value, so opted to deprecate it.

My original idea was actually to move convert_dtype to Series.map, but had a discussion with @rhshadrach about it see comment in #52140 and opted to just deprecate it in Series.apply and not move it to Series.map.

But also right now, we do infer the dtype of the result, even if the starting dtype was object. So with our current behaviour, first casting to object dtype before calling apply doesn't make any difference for the result dtype inference.

Hm, yeah, I see this point, I didn't consider object dtype, when I wrote the error message, I must admit, so the question is how to deal with it. I think there are two options:

1: Implement convert_dtype parameter in Series|DataFrame.map and guide people to use Series|DataFrame.map. That means it must also either be implemented in ExtensionArray.map or a note should be added in the .map doc string that convert_dtype has not effect on ExtensionArrays.
2: Make a clearer error message

I'm not super sure what direction is best. Generally, operating element-wise makes it very difficult to guarantee a certain return dtype because an array has to be constructed and we can't know what dtype is should have. E.g. for something like ser.map(lambda x: ...) we can't really know what the return dtype will be. But I also see the value in a "best effort" like we do currently.

I guess the choice depends on how much people value the option to do convert_dtype=False. For me, I put a low value on it (I wouldn't use it) and am ok with just doing a better deprecation message, so I think option 2 is better.

@rhshadrach
Copy link
Member

I didn't not realize the work around didn't work. I'm in favor of reverting this deprecation and added convert_dtype to map. It appears that all we have to do is expose the convert argument of _map_values to accomplish this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants