DEPR: Deprecate returning a DataFrame in SeriesApply.apply_standard #52123

topper-123
Contributor

Member

@rhshadrach rhshadrach left a comment

It's not clear to me what the output in this case will be after the deprecation. Can you give an example?

@topper-123 topper-123 force-pushed the deprecate_Series.apply_using_func_returning_Series branch from 53d4a16 to 7aee834 Compare March 29, 2023 10:18
@topper-123
Contributor Author

topper-123 commented Mar 30, 2023

> It's not clear to me what the output in this case will be after the deprecation. Can you give an example?

This pattern should not be used with Series.apply; users should instead find other ways to achieve the same result. So the result after the change will generally look strange when using apply.

If they use the same method after the change, they will get a Series with Series as its elements:

>>> import pandas as pd
>>> ser = pd.Series(range(3))
>>> func = lambda x: pd.Series({"x": x, "x**2": x**2})
>>> ser.apply(func)  # now
   x  x**2
0  0     0
1  1     1
2  2     4
>>> ser.apply(func)  # after change
0    x       0
x**2    0
dtype: int64
1    x       1
x**2    1
dtype: int64
2    x       2
x**2    4
dtype: int64
dtype: object
>>> ser.apply(func)[0]  # single element after change
x       0
x**2    0
dtype: int64

This pattern will generally not be useful after the change is implemented, so they should use better patterns, e.g.

>>> func = lambda x: pd.DataFrame({"x": x, "x**2": x**2})
>>> ser.pipe(func)  # works now and after this deprecation, and it's faster
   x  x**2
0  0     0
1  1     1
2  2     4
>>> pd.DataFrame({"x": ser, "x**2": ser**2})  # or this...
   x  x**2
0  0     0
1  1     1
2  2     4

Just find something better/faster than that old pattern...
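
For anyone comparing the two approaches, here is a minimal sketch that checks the vectorized construction gives the same values as the per-element pattern. It assumes a pandas version in which Series.apply still expands returned Series into a DataFrame (possibly with a FutureWarning, which is silenced here). The names per_element, slow and fast are only illustrative and do not come from the PR.

```python
import warnings

import pandas as pd

ser = pd.Series(range(3))


def per_element(x):
    # Hypothetical helper: builds one small Series per element (the slow pattern).
    return pd.Series({"x": x, "x**2": x**2})


with warnings.catch_warnings():
    # Assumption: the deprecation, where present, is raised as a FutureWarning.
    warnings.simplefilter("ignore", FutureWarning)
    slow = ser.apply(per_element)

# Vectorized equivalent: build each column once from the whole Series.
fast = pd.DataFrame({"x": ser, "x**2": ser**2})

# Compare the underlying values of both frames.
same_values = bool((slow.to_numpy() == fast.to_numpy()).all())
print(same_values)
```

The vectorized version avoids constructing one Series object per row, which is where most of the cost of the old pattern goes.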

@topper-123 topper-123 force-pushed the deprecate_Series.apply_using_func_returning_Series branch 2 times, most recently from f3f2cd9 to fe857a4 Compare April 1, 2023 23:13
@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas Apply Apply, Aggregate, Transform, Map Series Series data structure labels Apr 6, 2023
@rhshadrach rhshadrach added this to the 2.1 milestone Apr 6, 2023
@topper-123 topper-123 force-pushed the deprecate_Series.apply_using_func_returning_Series branch from 51b839d to d6cb073 Compare April 6, 2023 07:22
@topper-123
Contributor Author

I've addressed the comments by @rhshadrach.

Member

@rhshadrach rhshadrach left a comment

lgtm

@rhshadrach rhshadrach merged commit fe415f5 into pandas-dev:main Apr 7, 2023
@rhshadrach
Member

Thanks @topper-123

@topper-123 topper-123 deleted the deprecate_Series.apply_using_func_returning_Series branch April 7, 2023 06:37
@Dr-Irv
Contributor

Dr-Irv commented Apr 10, 2023

@topper-123 This tripped up a test we run on nightly builds in pandas-stubs, which we can fix, but I think the error message could be improved. If the recommendation is to use Series.pipe() instead to get the same result, then I think that should go in the whatsnew document, i.e., instead of saying "This pattern was very slow and it's recommended to use alternative methods to archive the same goal", say "This pattern was very slow and it's recommended to use Series.pipe() to achieve the same goal". In addition, that text about Series.pipe() should be included in the error message (assuming that's the preferred workaround).

Also, the error message doesn't look right due to spacing:

>>> import pandas as pd
>>> pd.__version__
'2.1.0.dev0+463.ge9e034badb'
>>> s = pd.Series([-10, 2, 2, 3.4, 10, 10])
>>> def makeseries(x: float) -> pd.Series:
...     return pd.Series([x, 2 * x])
...
>>> s.apply(makeseries)
<stdin>:1: FutureWarning: Returning a DataFrame from Series.apply when the supplied functionreturns a Series is deprecated and will be removed in a future version.
      0     1
0 -10.0 -20.0
1   2.0   4.0
2   2.0   4.0
3   3.4   6.8
4  10.0  20.0
5  10.0  20.0

Note the "functionreturns" misses a space.

Let me know if I should create a separate issue for this.
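
For context, the missing space comes from Python's implicit concatenation of adjacent string literals, which joins them with no separator. A minimal sketch, using the message text from the warning above (the variable names broken and fixed are only illustrative):

```python
# Adjacent string literals are concatenated exactly as written,
# so a literal that does not end with a space runs into the next one.
broken = (
    "Returning a DataFrame from Series.apply when the supplied function"
    "returns a Series is deprecated and will be removed in a future "
    "version."
)

# Adding a trailing space to the first literal restores the word boundary.
fixed = (
    "Returning a DataFrame from Series.apply when the supplied function "
    "returns a Series is deprecated and will be removed in a future "
    "version."
)

print("functionreturns" in broken)
print("function returns" in fixed)
```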

@topper-123
Contributor Author

Hi @Dr-Irv.

The ability to have Series put into a DataFrame can be misused in many different ways. That pipe example was a way to do that particular operation vectorized, i.e. fast, and it can't be generalized.

A general pattern you could replace the deprecated pattern with would be ser.pipe(lambda x: pd.DataFrame(x.map(value_func_that_returns_series).to_list())), though that would also be dog slow. To me that's quite an ugly expression, and I guess we could inform users about it in the whatsnew section, with a clear warning about performance and that they should find other patterns to do the operation. IMO that code pattern should not be in the error message, since it could encourage slow code.
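
A minimal sketch of that general pattern, for illustration only. The body of value_func_that_returns_series and the sample data are made up here, and passing index=s.index is an addition not in the expression above: as far as I can tell, building a DataFrame from a plain list of unnamed Series otherwise yields a default integer index instead of the original labels.

```python
import pandas as pd


def value_func_that_returns_series(x):
    # Hypothetical per-element function returning a Series.
    return pd.Series([x, 2 * x])


ser = pd.Series([-10, 2, 3.4], index=["a", "b", "c"])

# map() yields an object Series whose elements are Series;
# to_list() turns that into a list of Series, one per row of the new frame.
result = ser.pipe(
    lambda s: pd.DataFrame(
        s.map(value_func_that_returns_series).to_list(),
        index=s.index,  # keep the original labels (not in the original expression)
    )
)
print(result)
```

This still creates one Series per element, so it shares the performance problem of the deprecated pattern.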

You're right about the error message, there's a missing space there.

Tell me what you think about the above.

@Dr-Irv
Contributor

Dr-Irv commented Apr 10, 2023

> Tell me what you think about the above.

I think you'll need to expand on this topic in the whatsnew document, especially since you removed this from the user guide:

``apply`` on a Series can operate on a returned value from the applied function
that is itself a series, and possibly upcast the result to a DataFrame:

.. ipython:: python

    def f(x):
        return pd.Series([x, x ** 2], index=["x", "x^2"])

    s = pd.Series(np.random.rand(5))
    s
    s.apply(f)

This is a useful construct - you have a Series, and you want to get a DataFrame that has multiple columns that are functions of the Series. Now maybe the recommended way of changing the above code would be:

pd.DataFrame(s, columns=['s']).assign(x=lambda df: df.s, x2=lambda df: df.s ** 2).drop(columns=["s"])

There are clearly multiple alternatives, but with this deprecation, I think you might want to demonstrate those alternatives in the whatsnew document, since we had it in the user guide before as a pattern that people most likely copied.
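
As a rough sketch of what such alternatives could look like for the removed user guide example (these are not the wording or examples that ended up in the whatsnew document; fixed values replace np.random.rand so the result is reproducible, and the names direct and chained are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5, dtype="float64"))

# Alternative 1: build the columns directly from the Series.
direct = pd.DataFrame({"x": s, "x^2": s**2})

# Alternative 2: start from a one-column frame and add derived columns.
# The dict unpacking is needed because "x^2" is not a valid keyword name.
chained = s.to_frame("x").assign(**{"x^2": lambda df: df["x"] ** 2})

print(direct.equals(chained))
```

Both variants compute each column in one vectorized operation instead of calling a Python function once per element.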

warnings.warn(
"Returning a DataFrame from Series.apply when the supplied function"
"returns a Series is deprecated and will be removed in a future "
"version.",
Member

This message should point users to a resource that explains what to do now.

@phofl
Member

phofl commented Apr 11, 2023

Agreed with @Dr-Irv

This needs more explanation somewhere about what to do now

@phofl
Member

phofl commented Apr 11, 2023

There is a DeprecationWarning missing:

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})


def return_df(x):
    # each row is reduced to a Series, so the result is a DataFrame with columns ['sum', 'mean']
    return pd.Series([x.sum(), x.mean()], index=["sum", "mean"])

df.apply(return_df, axis=1)

Looks like axis takes another path?
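
One way to check which code paths warn is to record warnings around the call. A minimal sketch, assuming the deprecation surfaces as a FutureWarning as in the Series.apply output earlier in this thread. The names row_stats, caught and future_warnings are illustrative, and the number of warnings recorded depends on the installed pandas version, so no expected count is given.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})


def row_stats(row):
    # Returns a Series per row, so DataFrame.apply expands it into columns.
    return pd.Series([row.sum(), row.mean()], index=["sum", "mean"])


with warnings.catch_warnings(record=True) as caught:
    # Record every warning, including ones that would normally be shown only once.
    warnings.simplefilter("always")
    out = df.apply(row_stats, axis=1)

# Keep only FutureWarning entries; an empty list means this path did not warn.
future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
print(type(out).__name__, list(out.columns), len(future_warnings))
```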

phofl added a commit to phofl/pandas that referenced this pull request Sep 18, 2023
mroeschke pushed a commit that referenced this pull request Sep 19, 2023
…andard (#55189)

* Revert "DEPR: Deprecate returning a DataFrame in SeriesApply.apply_standard (#52123)"

This reverts commit fe415f5

* Fix tests

* Add whatsnew

* Add whatsnew

* Add test
Labels
Apply Apply, Aggregate, Transform, Map Deprecate Functionality to remove in pandas Series Series data structure
Development

Successfully merging this pull request may close these issues.

DEPR: Deprecate returning a DataFrame in Series.apply
4 participants