API/BUG: .apply will correctly infer output shape when axis=1 #18577
Codecov Report
@@ Coverage Diff @@
## master #18577 +/- ##
==========================================
- Coverage 91.45% 91.43% -0.03%
==========================================
Files 157 157
Lines 51378 51392 +14
==========================================
+ Hits 46987 46988 +1
- Misses 4391 4404 +13
|
Codecov Report
@@ Coverage Diff @@
## master #18577 +/- ##
==========================================
- Coverage 91.6% 91.6% -0.01%
==========================================
Files 150 150
Lines 48750 48793 +43
==========================================
+ Hits 44656 44695 +39
- Misses 4094 4098 +4
|
c369687
to
f6f0371
Compare
@TomAugspurger @jorisvandenbossche if you have a chance |
Unfortunately I'm guessing this will break someone's code in a subtle way, but current behavior is obviously bad, so seems like a net positive. I've never used it so not sure I fully understand the intention, but this does seem to break the
In [23]: pd.__version__
Out[23]: '0.22.0.dev0+310.gf6f0371'
In [24]: df.apply(lambda x: (1, 2, 3), axis=1, reduce=False)
Out[24]:
A B C
0 1 2 3
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
In [25]: df.apply(lambda x: (1, 2, 3), axis=1, reduce=True)
Out[25]:
A B C
0 1 2 3
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
In [92]: pd.__version__
Out[92]: '0.21.0'
In [93]: df.apply(lambda x: (1, 2, 3), axis=1, reduce=False)
Out[93]:
A B C
0 1 2 3
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
In [94]: df.apply(lambda x: (1, 2, 3), axis=1, reduce=True)
Out[94]:
0 (1, 2, 3)
1 (1, 2, 3)
2 (1, 2, 3)
3 (1, 2, 3)
4 (1, 2, 3)
5 (1, 2, 3)
dtype: object |
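For readers on a released pandas: the `reduce` keyword compared above no longer exists. A minimal sketch of how the same two output shapes can be requested, assuming pandas 0.23 or later, where `DataFrame.apply` accepts `result_type`. The frame `df` below is a stand-in (six rows, columns `A`, `B`, `C`), since the thread does not show how the original `df` was built.

```python
import numpy as np
import pandas as pd

# Stand-in frame (assumption): the thread never shows how df was built.
df = pd.DataFrame(np.arange(18).reshape(6, 3), columns=list("ABC"))

# Keep each returned tuple as a single cell: a Series of tuples,
# like Out[94] on 0.21.0.
reduced = df.apply(lambda x: (1, 2, 3), axis=1, result_type="reduce")

# Spread each returned tuple over columns labelled 0, 1, 2.
expanded = df.apply(lambda x: (1, 2, 3), axis=1, result_type="expand")

# Write the values back into the original shape, keeping the
# original index and the column labels A, B, C.
broadcasted = df.apply(lambda x: (1, 2, 3), axis=1, result_type="broadcast")

print(type(reduced).__name__, list(expanded.columns), list(broadcasted.columns))
```

Only `"broadcast"` reuses the original column labels; `"expand"` labels the new columns by position.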
@jreback I will try to review this tomorrow |
I think I can simply deprecate reduce; let me see |
|
doc/source/whatsnew/v0.22.0.txt
Outdated
.. ipython:: python | ||
|
||
df = pd.DataFrame([[1,2], [1,2]], columns=['a','b']) | ||
|
Would be nice to display df here to show what it is prior to the apply, similar to what was done in the example above.
doc/source/whatsnew/v0.22.0.txt
Outdated
df.apply(lambda x: [1, 2, 3], axis=1) | ||
df.apply(lambda x: [1, 2], axis=1) | ||
|
||
The returned input will also *not* return a Series with the list-wrapper as previously. |
Could use code formatting here: ``Series``
doc/source/whatsnew/v0.22.0.txt
Outdated
dtype: object | ||
|
||
|
||
New Behavior. The behaviour is consistent. |
Nitpick: The second "behaviour" is using British spelling, but the first isn't.
doc/source/whatsnew/v0.22.0.txt
Outdated
dtype: object | ||
|
||
|
||
New Behaviour |
Nitpick: British spelling
pandas/core/frame.py
Outdated
@@ -4818,6 +4818,8 @@ def apply(self, func, axis=0, broadcast=False, raw=False, reduce=None, | |||
while guessing, exceptions raised by func will be ignored). If | |||
reduce is True a Series will always be returned, and if False a | |||
DataFrame will always be returned. | |||
|
|||
.. deprecated:: 0.22.0 |
Should this be mentioned in the whatsnew?
I reverted the deprecation |
thanks for comments @jschendel |
I didn't look at any code yet, but some general comments from experimenting with the branch:
|
Contemplating my above comments, I think I would be in favor of (1) "no inference" for general iterable things (objects, tuples, ..), and (2) a clear way to enable / disable inference. The switch to enable/disable inference could have a default depending on the result type, eg by default infer (so not putting the result as a scalar in a single cell) for Series objects and maybe arrays, and by default no inference for tuples, dicts, generic objects, and probably lists as well. When doing this, I think the main breakage is the case of a "list with length equal to number of columns of original frame", but in that case we could even catch this case and raise a FutureWarning. Note I didn't look at all examples from the issues yet, so I might be missing some cases. |
so we need some sort of a parameter to allow inference on the returned result. This is analogous to specifying an output dtype & (shape) in np.vectorize; maybe something like:
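As background for the `np.vectorize` comparison only (this is not the pandas parameter being proposed in the comment), a small sketch of how NumPy lets the caller declare the output dtype and core shape up front through `otypes` and `signature`. The helper `pair` is made up for illustration.

```python
import numpy as np

def pair(x):
    # Hypothetical helper: maps one scalar to a length-2 vector.
    return np.array([x, x * 2])

# otypes fixes the output dtype; signature says "scalar in, 1-D vector out",
# so NumPy does not have to guess the result layout from the first call.
vpair = np.vectorize(pair, otypes=[float], signature="()->(n)")

out = vpair(np.array([1, 2, 3]))
print(out.shape, out.dtype)
```

Each input element becomes one row of the result, so three scalars give a `(3, 2)` float array.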
|
I think |
this already raises (on an incorrect shape, whether the result is a list or Series), added a test. (matches 0.22.0) |
Yes, if it has an incorrect shape, then it already raises (just like lists). But my example has a correct length, only different index names. Using your example above:
I think the above is not the desired result. IMO one of the goals of |
closes pandas-dev#16353 closes pandas-dev#17348 closes pandas-dev#17437 closes pandas-dev#18573 closes pandas-dev#17970 closes pandas-dev#17892 closes pandas-dev#17602 closes pandas-dev#18775 closes pandas-dev#18901 closes pandas-dev#18919
you think that [3] should actually be [2], IOW we ignore if the resulting index (if any) on a broadcastable? ok I think that's fine. |
ok fixed up. |
I think that is our record of closed issues in one go :-) Thanks @jreback ! |
closes #16353
closes #17348
closes #17437
closes #18573
closes #17970
closes #17892
closes #17602
closes #15628
closes #18775
closes #18901
closes #18919
This fixes apply to work correctly when the returned shape mismatches the original. It will try to set the indices if possible. Setting to a list-like with axis=1 is now disallowed (but still possible if you operate row-wise). We were applying this inconsistently. This is of course a discouraged practice anyhow. Probably should add some examples / update the doc-string a bit.
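A short sketch of the shape-mismatch case described above, as it behaves in released pandas (assuming 0.23 or later with default arguments). The two-row frame is borrowed from the whatsnew snippet reviewed earlier in the thread; the labels `x`, `y`, `z` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2]], columns=["a", "b"])

# A list longer than the number of columns is not forced into the
# original shape: each row's list is kept as one object in a Series.
as_lists = df.apply(lambda row: [1, 2, 3], axis=1)

# Returning a Series lets apply set the result columns from that
# Series' index, even though it differs from the input columns.
as_frame = df.apply(
    lambda row: pd.Series([1, 2, 3], index=["x", "y", "z"]), axis=1
)

print(type(as_lists).__name__, as_frame.shape, list(as_frame.columns))
```

In both cases the row index of the input is preserved; only the column dimension follows what the function returned.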