Keep subclassing in apply #19823

Merged
merged 14 commits into from
Feb 24, 2018

Conversation

Contributor

@jaumebonet jaumebonet commented Feb 21, 2018

When generating new objects in apply, this calls self.obj._constructor and self.obj._constructor_sliced instead of DataFrame and Series, so subclassing is kept.

When generating new objects from apply, it calls
self.obj._constructor and self.obj._constructor_sliced instead of
DataFrame and Series, keeping subclassing.
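
To make the change concrete, here is a minimal sketch of the difference between hard-coding `Series` and going through `_constructor_sliced`. The classes `MySeries` and `MyFrame` and the two helper functions are hypothetical, written only for illustration; this is not the code in pandas/core/apply.py.

```python
import pandas as pd


class MySeries(pd.Series):
    """Hypothetical Series subclass used only for this illustration."""

    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        return MyFrame


class MyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass paired with MySeries."""

    @property
    def _constructor(self):
        return MyFrame

    @property
    def _constructor_sliced(self):
        return MySeries


def rows_hardcoded(frame):
    # Builds each row with the base class, so any subclass is lost.
    return [pd.Series(row, index=frame.columns) for row in frame.to_numpy()]


def rows_via_constructor(frame):
    # Asks the frame which 1-D type it wants, so a subclass is preserved.
    make_row = frame._constructor_sliced
    return [make_row(row, index=frame.columns) for row in frame.to_numpy()]


df = MyFrame({"a": [1, 2], "b": [3, 4]})
print({type(r).__name__ for r in rows_hardcoded(df)})        # {'Series'}
print({type(r).__name__ for r in rows_via_constructor(df)})  # {'MySeries'}
```

For a plain `DataFrame`, `_constructor_sliced` is `Series`, so routing through it should leave non-subclassed code unaffected.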
@pep8speaks

pep8speaks commented Feb 21, 2018

Hello @jaumebonet! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 23, 2018 at 08:49 Hours UTC

@chris-b1
Contributor

Thanks - needs some tests, and it looks like it broke something for sparse, otherwise your approach seems reasonable at a glance.

@jaumebonet
Contributor Author

Hi @chris-b1,
Thanks!
Yes, I'm trying to pinpoint what exactly has changed... Clearly it has to do with the fact that it now returns SparseSeries instead of Series, but I'm not familiar with SparseSeries, so I'm not sure what difference that makes.

@jreback jreback added Reshaping and Compat labels Feb 22, 2018
@jreback
Contributor

jreback commented Feb 22, 2018

can you add some tests in pandas/tests/frame/test_sub_class (and for Series as well)

@jreback
Contributor

jreback commented Feb 22, 2018

also pls add a release note, other enhancements section is ok

@codecov

codecov bot commented Feb 22, 2018

Codecov Report

Merging #19823 into master will increase coverage by 0.02%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #19823      +/-   ##
==========================================
+ Coverage   91.58%   91.61%   +0.02%     
==========================================
  Files         150      150              
  Lines       48908    48906       -2     
==========================================
+ Hits        44792    44804      +12     
+ Misses       4116     4102      -14
| Flag | Coverage Δ |
|---|---|
| #multiple | 89.98% <100%> (+0.02%) ⬆️ |
| #single | 41.78% <0%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/apply.py | 96.74% <100%> (-0.04%) ⬇️ |
| pandas/util/testing.py | 83.85% <0%> (+0.2%) ⬆️ |
| pandas/plotting/_converter.py | 66.95% <0%> (+1.73%) ⬆️ |

Powered by Codecov. Last update 8768876...87f2c5c. Read the comment docs.

@jaumebonet
Contributor Author

Hi @jreback,
I've added a test to check that Series subclasses are kept when using apply on a DataFrame.
Series have their own apply function, so the changes do not affect how Series.apply behaves.

As per your suggestion, I also added a bullet point in the Other Enhancements section.

@@ -295,6 +295,7 @@ Other Enhancements
- ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
- :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
- Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
- :func:``DataFrame.apply`` keeps provides the specified ``Series`` subclass when `DataFrame._constructor_sliced`` is defined (:issue:`19822`)
Contributor

don't use a private property here, just say when ``Series`` and ``DataFrame`` are subclassed

['Mary', 'Bo', 'weight', 150]],
columns=['first', 'last', 'variable', 'value'])

df.apply(lambda x: check_row_subclass(x))
Contributor

you need to check the result of this using tm.assert_frame_equal, just use a simple example of an applied function, testing both returning a list, e.g. lambda x: [1, 2, 3] and a sub-classed series
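
A test along these lines might look like the sketch below. It uses the public `pandas.testing` helper rather than the internal `tm` module, and `SubSeries` / `SubFrame` are stand-ins defined here for illustration in place of test helpers such as `tm.SubclassedSeries`. It assumes a pandas version that includes this change.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


class SubSeries(pd.Series):
    """Hypothetical stand-in for a sub-classed Series test helper."""

    @property
    def _constructor(self):
        return SubSeries

    @property
    def _constructor_expanddim(self):
        return SubFrame


class SubFrame(pd.DataFrame):
    """Hypothetical stand-in for a sub-classed DataFrame test helper."""

    @property
    def _constructor(self):
        return SubFrame

    @property
    def _constructor_sliced(self):
        return SubSeries


def test_apply_keeps_subclass():
    df = SubFrame({"a": [1, 2], "b": [3, 4]})

    # Applied function returns a sub-classed Series: expect a sub-classed frame.
    result = df.apply(lambda row: SubSeries([1, 2, 3]), axis=1)
    expected = SubFrame([[1, 2, 3], [1, 2, 3]])
    assert isinstance(result, SubFrame)
    assert_frame_equal(result, expected)

    # Applied function returns a list: the result is 1-D, not a frame.
    result = df.apply(lambda row: [1, 2, 3], axis=1)
    assert not isinstance(result, pd.DataFrame)
    assert result.tolist() == [[1, 2, 3], [1, 2, 3]]


test_apply_keeps_subclass()
```

Reusing the names `result` and `expected` for each case keeps the checks sequential and avoids numbered variables.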

[1, 2, 3],
[1, 2, 3]])

expected3 = DesignSeries([[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]])
Contributor

Looks like this needs to be updated.

expected3 = tm.SubclassedSeries([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
Contributor

rather than numbering these, can you just do one after the other and use

result = 
expected = 

@@ -295,8 +295,10 @@ Other Enhancements
- ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
- :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
- Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
- :func:``DataFrame.apply`` keeps the specified ``Series`` subclass when ``Series`` and ``DataFrame`` subclasses are defined (:issue:`19822`)
Member

I find this not so clear. It is about the Series type that is passed to the function?

Contributor Author

Yes.
When applying a function to a subclassed DataFrame, I realised I could not call the specific methods of the subclassed Series that this DataFrame was supposed to generate through _constructor_sliced. As far as I understood, the subclassing of the Series was lost during apply.
With these changes the subclass is now kept inside apply, so methods of the expected Series subclass can be used.
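
The situation described here can be sketched as follows, assuming a pandas version that includes this change. `TempSeries`, `TempFrame`, and the `spread` method are hypothetical names chosen for illustration; the point is only that a method defined on the Series subclass is callable on the rows that `apply` hands to the function.

```python
import pandas as pd


class TempSeries(pd.Series):
    """Hypothetical Series subclass with one custom method."""

    @property
    def _constructor(self):
        return TempSeries

    @property
    def _constructor_expanddim(self):
        return TempFrame

    def spread(self):
        # Custom method that only exists on the subclass.
        return self.max() - self.min()


class TempFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass whose rows are TempSeries."""

    @property
    def _constructor(self):
        return TempFrame

    @property
    def _constructor_sliced(self):
        return TempSeries


df = TempFrame({"low": [1, 4], "high": [3, 10]})

# Each row reaches the function as a TempSeries, so .spread() is available.
spreads = df.apply(lambda row: row.spread(), axis=1)
print(spreads.tolist())  # [2, 6]
```

If the rows arrived as plain `Series`, as reported in the linked issue, the call to `row.spread()` would be expected to fail with an `AttributeError`.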

Member

OK, can you try to clarify the text a bit? Eg something like "For subclassed DataFrames, DataFrame.apply will now preserve the Series subclass (if defined) when passing the data to the applied function" (but adapt as you like)

Contributor Author

sure! I like it as is! :)
will change it!

@jreback jreback added this to the 0.23.0 milestone Feb 24, 2018
@jreback jreback merged commit 26a2d41 into pandas-dev:master Feb 24, 2018
@jreback
Contributor

jreback commented Feb 24, 2018

thanks!

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Labels
Compat, Reshaping
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subclassing is lost when using DataFrame.apply
6 participants