Keep subclassing in apply #19823

Merged (14 commits) on Feb 24, 2018
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -295,8 +295,10 @@ Other Enhancements
 - ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
 - Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
+- :func:``DataFrame.apply`` keeps the specified ``Series`` subclass when ``Series`` and ``DataFrame`` subclasses are defined (:issue:`19822`)
Member

I find this not so clear. Is it about the Series type that is passed to the function?

Contributor Author

Yes.
When applying a function to a subclassed DataFrame, I realised I could not call the specific methods of the subclassed Series that this DataFrame was supposed to generate through _constructor_sliced. As far as I understood, the Series subclass was lost during apply.
With these changes the subclass is now kept inside apply, so methods of the expected Series subclass can be used.
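
A minimal sketch of the behaviour described above, under stated assumptions: the classes `MySeries` and `MyFrame` and the method `span` are hypothetical names made up for illustration (they are not part of this PR), and the installed pandas is assumed to include this change.

```python
import pandas as pd


class MySeries(pd.Series):
    """Hypothetical Series subclass with one extra method."""

    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        return MyFrame

    def span(self):
        # Custom method that only exists on the subclass.
        return self.max() - self.min()


class MyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass whose rows/columns are MySeries."""

    @property
    def _constructor(self):
        return MyFrame

    @property
    def _constructor_sliced(self):
        return MySeries


df = MyFrame({"a": [1, 2], "b": [5, 9]})

# Record the type of each row that apply hands to the function.
row_types = []
df.apply(lambda row: row_types.append(type(row).__name__), axis=1)

# Rows arrive as MySeries, so the custom method can be called inside apply.
spans = df.apply(lambda row: row.span(), axis=1)
```

If the subclass were dropped inside `apply`, the call to `row.span()` would raise `AttributeError`, which is the situation the comment above describes.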

Member

OK, can you try to clarify the text a bit? E.g. something like "For subclassed DataFrames, DataFrame.apply will now preserve the Series subclass (if defined) when passing the data to the applied function" (but adapt as you like).

Contributor Author

Sure! I like it as it is. :)
Will change it!

 - :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)


 .. _whatsnew_0230.api_breaking:

 Backwards incompatible API changes
16 changes: 8 additions & 8 deletions pandas/core/apply.py
@@ -162,7 +162,7 @@ def apply_empty_result(self):
                 pass

         if reduce:
-            return Series(np.nan, index=self.agg_axis)
+            return self.obj._constructor_sliced(np.nan, index=self.agg_axis)
         else:
             return self.obj.copy()

@@ -175,11 +175,13 @@ def apply_raw(self):
         result = np.apply_along_axis(self.f, self.axis, self.values)

         # TODO: mixed type case
-        from pandas import DataFrame, Series
         if result.ndim == 2:
-            return DataFrame(result, index=self.index, columns=self.columns)
+            return self.obj._constructor(result,
+                                         index=self.index,
+                                         columns=self.columns)
         else:
-            return Series(result, index=self.agg_axis)
+            return self.obj._constructor_sliced(result,
+                                                index=self.agg_axis)

     def apply_broadcast(self, target):
         result_values = np.empty_like(target.values)
@@ -232,7 +234,7 @@ def apply_standard(self):
                                           axis=self.axis,
                                           dummy=dummy,
                                           labels=labels)
-                return Series(result, index=labels)
+                return self.obj._constructor_sliced(result, index=labels)
             except Exception:
                 pass

@@ -291,8 +293,7 @@ def wrap_results(self):
             return self.wrap_results_for_axis()

         # dict of scalars
-        from pandas import Series
-        result = Series(results)
+        result = self.obj._constructor_sliced(results)
         result.index = self.res_index

         return result
@@ -379,7 +380,6 @@ def wrap_results_for_axis(self):
# we have a non-series and don't want inference
elif not isinstance(results[0], ABCSeries):
from pandas import Series

result = Series(results)
result.index = self.res_index
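
The pattern these hunks apply (asking the object which class to build instead of hard-coding `Series` or `DataFrame`) can be sketched without pandas. This is only an illustration under stated assumptions: `Row`, `Table`, `LabeledRow`, `LabeledTable`, `reduce_hardcoded` and `reduce_via_hook` are hypothetical stand-ins, not pandas classes or APIs; only the property name `_constructor_sliced` is borrowed from the diff.

```python
class Row:
    """Hypothetical stand-in for a 1-D container such as Series."""

    def __init__(self, values):
        self.values = list(values)


class Table:
    """Hypothetical stand-in for a 2-D container such as DataFrame."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def _constructor_sliced(self):
        # Hook: which 1-D class belongs to this 2-D class.
        return Row

    def reduce_hardcoded(self, func):
        # Old pattern: always builds the base class, so subclasses are lost.
        return Row(func(r) for r in self.rows)

    def reduce_via_hook(self, func):
        # New pattern: build whatever class the hook returns.
        return self._constructor_sliced(func(r) for r in self.rows)


class LabeledRow(Row):
    def total(self):
        # Extra method available only on the subclass.
        return sum(self.values)


class LabeledTable(Table):
    @property
    def _constructor_sliced(self):
        return LabeledRow


table = LabeledTable([[1, 2], [3, 4]])
old_result = table.reduce_hardcoded(sum)
new_result = table.reduce_via_hook(sum)
```

Here `old_result` is a plain `Row`, while `new_result` is a `LabeledRow` and therefore still exposes `total()`. Subclass authors only override the hook; the reduction code itself does not need to know about the subclass.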

56 changes: 56 additions & 0 deletions pandas/tests/frame/test_subclass.py
@@ -514,3 +514,59 @@ def test_subclassed_wide_to_long(self):
         long_frame = pd.wide_to_long(df, ["A", "B"], i="id", j="year")

         tm.assert_frame_equal(long_frame, expected)
+
+    def test_subclassed_apply(self):
+        # GH 19822
+
+        def check_row_subclass(row):
+            assert isinstance(row, tm.SubclassedSeries)
+
+        def strech(row):
+            if row["variable"] == "height":
+                row["value"] += 0.5
+            return row
+
+        df = tm.SubclassedDataFrame([
+            ['John', 'Doe', 'height', 5.5],
+            ['Mary', 'Bo', 'height', 6.0],
+            ['John', 'Doe', 'weight', 130],
+            ['Mary', 'Bo', 'weight', 150]],
+            columns=['first', 'last', 'variable', 'value'])
+
+        df.apply(lambda x: check_row_subclass(x))
Contributor

You need to check the result of this using tm.assert_frame_equal. Just use a simple example of an applied function, testing both cases: returning a list (e.g. lambda x: [1, 2, 3]) and returning a subclassed Series.

+        df.apply(lambda x: check_row_subclass(x), axis=1)
+
+        expected = tm.SubclassedDataFrame([
+            ['John', 'Doe', 'height', 6.0],
+            ['Mary', 'Bo', 'height', 6.5],
+            ['John', 'Doe', 'weight', 130],
+            ['Mary', 'Bo', 'weight', 150]],
+            columns=['first', 'last', 'variable', 'value'])
+
+        result = df.apply(lambda x: strech(x), axis=1)
+        assert isinstance(result, tm.SubclassedDataFrame)
+        tm.assert_frame_equal(result, expected)
+
+        expected = tm.SubclassedDataFrame([
+            [1, 2, 3],
+            [1, 2, 3],
+            [1, 2, 3],
+            [1, 2, 3]])
+
+        result = df.apply(lambda x: tm.SubclassedSeries([1, 2, 3]), axis=1)
+        assert isinstance(result, tm.SubclassedDataFrame)
+        tm.assert_frame_equal(result, expected)
+
+        result = df.apply(lambda x: [1, 2, 3], axis=1, result_type="expand")
+        assert isinstance(result, tm.SubclassedDataFrame)
+        tm.assert_frame_equal(result, expected)
+
+        expected = tm.SubclassedSeries([
+            [1, 2, 3],
+            [1, 2, 3],
+            [1, 2, 3],
Contributor

Rather than numbering these, can you just do them one after the other and use:

result = 
expected = 

+            [1, 2, 3]])
+
+        result = df.apply(lambda x: [1, 2, 3], axis=1)
+        assert not isinstance(result, tm.SubclassedDataFrame)
+        tm.assert_series_equal(result, expected)