Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AttributeError when groupby as_index=False on empty DataFrame #35324

Merged
merged 3 commits into from
Jul 17, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1117,6 +1117,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrame.ewm.cov` was throwing ``AssertionError`` for :class:`MultiIndex` inputs (:issue:`34440`)
- Bug in :meth:`core.groupby.DataFrameGroupBy.quantile` raises ``TypeError`` for non-numeric types rather than dropping columns (:issue:`27892`)
- Bug in :meth:`core.groupby.DataFrameGroupBy.transform` when ``func='nunique'`` and columns are of type ``datetime64``, the result would also be of type ``datetime64`` instead of ``int64`` (:issue:`35109`)
- Bug in :meth:`DataFrame.groupby` raising an ``AttributeError`` when selecting a column and aggregating with ``as_index=False`` (:issue:`35246`).
- Bug in :meth:'DataFrameGroupBy.first' and :meth:'DataFrameGroupBy.last' that would raise an unnecessary ``ValueError`` when grouping on multiple ``Categoricals`` (:issue:`34951`)

Reshaping
Expand Down
17 changes: 11 additions & 6 deletions pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -963,18 +963,23 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs)
# try to treat as if we are passing a list
try:
result = self._aggregate_multiple_funcs([func], _axis=self.axis)
except ValueError as err:
if "no results" not in str(err):
# raised directly by _aggregate_multiple_funcs
raise
result = self._aggregate_frame(func)
else:

# select everything except for the last level, which is the one
# containing the name of the function(s), see GH 32040
result.columns = result.columns.rename(
[self._selected_obj.columns.name] * result.columns.nlevels
).droplevel(-1)

except ValueError as err:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you leave this unchanged, does it work to put another try/except in the else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I was trying to avoid that because nesting exception handling increase complexity.
  • But you are right. There is a chance that the self._aggregate_multiple_funcs also raise AttributeError and this will change the behavior in that case.
  • Will update the code as your suggestion. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that nesting exception handling is not ideal, maybe an isinstance check for a Series instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

between nesting exception handling and explicit type-checking (also nesting inside another exception handling), I think the first one is less evil :). what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a static typing perspective, the isinstance is probably better. In this case, an AttributeError should be easily avoided. hasattr is not ideal from a static typing/mypy perspective either. However, leave it as a try/except for now and see what others think.

if "no results" not in str(err):
# raised directly by _aggregate_multiple_funcs
raise
result = self._aggregate_frame(func)
except AttributeError:
# catch exception from line 969
# (Series does not have attribute "columns"), see GH 35246
result = self._aggregate_frame(func)

if relabeling:

# used reordered index of columns
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -605,6 +605,14 @@ def test_as_index_select_column():
tm.assert_series_equal(result, expected)


def test_groupby_as_index_select_column_sum_empty_df():
# GH 35246
df = DataFrame(columns=["A", "B", "C"])
left = df.groupby(by="A", as_index=False)["B"].sum()
assert type(left) is DataFrame
assert left.to_dict() == {"A": {}, "B": {}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this test to be more like (and closer to the issue op)

df = pd.DataFrame(columns=["a", "b"])
grp = df.groupby(by="a", as_index=False)["b"]
result = grp.sum()

expected = pd.DataFrame(index=pd.Int64Index([], dtype="int64"), columns=["a", "b"])
expected["b"] = expected["b"].astype("float64")

tm.assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great. I was struggling with IndexType



def test_groupby_as_index_agg(df):
grouped = df.groupby("A", as_index=False)

Expand Down