[BUG] reset_index on multi-column groupby results in an error #1801
Comments
This is not resolved by #1740. The current issue is that `MultiIndex` does not have a `name` attribute:

```python
import cudf

df = cudf.DataFrame({'x': [0, 0, 1, 1], 'y': [1, 1, 0, 0], 'z': [1, 2, 4, 3]})
print(df.groupby(['x', 'y']).sum().reset_index())
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-f0aefd2147dc> in <module>
      1 import cudf
      2 df = cudf.DataFrame({'x':[0, 0, 1, 1], 'y':[1, 1, 0, 0], 'z': [1, 2, 4, 3]})
----> 3 print(df.groupby(['x', 'y']).sum().reset_index())

/conda/envs/rapids/lib/python3.7/site-packages/cudf-0.8.0a1+120.gff317270.dirty-py3.7-linux-x86_64.egg/cudf/dataframe/dataframe.py in reset_index(self, drop)
    777     def reset_index(self, drop=False):
    778         if not drop:
--> 779             name = self.index.name or 'index'
    780             out = DataFrame()
    781             out[name] = self.index

AttributeError: 'MultiIndex' object has no attribute 'name'
```

Some potential solutions could include tweaking the logic of `reset_index` so that it does not rely on `self.index.name`, or giving `MultiIndex` a `name` attribute. pandas allows MultiIndexes to have names, so the latter is probably the best solution:

```python
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
x = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
x.name = '3'
print(x.name)
```

```
3
```
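The other option, making `reset_index` stop relying on `index.name`, could look roughly like the sketch below. It is written against pandas purely for illustration (it is not cudf's code), and `reset_index_levels` is a hypothetical helper name. It reads `index.names`, which exists on both a plain `Index` and a `MultiIndex`, and turns each level into its own column.

```python
import pandas as pd


def reset_index_levels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: move every index level into a regular column.

    Reads ``index.names`` (present on Index and MultiIndex alike)
    instead of ``index.name``, so a MultiIndex does not raise.
    """
    index = df.index
    data = {}
    for i, level_name in enumerate(index.names):
        if level_name is None:
            # Fallback names: 'index' for a single level, 'level_<i>' otherwise.
            level_name = "index" if index.nlevels == 1 else f"level_{i}"
        data[level_name] = index.get_level_values(i).to_numpy()
    # Existing data columns follow the former index levels.
    for col in df.columns:
        data[col] = df[col].to_numpy()
    return pd.DataFrame(data)


df = pd.DataFrame({'x': [0, 0, 1, 1], 'y': [1, 1, 0, 0], 'z': [1, 2, 4, 3]})
print(reset_index_levels(df.groupby(['x', 'y']).sum()))
```

This sketch does not handle a level name that collides with an existing column name; a real implementation would need to decide how to resolve that.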
Hey @beckernick, makes sense. After the `name` issue is solved, both this issue and #1740 will eventually run into the same problem: we cannot assign a MultiIndex to a column. When `reset_index` is called on a DataFrame with a MultiIndex, pandas converts each index level into its own column, producing a multi-column output.
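For reference, a minimal pandas sketch of the behavior described above, using the same data as the cudf reproducer (this assumes pandas is installed and does not involve cudf): the groupby keys form a two-level MultiIndex, and `reset_index` turns each level into a regular column.

```python
import pandas as pd

# Same data as the cudf reproducer, run through pandas instead.
df = pd.DataFrame({'x': [0, 0, 1, 1], 'y': [1, 1, 0, 0], 'z': [1, 2, 4, 3]})

grouped = df.groupby(['x', 'y']).sum()
print(grouped.index.nlevels)  # the two groupby keys form a 2-level MultiIndex
print(list(grouped.index.names))

flat = grouped.reset_index()  # each index level becomes a regular column
print(flat)
```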
This is fixed by #1542

Fixed by #1542
Describe the bug
Invoking `reset_index` after a multi-column groupby, as shown in the script, raises an error with cudf 0.7 but works fine with cudf 0.6. Single-column groupby works as expected with both 0.6 and 0.7.
Steps/Code to reproduce bug
Output:
Environment details:
cudf 0.7