
[BUG] Function cudf.DataFrame.pivot doesn't support lists of columns and index arguments #17360

Closed
raisadz opened this issue Nov 19, 2024 · 1 comment · Fixed by #17373
Labels: bug, Python


raisadz commented Nov 19, 2024

Describe the bug
cudf.DataFrame.pivot does not accept lists for its columns and index arguments. In addition, the error message it raises is misleading.

Steps/Code to reproduce bug

import cudf

data = {'bar': ['x', 'y', 'z', 'w'], 'col': ['a', 'b', 'a', 'b'], 'foo': [1, 2, 3, 4], 'ix': [1, 1, 2, 2]}
df = cudf.DataFrame(data)
df.pivot(columns=['col'], index=['ix'])

This produces the following error:

ValueError                                Traceback (most recent call last)
Cell In[19], line 1
----> 1 df.pivot(columns=['col'], index=['ix'])

File /opt/conda/lib/python3.10/site-packages/cudf/utils/performance_tracking.py:51, in _performance_tracking.<locals>.wrapper(*args, **kwargs)
     43 if nvtx.enabled():
     44     stack.enter_context(
     45         nvtx.annotate(
     46             message=func.__qualname__,
   (...)
     49         )
     50     )
---> 51 return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/cudf/core/dataframe.py:7475, in DataFrame.pivot(self, columns, index, values)
   7472 @_performance_tracking
   7473 @copy_docstring(reshape.pivot)
   7474 def pivot(self, *, columns, index=no_default, values=no_default):
-> 7475     return cudf.core.reshape.pivot(
   7476         self, index=index, columns=columns, values=values
   7477     )

File /opt/conda/lib/python3.10/site-packages/cudf/core/reshape.py:1036, in pivot(data, columns, index, values)
   1034     index = df.index
   1035 else:
-> 1036     index = cudf.core.index.Index(df.loc[:, index])
   1037 columns = cudf.Index(df.loc[:, columns])
   1039 # Create a DataFrame composed of columns from both
   1040 # columns and index

File /opt/conda/lib/python3.10/site-packages/cudf/core/index.py:87, in IndexMeta.__call__(cls, data, *args, **kwargs)
     82     raise NotImplementedError(
     83         "tupleize_cols is currently not supported."
     84     )
     86 if cls is Index:
---> 87     return as_index(
     88         arbitrary=data,
     89         *args,
     90         **kwargs,
     91     )
     92 return super().__call__(data, *args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/cudf/utils/performance_tracking.py:51, in _performance_tracking.<locals>.wrapper(*args, **kwargs)
     43 if nvtx.enabled():
     44     stack.enter_context(
     45         nvtx.annotate(
     46             message=func.__qualname__,
   (...)
     49         )
     50     )
---> 51 return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/cudf/core/index.py:3114, in as_index(arbitrary, nan_as_null, copy, name, dtype)
   3110     return cudf.MultiIndex.from_pandas(
   3111         arbitrary.copy(deep=copy), nan_as_null=nan_as_null
   3112     )
   3113 elif isinstance(arbitrary, cudf.DataFrame) or is_scalar(arbitrary):
-> 3114     raise ValueError("Index data must be 1-dimensional and list-like")
   3115 else:
   3116     return as_index(
   3117         column.as_column(arbitrary, dtype=dtype, nan_as_null=nan_as_null),
   3118         copy=copy,
   3119         name=name,
   3120         dtype=dtype,
   3121     )

ValueError: Index data must be 1-dimensional and list-like
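The last frames show where this fails: reshape.pivot passes df.loc[:, index] to cudf.Index. With index=['ix'], that selection is a one-column DataFrame rather than a 1-D column, and as_index rejects any DataFrame with "Index data must be 1-dimensional and list-like", even though a list of labels is a valid argument. The sketch below is a dependency-free model of that shape mismatch, not cuDF code and not the fix in #17373: plain dicts stand in for a DataFrame, and select, normalize_keys, and index_values are hypothetical helpers. It shows one way to give scalar and list arguments a single code path by normalizing to a list first.

```python
# Dependency-free model of the failure in the traceback above; NOT cuDF code.
# A dict of equal-length lists stands in for a DataFrame (an assumption).
data = {
    "bar": ["x", "y", "z", "w"],
    "col": ["a", "b", "a", "b"],
    "foo": [1, 2, 3, 4],
    "ix": [1, 1, 2, 2],
}


def select(frame, keys):
    """Hypothetical stand-in for df.loc[:, keys].

    A single label returns a flat (1-D) list of values; a list of labels
    returns one tuple per row (2-D), like the DataFrame that as_index rejects.
    """
    if isinstance(keys, list):
        n_rows = len(frame[keys[0]])
        return [tuple(frame[k][i] for k in keys) for i in range(n_rows)]
    return list(frame[keys])


def normalize_keys(keys):
    """Hypothetical helper: wrap a single label in a list, copy a list."""
    return list(keys) if isinstance(keys, list) else [keys]


def index_values(frame, keys):
    """Build index entries through one code path for scalar and list arguments."""
    keys = normalize_keys(keys)
    rows = select(frame, keys)  # always one tuple per row here
    if len(keys) == 1:
        # One key: unwrap the 1-tuples into a flat, 1-D index.
        return [row[0] for row in rows]
    # Several keys: keep the tuples, i.e. MultiIndex-style entries.
    return rows
```

With this normalization, index_values(data, "ix") and index_values(data, ["ix"]) both return [1, 1, 2, 2], and ["ix", "col"] yields tuple entries instead of raising.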

Expected behavior
Expected the same output as produced by pandas:

print(df.to_pandas().pivot(columns=['col'], index=['ix']))
    bar    foo   
col   a  b   a  b
ix               
1     x  y   1  2
2     z  w   3  4
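The expected table follows from pivot's semantics: each distinct ix value becomes a row, and because values is omitted, every remaining column (bar and foo) is split by the distinct col values into a two-level column header. Below is a minimal, dependency-free sketch of those semantics, assuming plain dicts in place of a DataFrame; pivot here is a hypothetical helper, not the pandas or cuDF implementation.

```python
# Dict-based model of pivot semantics; NOT the pandas or cuDF implementation.
data = {
    "bar": ["x", "y", "z", "w"],
    "col": ["a", "b", "a", "b"],
    "foo": [1, 2, 3, 4],
    "ix": [1, 1, 2, 2],
}


def pivot(frame, index, columns, values=None):
    """Hypothetical pivot returning {row_key: {(value_col, col_key): cell}}.

    index/columns may be a single label or a list of labels; keys are tuples.
    """
    index = index if isinstance(index, list) else [index]
    columns = columns if isinstance(columns, list) else [columns]
    if values is None:
        # Like pandas' default: use every column not consumed by index/columns.
        values = [c for c in frame if c not in index and c not in columns]
    out = {}
    for i in range(len(frame[index[0]])):
        row_key = tuple(frame[c][i] for c in index)
        col_key = tuple(frame[c][i] for c in columns)
        cells = out.setdefault(row_key, {})
        for v in values:
            if (v, col_key) in cells:
                # pivot does not aggregate, so a repeated (row, column) pair is an error.
                raise ValueError("duplicate index/columns pair; cannot reshape")
            cells[(v, col_key)] = frame[v][i]
    return out
```

For example, pivot(data, index=["ix"], columns=["col"])[(1,)] maps ("bar", ("a",)) to "x" and ("foo", ("b",)) to 2, matching row 1 of the pandas output, and passing the scalars "ix" and "col" gives the same result.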

Environment overview

  • Environment location: Kaggle notebook

Environment details

print(cudf.__version__)
24.08.03


mroeschke (contributor) commented

Thanks for the report!

I opened #17373 to fix this; hopefully it will be included in the 24.12 release.

rapids-bot closed this as completed in 332cc06 on Nov 20, 2024
github-project-automation moved this from In Progress to Done in cuDF Python on Nov 20, 2024