[BUG] DataFrame columns rename. Different behaviour with regards Pandas #10117

Closed
miguelusque opened this issue Jan 25, 2022 · 1 comment · Fixed by #10120
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@miguelusque
Member

Describe the bug
In Pandas, when renaming a column to a column name that already exists in the DataFrame, two columns with the same column name are created.

When performing the same operation in cuDF, the renamed column is silently dropped from the DataFrame and only the original "b" column remains.

Steps/Code to reproduce bug

import cudf
import pandas as pd

# Build identical frames in pandas and cuDF.
pdf = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
gdf = cudf.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
# Rename "a" to "b", a label that already exists in both frames.
pdf.rename(columns={"a": "b"}, inplace=True)
gdf.rename(columns={"a": "b"}, inplace=True)
print("Pandas\n", pdf, "\n")
print("cuDF\n", gdf)


Pandas
   b  b
0  1  3
1  2  4
2  3  5

cuDF
   b
0  3
1  4
2  5

Hope it helps!
Miguel
Expected behavior
Same behaviour as Pandas, or at least an error/warning highlighting that the operation could not be performed.
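
The error variant of this expectation can also be approximated on the caller's side. The following is a minimal sketch under stated assumptions, not part of pandas or cuDF: `rename_columns_checked` is a hypothetical helper that works out the labels a rename would produce and raises if any of them would appear more than once. It is shown with pandas so that it runs without a GPU; it relies only on `DataFrame.columns` and `DataFrame.rename` with a dict mapping.

```python
from collections import Counter

import pandas as pd


def rename_columns_checked(df, mapping):
    """Rename columns, refusing any rename that would create duplicate labels.

    Hypothetical helper for illustration; not part of pandas or cuDF.
    """
    # Compute the labels as they would look after the rename.
    new_labels = [mapping.get(label, label) for label in df.columns]
    # Any label counted more than once is a collision.
    duplicates = [label for label, n in Counter(new_labels).items() if n > 1]
    if duplicates:
        raise ValueError(
            f"rename would create duplicate column names: {duplicates}"
        )
    return df.rename(columns=mapping)


pdf = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})

# No collision: "c" is not yet a column, so the rename goes through.
renamed = rename_columns_checked(pdf, {"a": "c"})

# Collision: "b" already exists, so the helper raises instead of renaming.
try:
    rename_columns_checked(pdf, {"a": "b"})
    collision_error = None
except ValueError as exc:
    collision_error = str(exc)
```

Checking before the rename, rather than inspecting the result afterwards, means the guard does not depend on whether the library keeps or drops the colliding column.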

Environment overview (please complete the following information)
cudf.__version__ is '21.12.02'
DGX-1 Server

@miguelusque miguelusque added Needs Triage Need team to review and classify bug Something isn't working labels Jan 25, 2022
@beckernick
Member

Thanks for raising this issue. Silently removing the column is certainly not expected behavior here, as we raise an error in other situations, such as directly setting column names with a duplicate label (ValueError: Duplicate column names are not allowed).

@beckernick beckernick added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Jan 25, 2022
@galipremsagar galipremsagar self-assigned this Jan 25, 2022
rapids-bot bot pushed a commit that referenced this issue Jan 27, 2022
Fixes: #10117 

This PR adds a duplicate column validation check in `ColumnAccessor.rename_levels`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10120
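
The commit message does not show the check itself. As a rough illustration only, and not the actual cuDF code, a duplicate validation over renamed labels could look like the sketch below. The function `rename_labels` and its signature are hypothetical; the error text is borrowed from the message quoted earlier in this thread.

```python
def rename_labels(names, mapper):
    """Apply a dict or callable `mapper` to `names`, rejecting duplicate results.

    Hypothetical stand-in for illustration; not cuDF's ColumnAccessor code.
    """
    if callable(mapper):
        new_names = [mapper(name) for name in names]
    else:
        # Labels missing from the dict keep their current name.
        new_names = [mapper.get(name, name) for name in names]
    # A set drops repeats, so a shorter set means at least one label collided.
    if len(set(new_names)) != len(new_names):
        raise ValueError("Duplicate column names are not allowed")
    return new_names


ok_dict = rename_labels(["a", "b"], {"a": "c"})
ok_callable = rename_labels(["a", "b"], str.upper)

try:
    rename_labels(["a", "b"], {"a": "b"})
    raised = False
except ValueError:
    raised = True
```

Validating the full list of resulting labels, instead of only the mapping's values, also catches collisions between a renamed label and one that was left unchanged, which is the case reported here.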