[BUG] Cannot assign DataFrame.column.name #14012

Closed
mroeschke opened this issue Aug 30, 2023 · 5 comments
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@mroeschke
Contributor

Describe the bug
DataFrame.pop removes column name

Steps/Code to reproduce bug

In [1]: import cudf

In [2]: df = cudf.DataFrame({"a": [1., 2.]})

In [3]: df.index.name = "baz"

In [4]: df
Out[4]: 
       a
baz     
0    1.0
1    2.0

In [5]: df["foo"] = "bar"

In [6]: df
Out[6]: 
       a  foo
baz          
0    1.0  bar
1    2.0  bar

In [7]: df.pop("foo")
Out[7]: 
baz
0    bar
1    bar
Name: foo, dtype: object

In [8]: df.columns.name # should be "baz"

In [9]: df.columns
Out[9]: Index(['a'], dtype='object')

Expected behavior

In [8]: df.columns.name
"baz"

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of cuDF install: conda

@mroeschke mroeschke added bug Something isn't working Python Affects Python cuDF API. labels Aug 30, 2023
@galipremsagar
Contributor

galipremsagar commented Aug 31, 2023

@mroeschke Is the example you provided for a different issue? Or were you referring to this:

In [37]: df = cudf.DataFrame({"a": [1., 2.]})

In [38]: df
Out[38]: 
     a
0  1.0
1  2.0

In [39]: df.columns.name = "baz"

In [40]: df
Out[40]: 
     a
0  1.0
1  2.0

In [41]: df.columns
Out[41]: Index(['a'], dtype='object')

.
.
.

@mroeschke
Contributor Author

Ah yes, my example in the OP is incorrect. I think the core issue is that df.columns.name cannot be assigned.

@mroeschke mroeschke changed the title [BUG] DataFrame.pop drops the column name [BUG] Cannot assign DataFrame.column.name Aug 31, 2023
@mroeschke
Contributor Author

@galipremsagar is there a reason why cudf.DataFrame returns a pandas object for df.columns? I think this might be the limitation that prevents the name from being set.

In [1]: import cudf

In [2]: df = cudf.DataFrame({"a": [1., 2.]})

In [8]: type(df.columns)
Out[8]: pandas.core.indexes.base.Index

In [9]: type(df.index)
Out[9]: cudf.core.index.RangeIndex

@vyasr
Contributor

vyasr commented Sep 5, 2023

I believe it is because most operations involving column names occur on the host, not on the device. Storing the column names in a cudf Index would require constant host-to-device (H2D) copies. @shwina may be able to provide more of the history here, but I do agree that this choice can sometimes cause confusion and make certain aspects of the implementation harder.
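
To illustrate why a host-side `columns` object can swallow a name assignment, here is a toy model in plain Python. It is not cuDF code: `ToyIndex` and `ToyFrame` are hypothetical names, and the sketch assumes that `columns` builds a fresh host object on every access, which is one plausible reading of the behavior reported above.

```python
class ToyIndex:
    """Hypothetical host-side index: a list of labels plus an optional name."""

    def __init__(self, labels, name=None):
        self.labels = list(labels)
        self.name = name


class ToyFrame:
    """Hypothetical frame that keeps only raw labels and rebuilds the index."""

    def __init__(self, labels):
        self._labels = list(labels)

    @property
    def columns(self):
        # Assumption: a new wrapper is created on each access, so anything
        # set on a previously returned wrapper is never seen again.
        return ToyIndex(self._labels)


df = ToyFrame(["a"])
df.columns.name = "baz"  # mutates a temporary object that is then discarded
print(df.columns.name)   # None
```

Under this assumption the assignment raises no error, which matches the silent no-op in the transcript above.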

@vyasr vyasr added this to cuDF Python Nov 5, 2024
@galipremsagar galipremsagar self-assigned this Dec 17, 2024
@galipremsagar galipremsagar moved this from Todo to In Progress in cuDF Python Dec 17, 2024
rapids-bot bot pushed a commit that referenced this issue Dec 19, 2024
Fixes: #17482, #14012

This PR fixes a long-standing issue where modifying the `name` of `columns` never propagated to the parent object. It does so by making `to_pandas_index` a cached property and, in the `level_names` property, reading the names from that cached index if it was ever invoked.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #17597
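
The approach described in the commit message can be sketched roughly as follows. This is a toy model, not the code from the PR: `ToyIndex` and `ToyColumnStore` are hypothetical classes, a singular `level_name` stands in for `level_names`, and only the idea (cache the host index, then read the name back from the cached object if it exists) is taken from the description.

```python
from functools import cached_property


class ToyIndex:
    """Hypothetical host-side index: a list of labels plus an optional name."""

    def __init__(self, labels, name=None):
        self.labels = list(labels)
        self.name = name


class ToyColumnStore:
    """Hypothetical container for column labels (assumption, not cuDF's class)."""

    def __init__(self, labels, name=None):
        self._labels = list(labels)
        self._name = name

    @cached_property
    def to_pandas_index(self):
        # Built once; later accesses return the same object, so mutations stick.
        return ToyIndex(self._labels, self._name)

    @property
    def level_name(self):
        # cached_property stores its value in the instance __dict__ under the
        # attribute name, so we can check whether it was ever computed.
        cached = self.__dict__.get("to_pandas_index")
        return cached.name if cached is not None else self._name


store = ToyColumnStore(["a"])
store.to_pandas_index.name = "baz"  # mutates the cached index
print(store.level_name)             # baz
```

Because the cached index is only consulted when it already exists, code paths that never touch it do not pay for constructing the host object.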
@galipremsagar
Contributor

resolved in #17597

@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Dec 19, 2024