[BUG] df.index.name = "indexer" does not work as expected under cudf.pandas #14524

Closed
isVoid opened this issue Nov 29, 2023 · 4 comments · Fixed by #16234
Labels
0 - Backlog In queue waiting for assignment 1 - On Deck To be worked on next bug Something isn't working cudf.pandas Issues specific to cudf.pandas Python Affects Python cuDF API.

Comments

@isVoid
Contributor

isVoid commented Nov 29, 2023

Describe the bug
Assigning to df.index.name should update the name of the index. Under cudf.pandas, the reproducer below leaves the name unchanged, so the final assertion fails.

Steps/Code to reproduce bug

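# Run with cudf.pandas enabled, e.g. `python -m cudf.pandas <script>`
import pandas as pd

df1 = pd.DataFrame(columns=["x"], dtype="int64")
df2 = df1.dtypes         # a Series of dtype objects (object dtype)
df3 = df2.reset_index()  # a DataFrame with an object-dtype column

df3.index.name = "indexer"
assert df3.index.name == "indexer"  # fails under cudf.pandas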

Expected behavior
The assertion above should pass: after the assignment, df3.index.name should be "indexer", as it is in plain pandas.
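For reference, here is a small sketch of the same steps under plain pandas (without cudf.pandas loaded). It only illustrates the expected behavior and shows that the values produced by `df1.dtypes` are NumPy dtype objects stored in an object-dtype column. The variable names `dtypes` and `values_are_dtype_objects` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(columns=["x"], dtype="int64")
dtypes = df1.dtypes          # Series indexed by column name, object dtype
df3 = dtypes.reset_index()   # DataFrame with columns "index" and 0

# Column 0 holds the dtype objects themselves, not numbers or strings.
values_are_dtype_objects = all(isinstance(v, np.dtype) for v in df3[0])

# In plain pandas the assignment is visible on the next access.
df3.index.name = "indexer"
assert df3.index.name == "indexer"
```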

Environment overview (please complete the following information)
Bare-metal, conda

@isVoid isVoid added bug Something isn't working Needs Triage Need team to review and classify labels Nov 29, 2023
@wence-
Contributor

wence- commented Nov 29, 2023

This is, approximately, because df2 has a column that is not representable as a fast object. Simpler example:

s = pd.Series([object()])
s.index.name = "what"
print(s.index.name) # => None

I think what happens is that s.index provokes a fast->slow conversion (so that s._fsproxy_state is slow). But then we have a "final" object at this point, so when we do index.name = "what", that pushes the index to cudf and sets the name there. That destroys the link between s.index and the name we set.

@shwina
Contributor

shwina commented Nov 29, 2023

Yeah, we might need a custom name setter and getter for our proxy index type, given how common this operation is and that we need to support it correctly.
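One possible shape for such a setter, as a pure-Python sketch and not the actual cudf.pandas change: the proxy remembers the slow index that still backs its parent and writes the name through to it. Every class and attribute name here (`ToyIndex`, `WriteThroughIndexProxy`, `ToySeries`, `_origin`, `_fast`) is hypothetical.

```python
class ToyIndex:
    """Stand-in for a pandas or cudf index; it only stores a name."""

    def __init__(self, name=None):
        self.name = name


class WriteThroughIndexProxy:
    """Toy proxy whose name setter also updates the parent's index."""

    def __init__(self, origin):
        self._origin = origin  # slow index that still backs the parent
        self._fast = None      # fast copy, created on first write

    @property
    def name(self):
        current = self._fast if self._fast is not None else self._origin
        return current.name

    @name.setter
    def name(self, value):
        if self._fast is None:
            # Simulated slow-to-fast copy of the index.
            self._fast = ToyIndex(self._origin.name)
        self._fast.name = value
        # Write-through: keep the index underlying the parent in sync.
        self._origin.name = value


class ToySeries:
    """Toy series that hands out a new proxy on every .index access."""

    def __init__(self):
        self._slow_index = ToyIndex()

    @property
    def index(self):
        return WriteThroughIndexProxy(self._slow_index)


s = ToySeries()
s.index.name = "what"
result = s.index.name  # read back through a different proxy object
```

With the write-through in place, the name survives even though each access to `s.index` returns a different proxy.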

@vyasr
Contributor

vyasr commented Nov 29, 2023

> I think what happens is that s.index provokes a fast->slow conversion (so that s._fsproxy_state is slow).

I'm confused, shouldn't the original object in this case already be a slow object? In which case s.index should also be a slow object. And then what triggers the final slow->fast conversion when doing s.index.name = "what"?

@wence-
Contributor

wence- commented Nov 30, 2023

There are two things at work:

  1. The series is not representable as a fast object
  2. The index is representable as a fast object

A method call on the series first attempts a slow-to-fast copy, fails, and then calls the slow method and wraps up the result. The return value is in the slow state. But the index can be fast, so when we call a method on the index, that attempts a slow-to-fast copy, succeeds, and calls the fast method. The index object is now in the fast state.

Now, the problem is that we modified the internal state of the return value of s.index. But s.index is a property, so the next time we access it, we'll attempt a slow-to-fast copy of s, fail, call the slow method, and wrap up the result, with the consequence that id(s.index) != id(s.index). Since at any one time a proxy object holds either a fast object or a slow object, any modifications we perform on the index we get back from s.index are not "in-place" in this scenario, and so don't propagate to the index that underlies s.

In [72]: s = pd.Series([object()])

In [73]: i = s.index

In [74]: id(i._fsproxy_slow)
Out[74]: 140540061935904

In [75]: i._fsproxy_state
Out[75]: <_State.SLOW: 0>

In [76]: i.name = "bar"

In [77]: i._fsproxy_state
Out[77]: <_State.FAST: 1>

In [78]: id(i._fsproxy_fast)
Out[78]: 140540060927552

In [79]: id(s.index._fsproxy_slow)
Out[79]: 140540061935904

In [80]: id(i)
Out[80]: 140543720616288

In [81]: id(s.index)
Out[81]: 140540061939936

In [82]: id(i._fsproxy_slow)
Out[82]: 140540061938064
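The same sequence can be reproduced without a GPU using a toy model of the proxy. This is a minimal sketch under stated assumptions, not the real cudf.pandas machinery: `ToyIndex`, `ToyIndexProxy`, and `ToySeriesProxy` are hypothetical names, and the slow-to-fast copy is simulated by constructing a second `ToyIndex`.

```python
class ToyIndex:
    """Stand-in for either a pandas or a cudf index; it only stores a name."""

    def __init__(self, name=None):
        self.name = name


class ToyIndexProxy:
    """Holds exactly one of a slow or a fast object at any time."""

    def __init__(self, slow):
        self.slow = slow
        self.fast = None

    @property
    def state(self):
        return "fast" if self.fast is not None else "slow"

    @property
    def name(self):
        current = self.fast if self.fast is not None else self.slow
        return current.name

    @name.setter
    def name(self, value):
        if self.fast is None:
            # The slow-to-fast copy succeeds for an index: the proxy now
            # holds a copy and lets go of the original slow object.
            self.fast = ToyIndex(self.slow.name)
            self.slow = None
        self.fast.name = value


class ToySeriesProxy:
    """A series that can never be copied to fast, so it stays slow."""

    def __init__(self):
        self._slow_index = ToyIndex()

    @property
    def index(self):
        # Every access wraps the slow index in a brand-new proxy.
        return ToyIndexProxy(self._slow_index)


s = ToySeriesProxy()
i = s.index
state_before = i.state
i.name = "bar"                 # lands on the fast copy held by `i`
state_after = i.state
name_on_i = i.name
name_on_series = s.index.name  # new proxy around the untouched slow index
```

In this model `i` ends up in the fast state with the new name, while a later `s.index` still reports no name, which mirrors the session above.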

@GregoryKimball GregoryKimball added 0 - Backlog In queue waiting for assignment Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Dec 14, 2023
@mroeschke mroeschke added the cudf.pandas Issues specific to cudf.pandas label May 24, 2024
@Matt711 Matt711 self-assigned this Jul 9, 2024
@Matt711 Matt711 added the 1 - On Deck To be worked on next label Jul 9, 2024
@GPUtester GPUtester moved this from Todo to In Progress in cuDF Python Jul 9, 2024
@rapids-bot rapids-bot bot closed this as completed in 2b2058d Jul 11, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jul 11, 2024
Projects
Status: Done

7 participants