[BUG] Assignment of string list to column doesn't work #11944
@Ullar-Kask could you include a MWE, specifically how you constructed your
|
As a note, I believe this will work: df.loc[df['column'] =='value', 'column2'] = cudf.Scalar(['0','1']) (this is still a bug -- we shouldn't require the user to construct a |
@vyasr I've tried various ways to circumvent the error, e.g.
or, but all fail with slightly different error msgs:
|
BTW, this works: This also works: But the last statement fails when re-executed:
|
Here's a complete example:
import cudf
df = cudf.DataFrame(data={"a": ["yes", "no"], "b": [["l1", "l2"], ["c", "d"]]})
df.loc[df.a == "yes", "b"] = ["hello"]
This goes through.
Now it gets messy, because there's some logic error in list column setitem that precludes many of the other approaches from working. As @shwina says, for the specific case of setting a "scalar" broadcastable value:
works.
If you want to set something more complicated (say):
df = cudf.DataFrame(data={"a": ["yes", "no", "yes"], "b": [["l1", "l2"], ["c", "d"], ["e"]]})
df.loc[df.a == "yes", "b"] = [["a"], ["g"]]
This fails, and all of the workarounds are very internals-heavy:
WARNING, WARNING: Do not use this code!
import cudf
from cudf.core.column.column import ColumnBase, as_column
df = cudf.DataFrame(data={"a": ["yes", "no", "yes"], "b": [["l1", "l2"], ["c", "d"], ["e"]]})
ColumnBase.__setitem__(df.b._column, (df.a == "yes")._column, as_column([["x"], ["y"]]))
In [138]: df
Out[138]:
     a       b
0  yes     [x]
1   no  [c, d]
2  yes     [y]
|
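The two cases above differ in how the right-hand side is read: as a single list value copied into every selected row, or as a sequence holding one value per selected row. A minimal pure-Python sketch of that distinction, for illustration only; `masked_assign` and its `broadcast` flag are hypothetical names and say nothing about how cudf implements `loc` assignment internally.

```python
import copy


def masked_assign(column, mask, value, broadcast):
    """Return a copy of `column` with the rows selected by `mask` replaced.

    Hypothetical helper, not a cudf API.
    broadcast=True:  `value` is one "scalar" (possibly itself a list) that is
                     copied into every selected row.
    broadcast=False: `value` is a sequence with exactly one entry per
                     selected row.
    """
    selected = [i for i, keep in enumerate(mask) if keep]
    if broadcast:
        # Give each selected row its own copy of the scalar value.
        per_row = [copy.copy(value) for _ in selected]
    else:
        if len(value) != len(selected):
            raise ValueError(
                f"got {len(value)} values for {len(selected)} selected rows"
            )
        per_row = list(value)
    out = list(column)
    for i, v in zip(selected, per_row):
        out[i] = v
    return out


b = [["l1", "l2"], ["c", "d"], ["e"]]
is_yes = [True, False, True]

# One list value broadcast to every matching row.
print(masked_assign(b, is_yes, ["hello"], broadcast=True))
# One value per matching row.
print(masked_assign(b, is_yes, [["a"], ["g"]], broadcast=False))
```

With a list column, a bare Python list on the right-hand side is ambiguous between these two readings, which is presumably why an explicit scalar wrapper helps in the broadcast case.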
This works the first time because it's adding a new column. The second time you execute things, the column already exists so you go down the bad code path. |
I've found a workaround for my problem. Instead of conditionally assigning the string list, I create a dataframe using the original index, populate it, and then merge it with the original df, joining by index:
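A minimal sketch of what such an index-based merge could look like. It is written against pandas as a stand-in and is not the code from the comment above. The names `replacements`, `patch`, and `result` are made up for illustration, and the assumption that the same calls behave identically in cudf has not been verified here.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": ["yes", "no", "yes"], "b": [["l1", "l2"], ["c", "d"], ["e"]]}
)

# Rows that would have been the target of the conditional assignment.
mask = df["a"] == "yes"

# Hypothetical new values, keyed by the index labels of the selected rows.
replacements = dict(zip(df.loc[mask].index, [["x"], ["y"]]))

# Helper frame on the original index: new value where one exists,
# otherwise the value already present in the row.
patch = pd.DataFrame(
    {"b": [replacements.get(label, old) for label, old in df["b"].items()]},
    index=df.index,
)

# Swap the old column for the populated one by joining on the index.
result = df.drop(columns="b").merge(patch, left_index=True, right_index=True)
print(result)
```

The join key is the index, so the list-valued column is carried along without ever being compared or hashed.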
|
Describe the bug
This does not work:
df.loc[df['column'] =='value', 'column2'] = ['0','1']
TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU matrix, consider using .to_cupy()
To explicitly construct a host matrix, consider using .to_numpy().
Integers do work:
df.loc[df['column']=='value', 'column2'] = [0,1]
Both work in pandas.
Rapids 22.08
Ubuntu 20