
[BUG] DataFrame setitem causing data corruption for cupy array inputs #8672

Closed
esnvidia opened this issue Jul 7, 2021 · 1 comment · Fixed by #9006
Labels
bug Something isn't working Python Affects Python cuDF API.

esnvidia commented Jul 7, 2021

Describe the bug
Assigning the output of a StandardScaler, applied to multiple columns, back to those columns of the GDF (cuDF DataFrame) does not work as expected (although this might be user error).

Steps/Code to reproduce bug
[Two screenshots attached in the original issue: image (1), image (2)]

Code is attached: Untitled.zip

Expected behavior
The values should be set in the GDF. Instead, all the columns in the GDF end up with the same values, even though the scaler output shows they should differ.
Looks like values are

Environment overview
RAPIDS 21.06 installed via conda, bare-metal.

@esnvidia esnvidia added Needs Triage Need team to review and classify bug Something isn't working labels Jul 7, 2021
beckernick commented Aug 3, 2021

The following is a minimal, reproducible example that demonstrates the underlying setitem issue causing data corruption in 21.08. Instead of each column receiving its own column of the array, the first 10 values of the array's contiguous (row-major) buffer, 20 through 29, are written into the 10 rows of every column.

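import cudf
import cupy as cp

df = cudf.DataFrame({
    "a": range(10),
    "b": range(10),
})
# 10x2 array: row i holds [20 + 2*i, 21 + 2*i]
arr = cp.arange(20, 40).reshape(-1, 2)
# Assign both columns at once from the 2-D cupy array
df.loc[:, ["a", "b"]] = arr

print(arr)
print(df)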
[[20 21]
 [22 23]
 [24 25]
 [26 27]
 [28 29]
 [30 31]
 [32 33]
 [34 35]
 [36 37]
 [38 39]]
    a   b
0  20  20
1  21  21
2  22  22
3  23  23
4  24  24
5  25  25
6  26  26
7  27  27
8  28  28
9  29  29
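The corruption pattern above can be modelled without a GPU. The following pure-Python sketch uses no cuDF or CuPy. `buggy_columns` and `expected_columns` are hypothetical helpers, and the buggy path is an assumed model of the observed symptom, not cuDF's actual code. It treats the 10x2 array as its row-major buffer: filling every column from the start of the buffer reproduces the output above, while a stride-`n_cols` slice per column gives the expected result.

```python
# Pure-Python model of the setitem symptom: no cuDF or CuPy required.
# The 10x2 array from the repro, stored the way CuPy stores it by
# default: a flat, row-major (C-order) buffer.
n_rows, n_cols = 10, 2
buffer = list(range(20, 40))  # same values as cp.arange(20, 40)


def buggy_columns(buf, n_rows, n_cols):
    # Hypothetical model of the faulty behavior: every column is filled
    # from the start of the contiguous buffer, ignoring the column index.
    return [buf[:n_rows] for _ in range(n_cols)]


def expected_columns(buf, n_rows, n_cols):
    # Column j of a row-major array is every n_cols-th element,
    # starting at offset j (equivalent to arr[:, j]).
    return [buf[j::n_cols] for j in range(n_cols)]


a_bug, b_bug = buggy_columns(buffer, n_rows, n_cols)
a_ok, b_ok = expected_columns(buffer, n_rows, n_cols)
print(a_bug[:3], b_bug[:3])  # [20, 21, 22] [20, 21, 22]
print(a_ok[:3], b_ok[:3])    # [20, 22, 24] [21, 23, 25]
```

The buggy model yields 20 through 29 in both columns, matching the printed DataFrame, while the expected model matches the columns of `arr`.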

@beckernick beckernick added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Aug 3, 2021
@beckernick beckernick changed the title [BUG] Setting the values obtained from multiple columns to the GDF is not working [BUG] DataFrame setitem causing data corruption for cupy array inputs Aug 3, 2021
@galipremsagar galipremsagar self-assigned this Aug 3, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 17, 2021
Fixes: #8672 

This PR handles `ndarray` inputs in `_DataFrameLocIndexer.__setitem__`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Marlene (https://github.com/marlenezw)

URL: #9006
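The fix makes `.loc` assignment handle `ndarray` inputs. The sketch below only illustrates the intended semantics and is not cuDF's implementation. It uses a dict of lists as a stand-in DataFrame and a hypothetical `assign_2d` helper. The helper checks the input shape and gives column j the j-th entry of every row, which is what `df.loc[:, ["a", "b"]] = arr` should do.

```python
# Toy sketch of column-wise assignment from a 2-D input. Not cuDF code:
# the "frame" is a dict of lists and `assign_2d` is a hypothetical helper.
from typing import Dict, List, Sequence

Frame = Dict[str, List[int]]


def assign_2d(frame: Frame, columns: Sequence[str],
              rows: Sequence[Sequence[int]]) -> None:
    # Number of rows in the existing frame (assumes at least one column).
    n_rows = len(next(iter(frame.values())))
    if len(rows) != n_rows:
        raise ValueError(f"expected {n_rows} rows, got {len(rows)}")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(
                f"expected {len(columns)} values per row, got {len(row)}")
    # Column j receives the j-th entry of every row, i.e. value[:, j].
    for j, name in enumerate(columns):
        frame[name] = [row[j] for row in rows]


df = {"a": list(range(10)), "b": list(range(10))}
# Same values as cp.arange(20, 40).reshape(-1, 2), as nested lists.
arr = [[20 + 2 * i, 21 + 2 * i] for i in range(10)]
assign_2d(df, ["a", "b"], arr)
print(df["a"][:3], df["b"][:3])  # [20, 22, 24] [21, 23, 25]
```

Splitting the 2-D input into per-column slices before assignment is what keeps each column from being filled out of the same leading stretch of the buffer.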