[BUG] explode_outer_position doesn't match to Spark's counterpart #7721

Closed
sperlingxx opened this issue Mar 25, 2021 · 2 comments · Fixed by #7754
Labels: bug (Something isn't working), libcudf (Affects libcudf (C++/CUDA) code)

Comments

@sperlingxx
Contributor

Describe the bug
In cuDF, explode_outer_position sets the position value to 0 for rows whose list is null or empty. Spark sets the position value to null for those rows.

Steps/Code to reproduce bug
For input data like:

  • [[5,null,15], 100]
  • [null, 200]
  • [[], 300]

cuDF returns

  • [0, 5, 100]
  • [1, null, 100]
  • [2, 15, 100]
  • [0, null, 200]
  • [0, null, 300]

But Spark returns

  • [0, 5, 100]
  • [1, null, 100]
  • [2, 15, 100]
  • [null, null, 200]
  • [null, null, 300]
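
The difference can be reproduced with a small plain-Python model. This is only a sketch of the semantics described above, not cuDF or Spark code: the function name `explode_outer_position_model` and the `null_position_for_empty` flag are hypothetical, and `None` stands in for null.

```python
from typing import List, Optional, Tuple

# One input row: (list column value, other column value).
# The list itself may be None (null row) or [] (empty row).
InputRow = Tuple[Optional[List[Optional[int]]], int]
# One output row: (position, exploded value, other column value).
OutputRow = Tuple[Optional[int], Optional[int], int]


def explode_outer_position_model(
    rows: List[InputRow], null_position_for_empty: bool
) -> List[OutputRow]:
    """Hypothetical model of an outer explode that also emits a position.

    null_position_for_empty=True  -> behaviour the issue reports for Spark
    null_position_for_empty=False -> behaviour the issue reports for cuDF
    """
    out: List[OutputRow] = []
    for values, other in rows:
        if not values:  # null list or empty list: emit exactly one row
            position = None if null_position_for_empty else 0
            out.append((position, None, other))
            continue
        # Null elements inside a list still get their own position.
        for position, value in enumerate(values):
            out.append((position, value, other))
    return out


rows: List[InputRow] = [([5, None, 15], 100), (None, 200), ([], 300)]
spark_like = explode_outer_position_model(rows, null_position_for_empty=True)
cudf_reported = explode_outer_position_model(rows, null_position_for_empty=False)
```

The two results differ only in the position of the last two rows, which come from the null and empty lists.
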

@sperlingxx added the bug (Something isn't working) and Needs Triage (Need team to review and classify) labels on Mar 25, 2021
@hyperbolic2346
Contributor

I assume this holds for explode_position as well?

@sperlingxx
Contributor Author

> I assume this holds for explode_position as well?

I think the current explode_position implementation is identical to Spark's, since empty/null elements of the array are treated like any other value.
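
One way to read this reply, sketched in plain Python under stated assumptions: in a non-outer explode, rows whose list is null or empty produce no output at all, so there is no position value to disagree on, while null elements inside a list keep their normal position. The function name `explode_position_model` is hypothetical and this is not the cuDF or Spark implementation.

```python
from typing import List, Optional, Tuple

InputRow = Tuple[Optional[List[Optional[int]]], int]
OutputRow = Tuple[int, Optional[int], int]


def explode_position_model(rows: List[InputRow]) -> List[OutputRow]:
    """Hypothetical model of a non-outer explode with a position column."""
    out: List[OutputRow] = []
    for values, other in rows:
        # A null or empty list contributes no rows (assumed semantics).
        for position, value in enumerate(values or []):
            # A null element is exploded like any other value.
            out.append((position, value, other))
    return out


rows: List[InputRow] = [([5, None, 15], 100), (None, 200), ([], 300)]
exploded = explode_position_model(rows)
```

With the input from the issue, only the three rows from the first list survive, and every position is non-null.
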

@kkraus14 added the libcudf (Affects libcudf (C++/CUDA) code) label and removed the Needs Triage (Need team to review and classify) label on Mar 26, 2021
rapids-bot bot pushed a commit that referenced this issue Mar 30, 2021
… of null rows (#7754)

`explode_outer` supports writing a position column, but if the row was null it would incorrectly set the position to 0 and the row valid. Instead, it should null that position row as well. Luckily the null column matches 100% with the null column of the exploded column, so we can just copy it after it is created.

Fixes #7721

Authors:
  - Mike Wilson (@hyperbolic2346)

Approvers:
  - Conor Hoekstra (@codereport)
  - Jake Hemstad (@jrhemstad)

URL: #7754