[BUG] explode_outer produces null positions when nested types contain empty children #7787

Closed
jlowe opened this issue Mar 31, 2021 · 0 comments · Fixed by #7843
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

jlowe commented Mar 31, 2021

Describe the bug
When generating positions, explode_outer can produce null positions for top-level rows that are neither null nor empty but contain empty children (in the reproduction below, a list holding a single null struct).

Steps/Code to reproduce bug

  1. Create a LIST column that contains a STRUCT column with a single field (the field type can be int, string, or anything else).
  2. Put five rows in the column:
     • The first row has a single struct instance that contains a non-null value
     • The second row has a single struct instance that contains a null value
     • The third row has a single null struct
     • The fourth row is an empty list (zero struct instances)
     • The fifth row is a null list
  3. Perform an explode_outer with positions. It generates position values of 0, 0, null, null, null.

Expected behavior
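The positions generated should be 0, 0, 0, null, null. The third row is a non-null, non-empty list, so its single null struct still occupies position 0; only the empty list and the null list should yield a null position.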
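
To make the expected semantics concrete, here is a minimal pure-Python sketch of an explode_outer with positions, applied to the five rows from the reproduction steps. It is a reference model only and does not use the libcudf API; the function name `explode_outer_position`, the field name `a`, and the use of `None` for nulls and dicts for structs are assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple


def explode_outer_position(
    rows: List[Optional[List[Any]]],
) -> List[Tuple[Optional[int], Any]]:
    """Reference model (hypothetical helper, not libcudf code).

    Each input row is None (a null list), [] (an empty list), or a list
    of elements. Returns one (position, element) pair per output row.
    """
    out: List[Tuple[Optional[int], Any]] = []
    for row in rows:
        if row is None or len(row) == 0:
            # Outer semantics: a null or empty list still yields one row,
            # with both the position and the value null.
            out.append((None, None))
        else:
            # A null element inside a non-empty list keeps its position.
            for pos, element in enumerate(row):
                out.append((pos, element))
    return out


# The five rows described in the reproduction steps.
rows = [
    [{"a": 1}],     # struct with a non-null value
    [{"a": None}],  # struct with a null value
    [None],         # a single null struct
    [],             # empty list
    None,           # null list
]

positions = [pos for pos, _ in explode_outer_position(rows)]
print(positions)  # [0, 0, 0, None, None]
```

In this model the position depends only on whether the list row itself is null or empty, never on whether the element at that position is null.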

rapids-bot bot pushed a commit that referenced this issue Apr 20, 2021
…#7843)

The null values in the position column did not match expectations exactly. The nulls cannot be copied directly from the exploded column, because the exploded column may contain null values at rows where the position column should not be null.

Fixes #7787

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Jason Lowe (https://github.com/jlowe)
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #7843
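
One way to read the commit message above is as a contrast between two ways of deciding which positions are null. The pure-Python sketch below is an illustration under that reading, not the actual libcudf change; the variable names and the `(is_padding, position, element)` tuple layout are hypothetical.

```python
# Rows from the issue: None models a null, a dict models a struct.
rows = [[{"a": 1}], [{"a": None}], [None], [], None]

# Flatten into (is_padding, position, element) triples. is_padding marks
# the single output row emitted for a null or empty list.
exploded = []
for row in rows:
    if row is None or len(row) == 0:
        exploded.append((True, None, None))
    else:
        for pos, element in enumerate(row):
            exploded.append((False, pos, element))

# Copying nullness from the exploded values: a null struct also nulls
# out its position, which matches the output reported in the issue.
copied = [pos if element is not None else None for _, pos, element in exploded]

# Deriving nullness from the list row alone: only padding rows are null.
derived = [None if is_padding else pos for is_padding, pos, _ in exploded]

print(copied)   # [0, 0, None, None, None]
print(derived)  # [0, 0, 0, None, None]
```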