Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] AST produces non-null result when expression results in null scalar #8831

Closed
jlowe opened this issue Jul 22, 2021 · 2 comments · Fixed by #9117
Closed

[BUG] AST produces non-null result when expression results in null scalar #8831

jlowe opened this issue Jul 22, 2021 · 2 comments · Fixed by #9117
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@jlowe
Copy link
Contributor

jlowe commented Jul 22, 2021

Describe the bug
When building an expression to copy a null literal, the resulting column from evaluating the expression contains all non-null values.

Steps/Code to reproduce bug
The following test code will reproduce the problem:

  auto c_0   = column_wrapper<int32_t>{0, 0, 0, 0};
  auto table = cudf::table_view{{c_0}};

  auto literal_value = cudf::numeric_scalar<int32_t>(-123);
  literal_value.set_valid_async(false);
  auto literal       = cudf::ast::literal(literal_value);

  auto expression = cudf::ast::expression(cudf::ast::ast_operator::IDENTITY, literal);

  auto result   = cudf::ast::compute_column(table, expression);
  auto expected = column_wrapper<int32_t>({-123, -123, -123, -123}, {0, 0, 0, 0});

  cudf::test::expect_columns_equal(expected, result->view(), true);

Expected behavior
When the expression evaluation for a row of a column results in a null scalar, the corresponding row of the output column should be null.

@jlowe jlowe added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS labels Jul 22, 2021
@vyasr
Copy link
Contributor

vyasr commented Jul 22, 2021

Good catch, this is a bug in the current null handling that incorrectly assumes that nullability only requires special handling for column nodes, not literal nodes. I need to inject the corresponding logic for literals here. I'll tackle this as part of the ongoing refactor of conditional joins and see if it makes sense to make this part of the improved framework for handling nulls that we've discussed.

@beckernick beckernick removed the Needs Triage Need team to review and classify label Jul 23, 2021
@beckernick beckernick added this to the Conditional Joins milestone Jul 23, 2021
@vyasr vyasr self-assigned this Aug 10, 2021
@vyasr
Copy link
Contributor

vyasr commented Aug 25, 2021

This issue turned out to be a bit more involved than my previous comment. That comment was correct, but in addition the nullability of literals contained in an expression has to be considered when determined whether to even descend through the nullable code path. In particular, the change above is insufficient if a null literal is included in an expression for compute_column that is operating on a table that contains no nulls. It's not difficult to add that check, I'm currently just trying to figure out the best way to do so within the current design.

@rapids-bot rapids-bot bot closed this as completed in #9117 Sep 7, 2021
rapids-bot bot pushed a commit that referenced this issue Sep 7, 2021
This PR resolves #8831 by propagating null values appropriately when they are provided in the form of literals. In addition to adding support at the level of expression evaluation, this PR also adds the appropriate dispatch to the correct code path at the level of calls to APIs using expressions containing nulls so that the null-supporting code paths can be triggered even when operating on tables that do not contain any nulls.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Bradley Dice (https://github.com/bdice)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #9117
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants