[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

aocsa · 2023-11-14T18:27:30Z

Describe the bug

cudfAST, as documented in Issue #12319, has known limitations in evaluating AST expressions involving mixed data types such as i32 + i64. This limitation stems from the lack of automatic type casting support in cudfAST. The source code reference here further explains this behavior. It appears that cudfAST deems an expression valid when operands share the same data type.

Observed Behavior

cudfAST successfully evaluates expressions where binary operands have consistent data types, and the AST tree's height is less than three. An example is (i64 < i64) == (f64 < f64). However, in attempts to evaluate more complex expressions with additional levels in the AST Tree, such as (i64 < i64) == (i64 < (i64 + i64)), cudfAST returns an error stating, "An AST expression was provided non-matching operand types."

Example of valid cudfAST expression

                (==)
               /    \
            (<)      (<)
           /  \     /   \
        i64  i64  str   str

Clarification Needed

It remains unclear whether this behavior is an intentional design choice or a limitation that warrants attention. The inability to process expressions with an AST tree height greater than two significantly restricts cudfAST's flexibility and usability.

Here is an example of a 'complex' expression.

                (==)
               /    \
            (<)      (<)
           /  \     /   \
        i64  i64  i64   (+)
                        /  \
                      i64  i64

Expected Behavior

Ideally, cudfAST should either support evaluation of complex expression or provide a clear error message explaining the limitation.

Steps to Reproduce

Execute a simple cudfAST expression like (i64 < i64) == (i64 < i64), which works as expected.
Attempt a more complex expression such as (i64 < i64) == (i64 < (i64 + i64)), which results in the mentioned error.

Steps/Code to Reproduce Bug

import cudf
import pandas as pd 

df = cudf.DataFrame({'b8': [True,True,False], 'i64': [4, 5, 6], 'f64': [4.0, 5.0, 6.0], 'str': ["A", "B", "C"]})
df.eval("(b8 < b8) == (str < str)") # OK
df.eval("(i64 < i64) == (str == str)") # OK 

df.eval("(i64 < i64) == (i64 < (i64 + i64))")
# Error received:
# CUDF failure at:/opt/conda/conda-bld/work/cpp/src/ast/expression_parser.cpp:149:
# An AST expression was provided non-matching operand types.

Environment overview

Environment location: conda
Method of cuDF install: conda
branch-23.10, origin/branch-23.10

The text was updated successfully, but these errors were encountered:

vyasr · 2023-11-14T19:39:51Z

Thank you for the report! This definitely looks like a bug to me. I'm not immediately sure why this would be happening, but it looks like the output type of the more deeply nested part of the expression tree is being determined incorrectly. This should be supported.

It definitely looks like the issue is with imbalanced trees because this expression works:

 df.eval("(i64 > (i64 + i64)) == (i64 < (i64 + i64))")

GregoryKimball · 2023-11-15T19:18:44Z

@revans2 Have you seen any issues with inbalanced AST trees in Spark-RAPIDS?

revans2 · 2023-11-15T19:38:38Z

@revans2 Have you seen any issues with inbalanced AST trees in Spark-RAPIDS?

@GregoryKimball we have not stress tested that part much, but obviously should do more of it. We have not seen this in practice, but most of the join conditions we see end up being split up and optimized by Spark anyways. So it might take a fair amount of work to make an imbalanced condition like this.

jlowe · 2023-11-15T21:36:19Z

I was able to reproduce the issue on Spark with the RAPIDS Accelerator by enabling AST projections and using a complex expression similar to the one proposed (the original expression Spark will simplify into one that works):

scala> spark.conf.set("spark.rapids.sql.projectAstEnabled", "true")
scala> spark.range(10).withColumn("x", col("id")).repartition(1).selectExpr("(id < x) == (id < (id + x))").collect
23/11/15 21:30:31 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 3)
ai.rapids.cudf.CudfException: CUDF failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-203-cuda11/thirdparty/cudf/cpp/src/ast/expression_parser.cpp:149: An AST expression was provided non-matching operand types.
	at ai.rapids.cudf.ast.CompiledExpression.computeColumn(Native Method)
	at ai.rapids.cudf.ast.CompiledExpression.computeColumn(CompiledExpression.java:88)
	at com.nvidia.spark.rapids.GpuProjectAstExec$$anon$1.$anonfun$next$15(basicPhysicalOperators.scala:452)
[...]

vyasr · 2023-11-17T20:29:18Z

I believe I have a fix for this in #14445.

When parsing expressions, device data references are reused if there are multiple that are identical. Equality is determined by comparing the fields of the reference, but previously the data type was omitted. For column and literal references, this is OK because the `data_index` uniquely identifies the reference. For intermediates, however, the index is not sufficient to disambiguate because an expression could reuse a given location even if the operation produces a different data type. Therefore, the data type must be part of the equality operator. Resolves #14409 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #14445

) This PR addresses the issue at #14409. I would like to propose the addition of unit tests that involve scenarios like having 100 or 1000 elements in a tree, reaching 100 levels of depth, with diferent data types and similar stress tests. The purpose of these tests is to conduct comprehensive testing and stress the Abstract Syntax Tree (AST), ultimately aiding in the identification and resolution of any potential issues. By introducing these pathological tests, we aim to ensure the robustness and reliability of our codebase. These tests can help us uncover edge cases and performance bottlenecks that might otherwise go unnoticed. Authors: - Alexander Ocsa (https://github.com/aocsa) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #14459

aocsa added Needs Triage Need team to review and classify bug Something isn't working labels Nov 14, 2023

vyasr self-assigned this Nov 15, 2023

jlowe mentioned this issue Nov 15, 2023

[BUG] Complex AST expressions can crash with non-matching operand type error NVIDIA/spark-rapids#9733

Closed

jlowe added the Spark Functionality that helps Spark RAPIDS label Nov 15, 2023

vyasr mentioned this issue Nov 17, 2023

Fix intermediate type checking in expression parsing #14445

Merged

3 tasks

rapids-bot bot closed this as completed in #14445 Nov 18, 2023

This was referenced Nov 21, 2023

Introduce Comprehensive Pathological Unit Tests for Issue #14409 #14459

Merged

TEST Introduce Comprehensive Pathological Unit Tests for Issue #14409 #14467

Closed

bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

aocsa commented Nov 14, 2023 •

edited

Loading

vyasr commented Nov 14, 2023 •

edited

Loading

GregoryKimball commented Nov 15, 2023

revans2 commented Nov 15, 2023

jlowe commented Nov 15, 2023

vyasr commented Nov 17, 2023

[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

Comments

aocsa commented Nov 14, 2023 • edited Loading

Describe the bug

Observed Behavior

Clarification Needed

Expected Behavior

Steps to Reproduce

Steps/Code to Reproduce Bug

Environment overview

vyasr commented Nov 14, 2023 • edited Loading

GregoryKimball commented Nov 15, 2023

revans2 commented Nov 15, 2023

jlowe commented Nov 15, 2023

vyasr commented Nov 17, 2023

aocsa commented Nov 14, 2023 •

edited

Loading

vyasr commented Nov 14, 2023 •

edited

Loading