Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

Closed
aocsa opened this issue Nov 14, 2023 · 5 comments · Fixed by #14445
Closed

[BUG] Inconsistency in cudfAST Evaluation for Complex Expressions #14409

aocsa opened this issue Nov 14, 2023 · 5 comments · Fixed by #14445
Assignees
Labels
bug Something isn't working Spark Functionality that helps Spark RAPIDS

Comments

@aocsa
Copy link
Contributor

aocsa commented Nov 14, 2023

Describe the bug

cudfAST, as documented in Issue #12319, has known limitations in evaluating AST expressions involving mixed data types such as i32 + i64. This limitation stems from the lack of automatic type casting support in cudfAST. The source code reference here further explains this behavior. It appears that cudfAST deems an expression valid when operands share the same data type.

Observed Behavior

cudfAST successfully evaluates expressions where binary operands have consistent data types, and the AST tree's height is less than three. An example is (i64 < i64) == (f64 < f64). However, in attempts to evaluate more complex expressions with additional levels in the AST Tree, such as (i64 < i64) == (i64 < (i64 + i64)), cudfAST returns an error stating, "An AST expression was provided non-matching operand types."

Example of valid cudfAST expression

                (==)
               /    \
            (<)      (<)
           /  \     /   \
        i64  i64  str   str

Clarification Needed

It remains unclear whether this behavior is an intentional design choice or a limitation that warrants attention. The inability to process expressions with an AST tree height greater than two significantly restricts cudfAST's flexibility and usability.

Here is an example of a 'complex' expression.

                (==)
               /    \
            (<)      (<)
           /  \     /   \
        i64  i64  i64   (+)
                        /  \
                      i64  i64

Expected Behavior

Ideally, cudfAST should either support evaluation of complex expression or provide a clear error message explaining the limitation.

Steps to Reproduce

Execute a simple cudfAST expression like (i64 < i64) == (i64 < i64), which works as expected.
Attempt a more complex expression such as (i64 < i64) == (i64 < (i64 + i64)), which results in the mentioned error.

Steps/Code to Reproduce Bug

import cudf
import pandas as pd 

df = cudf.DataFrame({'b8': [True,True,False], 'i64': [4, 5, 6], 'f64': [4.0, 5.0, 6.0], 'str': ["A", "B", "C"]})
df.eval("(b8 < b8) == (str < str)") # OK
df.eval("(i64 < i64) == (str == str)") # OK 

df.eval("(i64 < i64) == (i64 < (i64 + i64))")
# Error received:
# CUDF failure at:/opt/conda/conda-bld/work/cpp/src/ast/expression_parser.cpp:149:
# An AST expression was provided non-matching operand types.

Environment overview

Environment location: conda
Method of cuDF install: conda
branch-23.10, origin/branch-23.10

@aocsa aocsa added Needs Triage Need team to review and classify bug Something isn't working labels Nov 14, 2023
@vyasr
Copy link
Contributor

vyasr commented Nov 14, 2023

Thank you for the report! This definitely looks like a bug to me. I'm not immediately sure why this would be happening, but it looks like the output type of the more deeply nested part of the expression tree is being determined incorrectly. This should be supported.

It definitely looks like the issue is with imbalanced trees because this expression works:

 df.eval("(i64 > (i64 + i64)) == (i64 < (i64 + i64))")

@vyasr vyasr self-assigned this Nov 15, 2023
@GregoryKimball
Copy link
Contributor

@revans2 Have you seen any issues with inbalanced AST trees in Spark-RAPIDS?

@revans2
Copy link
Contributor

revans2 commented Nov 15, 2023

@revans2 Have you seen any issues with inbalanced AST trees in Spark-RAPIDS?

@GregoryKimball we have not stress tested that part much, but obviously should do more of it. We have not seen this in practice, but most of the join conditions we see end up being split up and optimized by Spark anyways. So it might take a fair amount of work to make an imbalanced condition like this.

@jlowe
Copy link
Member

jlowe commented Nov 15, 2023

I was able to reproduce the issue on Spark with the RAPIDS Accelerator by enabling AST projections and using a complex expression similar to the one proposed (the original expression Spark will simplify into one that works):

scala> spark.conf.set("spark.rapids.sql.projectAstEnabled", "true")
scala> spark.range(10).withColumn("x", col("id")).repartition(1).selectExpr("(id < x) == (id < (id + x))").collect
23/11/15 21:30:31 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 3)
ai.rapids.cudf.CudfException: CUDF failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-203-cuda11/thirdparty/cudf/cpp/src/ast/expression_parser.cpp:149: An AST expression was provided non-matching operand types.
	at ai.rapids.cudf.ast.CompiledExpression.computeColumn(Native Method)
	at ai.rapids.cudf.ast.CompiledExpression.computeColumn(CompiledExpression.java:88)
	at com.nvidia.spark.rapids.GpuProjectAstExec$$anon$1.$anonfun$next$15(basicPhysicalOperators.scala:452)
[...]

@vyasr
Copy link
Contributor

vyasr commented Nov 17, 2023

I believe I have a fix for this in #14445.

rapids-bot bot pushed a commit that referenced this issue Nov 18, 2023
When parsing expressions, device data references are reused if there are multiple that are identical. Equality is determined by comparing the fields of the reference, but previously the data type was omitted. For column and literal references, this is OK because the `data_index` uniquely identifies the reference. For intermediates, however, the index is not sufficient to disambiguate because an expression could reuse a given location even if the operation produces a different data type. Therefore, the data type must be part of the equality operator.

Resolves #14409

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #14445
rapids-bot bot pushed a commit that referenced this issue Nov 24, 2023
)

This PR addresses the issue at #14409. I would like to propose the addition of unit tests that involve scenarios like having 100 or 1000 elements in a tree, reaching 100 levels of depth, with diferent data types and similar stress tests. The purpose of these tests is to conduct comprehensive testing and stress the Abstract Syntax Tree (AST), ultimately aiding in the identification and resolution of any potential issues.

By introducing these pathological tests, we aim to ensure the robustness and reliability of our codebase. These tests can help us uncover edge cases and performance bottlenecks that might otherwise go unnoticed.

Authors:
  - Alexander Ocsa (https://github.com/aocsa)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: #14459
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants