Short circuit AND/OR in ANSI mode #4760
Conversation
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
@pytest.mark.parametrize('expr', _get_arithmetic_overflow_expr('OR'))
def test_or_with_side_effect(expr, ansi_enabled, lhs_predicate):
    ansi_conf = {'spark.sql.ansi.enabled': ansi_enabled}
    if ansi_enabled == 'true' and (not(lhs_predicate)):
Suggested change:
- if ansi_enabled == 'true' and (not(lhs_predicate)):
+ if ansi_enabled == 'true' and not lhs_predicate:
@@ -66,6 +67,125 @@ case class GpuAnd(left: Expression, right: Expression) extends CudfBinaryOperato

  override def binaryOp: BinaryOp = BinaryOp.NULL_LOGICAL_AND
  override def astOperator: Option[BinaryOperator] = Some(ast.BinaryOperator.NULL_LOGICAL_AND)

  protected def filterBatch(
Rather than copying all this code, I expected GpuAnd to use the existing GpuConditionalExpression trait.
Yes, absolutely agree.
I was thinking of creating a new trait for ColumnVector-related helpers, since those methods are not really specific to ConditionalExpression. WDYT?
Maybe, although that starts to get a bit outside the scope of this PR. If we are moving stuff around, filterBatch should be part of GpuFilter, which does something almost identical to this method.
As @jlowe said, adding with GpuConditionalExpression will bring the needed methods in and avoid duplicating code. Perhaps a better name for this trait now is GpuConditionalEvaluation?
I could not add GpuConditionalExpression out of the box because of conflicts in the trait hierarchy, so I pulled the generic methods into a new trait that GpuAnd/GpuOr can extend.
Another approach would be to move them into a helper object, but I did not like that because it would cause more scattered changes in conditionalExpressions.scala.
@pytest.mark.parametrize('ansi_enabled', ['false', 'true'])
@pytest.mark.parametrize('lhs_predicate', [False, True])
@pytest.mark.parametrize('expr', _get_arithmetic_overflow_expr('OR'))
def test_or_with_side_effect(expr, ansi_enabled, lhs_predicate):
It looks like the difference between the tests is rather small; you could combine them, parametrized over 'AND' and 'OR'. Then _get_arithmetic_overflow_expr would not need to be global.
Done!
Initially, when I looked at the existing Python tests, I thought there was a preference for keeping separate tests for readability.
# process the RHS if they cannot figure out the result from just the LHS.
# Tests that the GPU short-circuits the predicates without throwing an Exception in ANSI mode.
@pytest.mark.parametrize('ansi_enabled', ['true', 'false'])
@pytest.mark.parametrize('lhs_predicate', [True, False])
Can we add a null LHS predicate for completeness?
Good suggestion! Done!
Review thread: sql-plugin/src/main/scala/org/apache/spark/sql/rapids/predicates.scala (outdated, resolved)
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
  }
} else {
  // replace null values
  val lhsNoNulls = withResource(Scalar.fromBool(true)) { trueScalar =>
How expensive is this replacement? And should we check whether there are any nulls first?
I looked up all references to columnVector.replaceNulls() and did not see it guarded by a check for nulls, so I followed the same pattern, assuming replaceNulls() is not expensive.
I fixed that in the recent commits.
I mostly wanted some kind of evidence of whether it is slow or not. I think in this case my gut is probably okay, but it would be nice to have something better than that. Do you have any benchmarks we could test this with? I know that AND/OR ended up being very slow in the past and caused measurable degradation in some TPC-DS queries, so it should be possible to build a project with lots of ANDs/ORs in it and see.
I was hoping you would never say that :-)
I will look into benchmarking. It may take me some time to get back with meaningful data, though.
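The guard under discussion can be sketched in plain Python (the real code operates on cudf ColumnVectors; replace_nulls and the has_nulls flag below are illustrative stand-ins for ColumnVector.replaceNulls and hasNulls, not the plugin's API):

```python
def replace_nulls(values, replacement):
    # Stand-in for cudf's ColumnVector.replaceNulls(scalar): return a copy
    # with every null (None) replaced by the given scalar value.
    return [replacement if v is None else v for v in values]

def lhs_no_nulls(lhs_bool, has_nulls):
    # The guard raised in the review: only pay for the replacement when the
    # column actually contains nulls; otherwise reuse the input as-is.
    if not has_nulls:
        return lhs_bool
    return replace_nulls(lhs_bool, True)
```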
case (f: GpuScalar) =>
  doColumnar(lhsBool, f)
This path is not covered by the current tests. Would we ever hit it? I don't think scalar expressions can have side effects, and we would only call this method if the RHS has side effects.
Wow, that is an interesting situation. The only place right now where we can return a Scalar from processing, when the input is not a foldable constant, is some corner cases with Coalesce, and Coalesce can totally have side effects associated with it for the downstream tasks that are a part of it. Perhaps we should try to write a test for this just to be sure...
I've been trying to write a test that hits the scalar path and have failed to come up with anything so far. It looks like Coalesce will only return a scalar if the first non-null parameter is a scalar, but Spark will optimize the Coalesce out in that case. I am back to wondering whether the scalar path is even possible, but maybe I am missing an edge case here.
I don't think Spark had that optimization in older versions, so you might try those. Also, this is enough of a corner case that I am fine leaving it as is.
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
build
trait GpuColumnVectorHelper extends Arm { |
Can we find a better name for this? It was GpuConditionalExpression, so it was clear it was supposed to be used with those; now it is a generic utility library with an even more generic name and no clear contract on how these APIs should be used. Could we change this to be an object instead of a trait? Could we restore the GpuConditionalExpression name so it is still clear that these are intended for use with those types of classes? Also, I really dislike that we are overriding isAllTrue; that just does not make any sense to me. If we have different ways of calculating isAllTrue, then we should either have separate utility APIs with names that make clear how each works, or a flag passed to the API to select the behavior. Does that make sense to you?
Yes, that makes sense.
I initially thought to use the existing object GpuExpressionsUtils. Would that be fine? Or do you prefer creating a new helper object, and what would its name be?
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
This looks good. I have a few nits that would be nice to address, but nothing that I think is required.
Review thread: sql-plugin/src/main/scala/com/nvidia/spark/rapids/conditionalExpressions.scala (resolved)
 * side-effects, such as throwing exceptions for invalid inputs.
 *
 * This method performs lazy evaluation on the GPU by first filtering
 * the input batch a where the LHS predicate is True.
This line feels like it needs another look; the "a" between "batch" and "where" feels off.
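The lazy-evaluation scheme that doc comment describes can be emulated in plain Python: filter to the rows where the LHS is True, evaluate the RHS only on those rows, then gather the results back into place. The function below is a sketch of that idea, not the plugin's actual filterBatch/gather API:

```python
def short_circuit_and(lhs, rhs_fn, rows):
    # 1. "filterBatch": keep only the rows where the LHS predicate is True.
    selected = [row for keep, row in zip(lhs, rows) if keep]
    # 2. Evaluate the RHS solely on the filtered rows, so RHS side effects
    #    (e.g. an ANSI-mode overflow exception) cannot fire on rows whose
    #    AND result is already decided False by the LHS.
    rhs_results = iter([rhs_fn(row) for row in selected])
    # 3. "gather" the RHS results back into position; False-LHS rows stay False.
    return [next(rhs_results) if keep else False for keep in lhs]
```

For example, if rhs_fn raises on negative inputs, a negative row whose LHS is False never reaches rhs_fn, mirroring the short-circuit behavior being added for GpuAnd.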
    doColumnar(lhsBool, GpuColumnVector.from(combinedVector, dataType))
  }
case f: GpuScalar =>
  doColumnar(lhsBool, f)
As stated before, this code is not covered by any test. I doubt it can be covered, but I think the simplest solution is to call columnarEvalToColumn, which will return the scalar as a column, so we don't have to worry about it:
withResource(GpuExpressionsUtils.columnarEvalToColumn(rightExpr, leftTrueBatch)) { rEval =>
withResource(gather(lhsNoNulls, rEval)) { combinedVector =>
doColumnar(lhsBool, GpuColumnVector.from(combinedVector, dataType))
}
}
It also makes the code much smaller, at the expense of extra memory in a case we don't know is possible to hit.
build
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Thanks @revans2!
fixes #4526
Compatibility:
Testing:
Code changes:
- columnarEval for both GpuAnd and GpuOr
- extends ConditionalExpressions, since the new code needs to use some of the methods defined there such as: gather, filterBatch,