Short circuit AND/OR in ANSI mode #4760

Merged (16 commits) on Feb 22, 2022

Conversation

amahussein
Collaborator

fixes #4526

Compatibility:

  • For AND if the RHS has side effects we only process it for rows where the LHS is not false (this includes nulls).
  • For OR if the RHS has side effects we only process it for rows where the LHS is not true (this includes nulls).
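
The two rules above can be sketched row by row in plain Python. This is only a model of the described behavior, not the plugin's code: `None` stands in for SQL NULL, and `and3`, `or3`, and `overflowing_rhs` are hypothetical names chosen for illustration.

```python
from typing import Callable, Optional

Bool3 = Optional[bool]  # None models SQL NULL


def and3(lhs: Bool3, rhs_fn: Callable[[], Bool3]) -> Bool3:
    """Three-valued AND that evaluates the RHS only when the LHS is not False."""
    if lhs is False:
        return False  # result is already known; RHS side effects are skipped
    rhs = rhs_fn()  # LHS is True or NULL, so the RHS is needed
    if rhs is False:
        return False
    if lhs is None or rhs is None:
        return None
    return True


def or3(lhs: Bool3, rhs_fn: Callable[[], Bool3]) -> Bool3:
    """Three-valued OR that evaluates the RHS only when the LHS is not True."""
    if lhs is True:
        return True  # result is already known; RHS side effects are skipped
    rhs = rhs_fn()  # LHS is False or NULL, so the RHS is needed
    if rhs is True:
        return True
    if lhs is None or rhs is None:
        return None
    return False


def overflowing_rhs() -> Bool3:
    # Stand-in for an ANSI-mode expression that fails, e.g. on arithmetic overflow.
    raise ArithmeticError("integer overflow")
```

Here `and3(False, overflowing_rhs)` and `or3(True, overflowing_rhs)` return without raising, while a NULL LHS still forces the RHS to run.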

Testing:

  • Added two integration tests in logic_test.py

Code changes:

  • Override the implementation of columnarEval for both GpuAnd and GpuOr
  • The above change requires some refactoring of ConditionalExpressions, since the new code needs some of the methods defined there, such as gather and filterBatch.

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
@sameerz sameerz added the bug Something isn't working label Feb 11, 2022
@sameerz sameerz added this to the Feb 14 - Feb 25 milestone Feb 11, 2022
@amahussein amahussein marked this pull request as draft February 11, 2022 18:42
@pytest.mark.parametrize('expr', _get_arithmetic_overflow_expr('OR'))
def test_or_with_side_effect(expr, ansi_enabled, lhs_predicate):
    ansi_conf = {'spark.sql.ansi.enabled': ansi_enabled}
    if ansi_enabled == 'true' and (not(lhs_predicate)):
Contributor

Suggested change
-    if ansi_enabled == 'true' and (not(lhs_predicate)):
+    if ansi_enabled == 'true' and not lhs_predicate:

@@ -66,6 +67,125 @@ case class GpuAnd(left: Expression, right: Expression) extends CudfBinaryOperato

  override def binaryOp: BinaryOp = BinaryOp.NULL_LOGICAL_AND
  override def astOperator: Option[BinaryOperator] = Some(ast.BinaryOperator.NULL_LOGICAL_AND)

  protected def filterBatch(
Contributor

Rather than copying all this code, I expected GpuAnd to use the existing GpuConditionalExpression trait.

Collaborator Author

Yes, absolutely agree.
I was thinking of creating a new trait for helpers related to ColumnVector, since those methods are not really specific to "ConditionalExpression". WDYT?

Contributor

Maybe, although it starts to get outside the scope of the PR a bit. If we are moving stuff around, filterBatch should be part of GpuFilter which does something almost identical to this method.

Contributor

As @jlowe said, adding with GpuConditionalExpression will bring the needed methods in and avoid the need to duplicate code. Perhaps a better name for this trait now is GpuConditionalEvaluation?

Collaborator Author

I could not add GpuConditionalExpression out of the box because of conflicts in the trait hierarchy. So I pulled the generic methods into a new trait that can be extended by GpuAnd/GpuOr.
Another approach would be to move them to a helper object, but I did not like that because it would cause more scattered changes in conditionalExpressions.scala.

@pytest.mark.parametrize('ansi_enabled', ['false', 'true'])
@pytest.mark.parametrize('lhs_predicate', [False, True])
@pytest.mark.parametrize('expr', _get_arithmetic_overflow_expr('OR'))
def test_or_with_side_effect(expr, ansi_enabled, lhs_predicate):
Collaborator

It looks like the difference between the tests is rather small, and you could combine them by parametrizing over 'AND' and 'OR'. Then you would not need _get_arithmetic_overflow_expr to be global.

Collaborator Author

Done!
Initially, when I looked at the existing Python tests, I thought there was a preference for keeping separate tests for readability.

# process the RHS if they cannot figure out the result from just the LHS.
# Tests that the GPU short-circuits the predicates without throwing an exception in ANSI mode.
@pytest.mark.parametrize('ansi_enabled', ['true', 'false'])
@pytest.mark.parametrize('lhs_predicate', [True, False])
Collaborator

Can we add a null LHS predicate for completeness?

Collaborator Author

Good suggestion! Done!
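
For reference, the combined test matrix discussed in this thread (AND/OR, ANSI on/off, and a True/False/null LHS) can be enumerated in plain Python. This is a sketch under assumptions, not the actual test in logic_test.py: `rhs_is_evaluated` and `expect_exception` are hypothetical helpers, and the expectation for a null LHS is inferred from the compatibility notes in the PR description.

```python
from itertools import product


def rhs_is_evaluated(op, lhs):
    """AND skips the RHS only when the LHS is False; OR only when the LHS is True."""
    skip_on = {'AND': False, 'OR': True}[op]
    return lhs is not skip_on


def expect_exception(op, ansi_enabled, lhs):
    # Assumption: the overflowing RHS raises only in ANSI mode,
    # and only for rows where it is actually evaluated.
    return ansi_enabled == 'true' and rhs_is_evaluated(op, lhs)


# One entry per combination, mirroring stacked parametrize decorators.
cases = [
    (op, ansi_enabled, lhs, expect_exception(op, ansi_enabled, lhs))
    for op, ansi_enabled, lhs in product(['AND', 'OR'], ['false', 'true'], [False, True, None])
]

for case in cases:
    print(case)
```

In a pytest suite the same grid would normally come from stacking `@pytest.mark.parametrize` decorators on a single test function, as the excerpts above do for `ansi_enabled` and `lhs_predicate`.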

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
    }
  } else {
    // replace null values
    val lhsNoNulls = withResource(Scalar.fromBool(true)) { trueScalar =>
Collaborator

How expensive is this replacement? And should we be checking if there are any nulls first?

Collaborator Author

I looked up all references to columnVector.replaceNulls() and did not see any of them guarded by a check for nulls. So I decided to follow the same pattern, thinking that replaceNulls() is not expensive.
I fixed that in the recent commits.

Collaborator

I mostly wanted some kind of evidence of whether it is slow or not. I think in this case my gut is probably okay, but it might be nice to have something better than that. Do you have any benchmarks that we could test this with? I know that AND/OR ended up being very slow in the past and caused measurable degradation in some TPC-DS queries. So it should be possible to have a project with lots of ANDs/ORs in it and see.

Collaborator Author

I was hoping you would never say that :-)
I will take a look into benchmarking. It may take me some time to get back with meaningful data though.
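
The guard discussed in this thread (check for nulls before replacing them) can be modeled on plain Python lists. This is an illustrative sketch only: `replace_nulls` and `replace_nulls_if_needed` are hypothetical names, `None` stands in for a null entry, and whether the check actually pays off on the GPU is exactly the open benchmarking question above.

```python
def replace_nulls(column, replacement):
    """Unconditional replacement: always builds a new list."""
    return [replacement if value is None else value for value in column]


def replace_nulls_if_needed(column, replacement):
    """Return the input untouched when it has no nulls; otherwise build a replaced copy."""
    if not any(value is None for value in column):
        return column  # nothing to replace, so no new column is allocated
    return replace_nulls(column, replacement)


no_nulls = [True, False, True]
with_nulls = [True, None, False]

print(replace_nulls_if_needed(no_nulls, True) is no_nulls)
print(replace_nulls_if_needed(with_nulls, True))
```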

@sameerz sameerz changed the title [WIP] Short cuircit AND/OR in ANSI mode [WIP] Short circuit AND/OR in ANSI mode Feb 16, 2022
Comment on lines 161 to 162
case (f: GpuScalar) =>
  doColumnar(lhsBool, f)
Contributor

This path is not covered by the current tests. Would we ever hit this path? I don't think that scalar expressions can have side-effects and we would only call this method if the RHS has side-effects.

Collaborator

Wow, that is an interesting situation. The only place right now where we can return a Scalar from processing when the input is not a foldable constant is some corner cases with Coalesce, and Coalesce can totally have side effects associated with it for downstream tasks that are a part of it. Perhaps we should try to write a test for this just to be sure...

Contributor

I've been trying to write a test that hits the scalar path and have failed to come up with anything so far. It looks like Coalesce will only return a scalar if the first non-null parameter is a scalar, but Spark will optimize the Coalesce out in that case. I am back to wondering if the scalar path is even possible, but maybe I am missing the edge case here.

Collaborator

I don't think Spark had that optimization in older versions, so you might try that. Also, this is enough of a corner case that I am fine with it as is.

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
@amahussein amahussein changed the title [WIP] Short circuit AND/OR in ANSI mode Short circuit AND/OR in ANSI mode Feb 17, 2022
@amahussein amahussein marked this pull request as ready for review February 17, 2022 16:03
@amahussein
Collaborator Author

build


trait GpuColumnVectorHelper extends Arm {
Collaborator

@revans2 revans2 Feb 17, 2022

Can we find a better name for this? It was GpuConditionalExpression, so it was clear that it was supposed to be used with those, but now it is a generic utility library with an even more generic name and no clear contract on how these APIs should be used. Could we change this to be an object instead of a trait? Could we restore the name GpuConditionalExpression so it is still clear that the methods are intended to be used with these types of classes? Also, I really dislike that we are overriding isAllTrue; that just does not appear to make any sense to me. If we have different ways of calculating isAllTrue, then we should either have separate utility APIs with names that make it clear how each works, or we should pass a flag to the API that lets you select how it works. Does that make sense to you?

Collaborator Author

Yes, that makes sense.
I initially thought of using the existing object GpuExpressionsUtils. Would that be fine? Or do you prefer creating a new helper object, and if so, what should its name be?

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
revans2 previously approved these changes Feb 18, 2022
Collaborator

@revans2 revans2 left a comment

This looks good. I have a few nits that would be nice to address, but nothing that I think is required.

* side-effects, such as throwing exceptions for invalid inputs.
*
* This method performs lazy evaluation on the GPU by first filtering
* the input batch a where the LHS predicate is True.
Collaborator

This line feels like it needs to be looked at again. The "a " in between batch and where feels off.

      doColumnar(lhsBool, GpuColumnVector.from(combinedVector, dataType))
    }
  case f: GpuScalar =>
    doColumnar(lhsBool, f)
Collaborator

As was stated before, this code is not covered by any test. I have my doubts that it can be covered, but I think the simplest solution is to call columnarEvalToColumn, which will return the scalar as a column, and not worry about it.

withResource(GpuExpressionsUtils.columnarEvalToColumn(rightExpr, leftTrueBatch)) { rEval =>
    withResource(gather(lhsNoNulls, rEval)) { combinedVector =>
        doColumnar(lhsBool, GpuColumnVector.from(combinedVector, dataType))
    }
}

It also makes the code much smaller, at the expense of extra memory in a case we don't know is even possible to hit.
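
The overall filter, evaluate, gather flow for AND can be modeled on plain Python lists. This is a sketch under stated assumptions rather than the plugin's implementation: `lazy_and`, `null_and`, and `rhs_with_overflow` are hypothetical names, `None` stands in for a null, and the filtering and gathering steps only imitate what the filterBatch and gather helpers are described as doing in this conversation.

```python
INT32_MAX = 2**31 - 1


def null_and(lhs, rhs):
    """Three-valued AND over already-computed values (None models NULL)."""
    if lhs is False or rhs is False:
        return False
    if lhs is None or rhs is None:
        return None
    return True


def lazy_and(lhs, rows, rhs_fn):
    """Evaluate rhs_fn only on rows whose LHS is not False, then combine."""
    # Step 1: keep the row positions where the RHS is needed (LHS is True or NULL).
    needed = [i for i, value in enumerate(lhs) if value is not False]
    filtered = [rows[i] for i in needed]
    # Step 2: evaluate the side-effecting RHS on the filtered batch only.
    rhs_small = rhs_fn(filtered) if filtered else []
    # Step 3: place the results back at their original positions; skipped rows
    # get a NULL placeholder, which cannot change an AND whose LHS is False.
    rhs_full = [None] * len(lhs)
    for i, value in zip(needed, rhs_small):
        rhs_full[i] = value
    # Step 4: ordinary null-aware AND over two full-length columns.
    return [null_and(l, r) for l, r in zip(lhs, rhs_full)]


def rhs_with_overflow(batch):
    """Computes x + 1 > 6, failing like ANSI mode would on 32-bit overflow."""
    out = []
    for x in batch:
        if x + 1 > INT32_MAX:
            raise ArithmeticError("integer overflow")
        out.append(x + 1 > 6)
    return out


rows = [5, INT32_MAX, 7]
lhs = [True, False, None]
print(lazy_and(lhs, rows, rhs_with_overflow))  # [False, False, None]
```

Evaluating `rhs_with_overflow(rows)` eagerly on the whole batch would raise on the second row, which is the failure the lazy path avoids.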

@amahussein
Collaborator Author

build

@amahussein
Collaborator Author

> This looks good. I have a few nits that would be nice to address, but nothing that I think is required.

Thanks @revans2 !
I pushed one final commit to address the remaining comments.
Does it look good to you?

@revans2 revans2 merged commit 076f36e into NVIDIA:branch-22.04 Feb 22, 2022
@amahussein amahussein deleted the rapids-4526-impl branch February 22, 2022 23:37
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG] Short circuit AND/OR in ANSI mode
6 participants