Update the interval division to throw same type exceptions as Spark #6019

Conversation

@HaoYang670 (Collaborator) commented Jul 19, 2022:

Signed-off-by: remzi [email protected]
This is a subtask of #5196.

Rationale of this PR

We should throw the same type of exceptions as Spark when doing interval division.

Changes in this PR

  1. Check for division by zero before we do the interval division (a rough sketch of this pattern follows this list), which includes:
  • column / column
  • column / scalar
  • scalar / column
  2. Use Spark's overflowInIntegralDivideError instead of implementing our own.
  3. Split the test cases of test_day_time_interval_division_overflow based on the kind of errors they throw.
  4. Add more tests to cover interval division with scalars.
  5. Update some other exceptions to use RapidsErrorUtils as much as possible.
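
The diff quoted further down this thread shows the merged column / column check; the block below is only a rough sketch of the overall pattern behind items 1, 2, and 5, written against the GpuDivideDTInterval context. The overflow helper on RapidsErrorUtils, divideWithRounding, and hasOverflow are placeholder names for illustration, not the merged code.

    // Sketch only, not the merged code. The divisor is checked for zeros before the
    // GPU division runs, so the plugin fails with the same SparkArithmeticException
    // as CPU Spark, and overflow is reported through the shimmed Spark helper rather
    // than a hand-rolled java.lang.ArithmeticException.
    override def doColumnar(interval: GpuColumnVector, num: GpuColumnVector): ColumnVector = {
      withResource(makeZeroScalar(num.getBase.getType)) { zeroScalar =>
        if (num.getBase.contains(zeroScalar)) {
          throw RapidsErrorUtils.divByZeroError(origin)
        }
      }
      // divideWithRounding and hasOverflow stand in for the existing kernel call and
      // overflow check; closeOnExcept frees the result if the overflow check throws.
      closeOnExcept(divideWithRounding(interval, num)) { result =>
        if (hasOverflow(result)) {
          throw RapidsErrorUtils.overflowInIntegralDivideError() // assumed shim method
        }
        result
      }
    }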

@HaoYang670 (Collaborator, Author) commented:

build

@@ -523,6 +524,11 @@ case class GpuDivideDTInterval(
}

  override def doColumnar(interval: GpuColumnVector, num: GpuColumnVector): ColumnVector = {
    withResource(makeZeroScalar(num.getBase.getType)) { zeroScalar =>
      if (num.getBase.contains(zeroScalar)) {
        throw RapidsErrorUtils.divByZeroError(origin)
A reviewer (Collaborator) commented on this diff:

If we use a new error SparkArithmeticException, it's better to remove the original error ArithmeticException.
Can we just update the original check?

Original check

    if (IntervalUtils.hasZero(q)) {
      throw new ArithmeticException("overflow: interval / zero")
    }

Original error:

java.lang.ArithmeticException: overflow: interval / zero
	at org.apache.spark.sql.rapids.shims.IntervalUtils$.divWithHalfUpModeWithOverflowCheck(intervalExpressions.scala:246)

A reviewer (Collaborator) commented:

Should also check another path: the numScalar can be zero.

override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar)

@HaoYang670 (Collaborator, Author) replied:

Should also check another path: the numScalar can be zero.

override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar)

Updated!
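
For context, a rough sketch of the kind of check added on the scalar-divisor path; the zero test on GpuScalar and divideByScalar are assumptions for illustration, not the merged code.

    override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar): ColumnVector = {
      // Sketch only: a valid scalar divisor equal to zero fails up front, the same way
      // the column path does, before any GPU work is launched.
      if (numScalar.isValid && numScalar.getValue == 0) {
        throw RapidsErrorUtils.divByZeroError(origin)
      }
      divideByScalar(interval, numScalar) // placeholder for the existing division logic
    }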

@HaoYang670 (Collaborator, Author) replied:

If we use a new error SparkArithmeticException, it's better to remove the original error ArithmeticException. Can we just update the original check?


Updated!

BTW, I didn't update this error:

  def divWithHalfUpModeWithOverflowCheck(p: BinaryOperable, q: BinaryOperable): ColumnVector = {
    // 1. overflow check q is 0
    if (IntervalUtils.hasZero(q)) {
      throw new ArithmeticException("overflow: interval / zero")
    }
    ...

because we don't have the value origin: Origin in this context.

revans2 previously approved these changes Jul 19, 2022
@HaoYang670 (Collaborator, Author) commented:

I guess we don't need to check for the divide-by-zero error in the two-scalar case:

  override def doColumnar(numRows: Int, intervalScalar: GpuScalar,
      numScalar: GpuScalar): ColumnVector = {
    withResource(GpuColumnVector.from(intervalScalar, numRows, interval.dataType)) { expandedLhs =>
      doColumnar(expandedLhs, numScalar)
    }
  }

Because the expression is already checked by Spark (branch-22.08, Spark 3.3.0):

scala> df.selectExpr("INTERVAL 1 SECOND / 0.0f").collect
org.apache.spark.SparkArithmeticException: Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" (except for ANSI interval type) to bypass this error.
== SQL(line 1, position 1) ==
INTERVAL 1 SECOND / 0.0f
^^^^^^^^^^^^^^^^^^^^^^^^

  at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:184)
  at org.apache.spark.sql.catalyst.expressions.IntervalDivide.divideByZeroCheck(intervalExpressions.scala:617)
  at org.apache.spark.sql.catalyst.expressions.IntervalDivide.divideByZeroCheck$(intervalExpressions.scala:614)
  at org.apache.spark.sql.catalyst.expressions.DivideDTInterval.divideByZeroCheck(intervalExpressions.scala:713)
  at org.apache.spark.sql.catalyst.expressions.DivideDTInterval.nullSafeEval(intervalExpressions.scala:737)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:642)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:158)
  ...

@HaoYang670 (Collaborator, Author) commented:

build

@HaoYang670 changed the title from "Update the test_day_time_interval_division_overflow" to "Update the interval division to throw same type exceptions as Spark" on Jul 20, 2022
@HaoYang670 (Collaborator, Author) commented:

build

@HaoYang670 (Collaborator, Author) commented:

build

@HaoYang670 self-assigned this Jul 20, 2022
@HaoYang670 (Collaborator, Author) commented:

build

@revans2 (Collaborator) commented Jul 20, 2022:

build

@revans2 (Collaborator) commented Jul 20, 2022:

Looks like there was a network glitch in the previous build

@sameerz added the "task" label (Work required that improves the product but is not user facing) on Jul 20, 2022
@res-life (Collaborator) commented:

LGTM, but it's better to test all the interval cases related to this change on all the Spark versions before merging this PR.

@firestarman (Collaborator) commented:

LGTM

@HaoYang670 merged commit 071cb2f into NVIDIA:branch-22.08 on Jul 21, 2022
@HaoYang670 deleted the update_test_day_time_interval_division_overflow branch on July 21, 2022 at 03:00