Update the interval division to throw same type exceptions as Spark #6019

Conversation

@HaoYang670 (Collaborator) commented Jul 19, 2022:

Signed-off-by: remzi [email protected]
This is a subtask of #5196.

Rationale of this PR

We should throw the same type of exceptions as Spark when doing interval division.

Changes in this PR

  1. Check for division by zero before we do the interval division (a rough sketch of this pattern follows this list), which includes:
  • column / column
  • column / scalar
  • scalar / column
  2. Use Spark's overflowInIntegralDivideError instead of implementing our own.
  3. Split the test cases of test_day_time_interval_division_overflow based on the kind of errors they throw.
  4. Add more tests to cover interval division with scalars.
  5. Update some other exceptions to use RapidsErrorUtils as much as possible.
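
The diff quoted further down this thread shows the merged column / column check; the block below is only a rough sketch of the overall pattern behind items 1, 2, and 5, written against the GpuDivideDTInterval context. The overflow helper on RapidsErrorUtils, divideWithRounding, and hasOverflow are placeholder names for illustration, not the merged code.

    // Sketch only, not the merged code. The divisor is checked for zeros before the
    // GPU division runs, so the plugin fails with the same SparkArithmeticException
    // as CPU Spark, and overflow is reported through the shimmed Spark helper rather
    // than a hand-rolled java.lang.ArithmeticException.
    override def doColumnar(interval: GpuColumnVector, num: GpuColumnVector): ColumnVector = {
      withResource(makeZeroScalar(num.getBase.getType)) { zeroScalar =>
        if (num.getBase.contains(zeroScalar)) {
          throw RapidsErrorUtils.divByZeroError(origin)
        }
      }
      // divideWithRounding and hasOverflow stand in for the existing kernel call and
      // overflow check; closeOnExcept frees the result if the overflow check throws.
      closeOnExcept(divideWithRounding(interval, num)) { result =>
        if (hasOverflow(result)) {
          throw RapidsErrorUtils.overflowInIntegralDivideError() // assumed shim method
        }
        result
      }
    }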

@HaoYang670 (Collaborator, Author) commented:

build

@@ -523,6 +524,11 @@ case class GpuDivideDTInterval(
}

  override def doColumnar(interval: GpuColumnVector, num: GpuColumnVector): ColumnVector = {
    withResource(makeZeroScalar(num.getBase.getType)) { zeroScalar =>
      if (num.getBase.contains(zeroScalar)) {
        throw RapidsErrorUtils.divByZeroError(origin)
A reviewer (Collaborator) commented on this diff:

If we use a new error SparkArithmeticException, it's better to remove the original error ArithmeticException.
Can we just update the original check?

Original check

    if (IntervalUtils.hasZero(q)) {
      throw new ArithmeticException("overflow: interval / zero")
    }

Original error:

java.lang.ArithmeticException: overflow: interval / zero
	at org.apache.spark.sql.rapids.shims.IntervalUtils$.divWithHalfUpModeWithOverflowCheck(intervalExpressions.scala:246)

A reviewer (Collaborator) commented:

Should also check another path: the numScalar can be zero.

override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar)

@HaoYang670 (Collaborator, Author) replied:

Should also check another path: the numScalar can be zero.

override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar)

Updated!
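
For context, a rough sketch of the kind of check added on the scalar-divisor path; the zero test on GpuScalar and divideByScalar are assumptions for illustration, not the merged code.

    override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar): ColumnVector = {
      // Sketch only: a valid scalar divisor equal to zero fails up front, the same way
      // the column path does, before any GPU work is launched.
      if (numScalar.isValid && numScalar.getValue == 0) {
        throw RapidsErrorUtils.divByZeroError(origin)
      }
      divideByScalar(interval, numScalar) // placeholder for the existing division logic
    }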

@HaoYang670 (Collaborator, Author) replied:

If we use a new error SparkArithmeticException, it's better to remove the original error ArithmeticException. Can we just update the original check?


Updated!

BTW, I didn't update this error:

  def divWithHalfUpModeWithOverflowCheck(p: BinaryOperable, q: BinaryOperable): ColumnVector = {
    // 1. overflow check q is 0
    if (IntervalUtils.hasZero(q)) {
      throw new ArithmeticException("overflow: interval / zero")
    }
    ...

because we don't have the value origin: Origin in this context.

revans2 previously approved these changes Jul 19, 2022
@HaoYang670 (Collaborator, Author) commented:

I guess we don't need to check for the divide-by-zero error in the two-scalar case:

  override def doColumnar(numRows: Int, intervalScalar: GpuScalar,
      numScalar: GpuScalar): ColumnVector = {
    withResource(GpuColumnVector.from(intervalScalar, numRows, interval.dataType)) { expandedLhs =>
      doColumnar(expandedLhs, numScalar)
    }
  }

Because the expression is already checked by Spark (branch-22.08, Spark 3.3.0):

scala> df.selectExpr("INTERVAL 1 SECOND / 0.0f").collect
org.apache.spark.SparkArithmeticException: Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" (except for ANSI interval type) to bypass this error.
== SQL(line 1, position 1) ==
INTERVAL 1 SECOND / 0.0f
^^^^^^^^^^^^^^^^^^^^^^^^

  at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:184)
  at org.apache.spark.sql.catalyst.expressions.IntervalDivide.divideByZeroCheck(intervalExpressions.scala:617)
  at org.apache.spark.sql.catalyst.expressions.IntervalDivide.divideByZeroCheck$(intervalExpressions.scala:614)
  at org.apache.spark.sql.catalyst.expressions.DivideDTInterval.divideByZeroCheck(intervalExpressions.scala:713)
  at org.apache.spark.sql.catalyst.expressions.DivideDTInterval.nullSafeEval(intervalExpressions.scala:737)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:642)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:158)
  ...

@HaoYang670 (Collaborator, Author) commented:

build

@HaoYang670 changed the title from "Update the test_day_time_interval_division_overflow" to "Update the interval division to throw same type exceptions as Spark" on Jul 20, 2022
@HaoYang670 (Collaborator, Author) commented:

build

@HaoYang670 (Collaborator, Author) commented:

build

@HaoYang670 self-assigned this Jul 20, 2022
@HaoYang670 (Collaborator, Author) commented:

build

@revans2 (Collaborator) commented Jul 20, 2022:

build

@revans2 (Collaborator) commented Jul 20, 2022:

Looks like there was a network glitch in the previous build

@sameerz added the "task" label (Work required that improves the product but is not user facing) on Jul 20, 2022
@res-life (Collaborator) commented:

LGTM, but it's better to test all the interval cases related to this change on all the Spark versions before merging this PR.

@firestarman (Collaborator) commented:

LGTM

@HaoYang670 merged commit 071cb2f into NVIDIA:branch-22.08 on Jul 21, 2022
@HaoYang670 deleted the update_test_day_time_interval_division_overflow branch on July 21, 2022 at 03:00