Update the interval division to throw same type exceptions as Spark #6019
Conversation
build
sql-plugin/src/main/330+/scala/org/apache/spark/sql/rapids/shims/intervalExpressions.scala
sql-plugin/src/main/330+/scala/org/apache/spark/sql/rapids/shims/RapidsErrorUtils.scala
```scala
@@ -523,6 +524,11 @@ case class GpuDivideDTInterval(
  }

  override def doColumnar(interval: GpuColumnVector, num: GpuColumnVector): ColumnVector = {
    withResource(makeZeroScalar(num.getBase.getType)) { zeroScalar =>
      if (num.getBase.contains(zeroScalar)) {
        throw RapidsErrorUtils.divByZeroError(origin)
```
If we use a new error `SparkArithmeticException`, it's better to remove the original error `ArithmeticException`. Can we just update the original check?

```scala
if (IntervalUtils.hasZero(q)) {
  throw new ArithmeticException("overflow: interval / zero")
}
```

Original error:

```
java.lang.ArithmeticException: overflow: interval / zero
	at org.apache.spark.sql.rapids.shims.IntervalUtils$.divWithHalfUpModeWithOverflowCheck(intervalExpressions.scala:246)
```
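A minimal sketch of that suggestion, assuming `origin` and `RapidsErrorUtils.divByZeroError` are reachable from this scope (the reply further down explains why `origin` is in fact not available there):

```scala
// Sketch only: the suggested Spark-typed replacement for the check above.
// Whether `origin` is in scope at this point is an assumption, not a given.
if (IntervalUtils.hasZero(q)) {
  throw RapidsErrorUtils.divByZeroError(origin)
}
```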
Should also check another path; the `numScalar` can be zero:

```scala
override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar)
```
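A hedged sketch of what such a guard might look like, mirroring the vector-path diff above (the zero comparison on a `GpuScalar` is illustrative, not the actual change):

```scala
// Illustrative only: guard the scalar path before dividing. Comparing the
// incoming GpuScalar's base against a zero scalar of the same type is an
// assumption about the cudf API, not the merged code.
override def doColumnar(interval: GpuColumnVector, numScalar: GpuScalar): ColumnVector = {
  withResource(makeZeroScalar(numScalar.getBase.getType)) { zeroScalar =>
    if (numScalar.getBase.equals(zeroScalar)) {
      throw RapidsErrorUtils.divByZeroError(origin)
    }
    // ... the actual division follows here
  }
}
```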
> Should also check another path; the `numScalar` can be zero.

Updated!
> If we use a new error `SparkArithmeticException`, it's better to remove the original error `ArithmeticException`. Can we just update the original check?

Updated!
BTW, I don't update this error:
```scala
def divWithHalfUpModeWithOverflowCheck(p: BinaryOperable, q: BinaryOperable): ColumnVector = {
  // 1. overflow check q is 0
  if (IntervalUtils.hasZero(q)) {
    throw new ArithmeticException("overflow: interval / zero")
  }
  ...
```
because we don't have the value `origin: Origin` in this context.
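Purely as a hypothetical illustration of that constraint (the extra parameter below is not part of this PR), the `Origin` would have to be threaded down for this error to change:

```scala
// Hypothetical sketch, not the merged code: pass the expression's Origin
// into the helper so it could raise the Spark-typed error as well.
def divWithHalfUpModeWithOverflowCheck(
    p: BinaryOperable, q: BinaryOperable, origin: Origin): ColumnVector = {
  if (IntervalUtils.hasZero(q)) {
    throw RapidsErrorUtils.divByZeroError(origin)
  }
  ...
```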
I guess we don't need to check this overload:

```scala
override def doColumnar(numRows: Int, intervalScalar: GpuScalar,
    numScalar: GpuScalar): ColumnVector = {
  withResource(GpuColumnVector.from(intervalScalar, numRows, interval.dataType)) { expandedLhs =>
    doColumnar(expandedLhs, numScalar)
  }
}
```

because the expression is checked by Spark:

```
scala> df.selectExpr("INTERVAL 1 SECOND / 0.0f").collect
org.apache.spark.SparkArithmeticException: Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" (except for ANSI interval type) to bypass this error.
== SQL(line 1, position 1) ==
INTERVAL 1 SECOND / 0.0f
^^^^^^^^^^^^^^^^^^^^^^^^
	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:184)
	at org.apache.spark.sql.catalyst.expressions.IntervalDivide.divideByZeroCheck(intervalExpressions.scala:617)
	at org.apache.spark.sql.catalyst.expressions.IntervalDivide.divideByZeroCheck$(intervalExpressions.scala:614)
	at org.apache.spark.sql.catalyst.expressions.DivideDTInterval.divideByZeroCheck(intervalExpressions.scala:713)
	at org.apache.spark.sql.catalyst.expressions.DivideDTInterval.nullSafeEval(intervalExpressions.scala:737)
	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:642)
	at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:158)
	...
```
build
test_day_time_interval_division_overflow
build
build
build
build
Looks like there was a network glitch in the previous build.
LGTM, but it's better to test all the interval cases related to this change on all the Spark versions before merging this PR.
LGTM |
Signed-off-by: remzi <[email protected]>
This is a subtask of #5196.
Rationale of this PR
We should throw the same types of exceptions as Spark when doing interval division.
Changes in this PR
- Throw the `divByZero` error before we do interval division, which includes:
- Use `overflowInIntegralDivideError` in Spark instead of implementing our own one.
- Update `test_day_time_interval_division_overflow` based on the kind of errors they throw.
- Reuse `RapidsErrorUtils` as much as possible.
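For reference, a minimal sketch of the user-visible behavior this PR targets, using only standard Spark APIs (the session setup, column names, and literal values are illustrative):

```scala
import org.apache.spark.SparkArithmeticException
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("interval-div-demo").getOrCreate()
// A day-time interval column and a zero divisor column.
val df = spark.sql("SELECT INTERVAL '1' SECOND AS i, 0 AS n")
try {
  // Division by a zero column is only detected at evaluation time.
  df.selectExpr("i / n").collect()
} catch {
  // After this change the plugin raises the Spark-typed exception
  // rather than a bare java.lang.ArithmeticException.
  case e: SparkArithmeticException => println(e.getMessage)
}
```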