Support round and bround SQL functions #1244
Conversation
Signed-off-by: Niranjan Artal <[email protected]>
There is a bug in this which I am still figuring out. Not all test cases pass yet.
if (lhs.isInstanceOf[AutoCloseable]) {
  lhs.asInstanceOf[AutoCloseable].close()
}
if (rhs.isInstanceOf[AutoCloseable]) {
`rhs` could be null at this point, so it needs a null check here. Perhaps it would be better to introduce something similar to the `withResource` method that can work with `Any`s that are possibly also `AutoCloseable`?
I added one:
/** Executes the provided code block and then closes the value if it is AutoCloseable */
def withResourceIfAllowed[T, V](r: T)(block: T => V): V = {
  try {
    block(r)
  } finally {
    r match {
      case c: AutoCloseable => c.close()
      case _ => // NOOP
    }
  }
}
It turns out we already have a `withResource` that can work here.
withResourceIfAllowed(left.columnarEval(batch)) { lhs =>
withResourceIfAllowed(right.columnarEval(batch)) { rhs =>
(lhs, rhs) match {
case (l: GpuColumnVector, r) =>
withResource(GpuScalar.from(r, right.dataType)) { scalar =>
GpuColumnVector.from(doColumnar(l, scalar), dataType)
}
case _ => null
}
}
}
But further to the point, why is this not a `GpuBinaryExpression`? `BRound` and `Round` are both `BinaryExpression`s, which would clean up this code.
Thanks @andygrove and @revans2 for taking a look. Sorry, I missed this conversation. The reason I was not making it a `GpuBinaryExpression` is that there isn't a `BinaryOp` for round or bround in JNI or in cudf binary ops, and I was not sure what the overridden enum would be for these operators. But this also isn't working for all cases, so I will look into how to go about it as a `GpuBinaryExpression`.
`GpuBinaryExpression` does not assume an enum but `CudfBinaryExpression` does.
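To make the distinction concrete, here is a minimal sketch of the two shapes (trait names and signatures simplified for illustration; these are not the plugin's actual definitions):

```scala
import ai.rapids.cudf.{BinaryOp, BinaryOperable, ColumnVector, DType, RoundMode, Scalar}

// Constrained shape: the operator must map onto one cudf BinaryOp value.
trait CudfBinaryLike {
  def binaryOp: BinaryOp // e.g. BinaryOp.ADD
  def outputType: DType
  def doColumnar(lhs: ColumnVector, rhs: BinaryOperable): ColumnVector =
    lhs.binaryOp(binaryOp, rhs, outputType)
}

// Free shape: no enum required, so doColumnar can call any cudf kernel,
// which is what round/bround need since cudf has a dedicated round API.
object RoundSketch {
  def doColumnar(lhs: ColumnVector, scale: Scalar): ColumnVector =
    lhs.round(scale.getInt, RoundMode.HALF_UP) // bround would use HALF_EVEN
}
```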
Thanks! Changed it to use `GpuBinaryExpression`. Will be in next patch.
@@ -183,6 +183,22 @@ def test_shift_right_unsigned(data_gen):
            'shiftrightunsigned(a, cast(null as INT))',
            'shiftrightunsigned(a, b)'))

@pytest.mark.parametrize('data_gen', [decimal_gen_scale_precision], ids=idfn)
`round` and `bround` support all numeric types, not just decimal (float, int, byte, double, etc.). Do we have tests for these too?
Yes, will add tests for other numeric types.
Added tests for other numeric types.
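For reference, a quick spark-shell sanity check of that claim (illustrative only; the project's real coverage is in the pytest parametrizations above):

```scala
// Plain CPU Spark: round/bround accept every numeric type, including
// negative scales, which is what the new tests exercise.
spark.range(5).selectExpr(
  "round(cast(id as byte), -1)",
  "round(cast(id as float), 1)",
  "bround(cast(id as double), 1)",
  "bround(cast(id as decimal(10, 2)), 1)").show()
```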
Signed-off-by: Niranjan Artal <[email protected]>
Cleaned up the code to use `GpuBinaryExpression`.
I couldn't figure out the issue here as the code seems straightforward, so I switched to seeing if I could reproduce it in Java code. It is reproducible on the Java side as well, so it has to be debugged in cudf Java.
… no-op (#6975)

@nartal1 found a small bug while working on NVIDIA/spark-rapids#1244. The problem is that for `fixed_point`, when the column `scale = -decimal_places`, it should be a no-op. The fix is to make it a no-op.

Authors: Conor Hoekstra <[email protected]>
Approvers: David, Karthikeyan
URL: #6975
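To illustrate that no-op case in plain Scala (BigDecimal standing in for cudf's `fixed_point`, where `scale = -decimal_places`):

```scala
import scala.math.BigDecimal.RoundingMode

// A value stored at two decimal places (fixed_point scale -2) that is
// rounded to two decimal places must come back unchanged: a no-op.
val x = BigDecimal("1.23")
assert(x.setScale(2, RoundingMode.HALF_EVEN) == x)
```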
Signed-off-by: Niranjan Artal <[email protected]>
val scaleVal = val1.getInt
val scale = dataType match {
  case DecimalType.Fixed(p, s) => s
  case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => val1.getInt
@revans2 Could you please suggest how we handle overflow for each type? For example (considering short type), pyspark returns 0 for the round operation below:

>>> df2 = spark.createDataFrame(data=[32562], schema=ShortType())
>>> ret = df2.selectExpr('round(value, -5)')
>>> ret.show()
+----------------+
|round(value, -5)|
+----------------+
|               0|
+----------------+

But we see a different GPU result (-31072) because overflow results in undefined behavior in libcudf. Should we throw an exception whenever we detect an overflow for each type at this point?
The problem is entirely in how we implement round/bround vs. how Spark does it, and I am not 100% sure how to make them sync up without a lot of work on the cudf side for these corner cases.

cudf tries to do the round on the native type, which can result in an overflow. Spark will convert the native value to a decimal value (128 bits if needed), set the scale to do the rounding, and then convert the value back (with some special cases for NaN and Infinity in floating point). There can be no overflow in those cases because all of the processing happens on 128 bits. For integer types smaller than a long we could cast to a long first, do the round/bround, and then cast back, but we would still end up with issues for long because of overflow. It is similar with float/double.
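A minimal sketch of the Spark-style widening approach described above, in plain Scala (not the plugin's or Spark's actual code), using the short-type example from this thread:

```scala
import scala.math.BigDecimal.RoundingMode

// Widen to arbitrary precision, round by setting the scale, then narrow
// back. The rounding step itself can never overflow a native type.
def roundShort(v: Short, scale: Int): Short =
  BigDecimal(v.toInt).setScale(scale, RoundingMode.HALF_UP).toShort

roundShort(32562, -5) // 0: 32562 is nearer to 0 than to 100000
```

The narrowing `toShort` at the end is still a plain truncation; the sketch only illustrates why the rounding itself cannot overflow.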
Do we have tests for overflow to check if it is working correctly, or are we going to mark the operators as incompatible until we can figure out a way to make it work properly?
Signed-off-by: Niranjan Artal <[email protected]>
@revans2 I am waiting on another fix in cudf to get merged before starting the CI here. It would be great if you could take another look and see if it looks okay for this iteration. If further changes are required, I can make them and verify locally.
Found a small bug while working on NVIDIA/spark-rapids#1244: for negative integers, it was not rounding to the nearest even number.

Authors: Niranjan Artal <[email protected]>, Conor Hoekstra <[email protected]>
Approvers: Conor Hoekstra, Mark Harris
URL: #7014
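The half-even behavior that fix restores, in plain Scala (illustrative only):

```scala
import scala.math.BigDecimal.RoundingMode

// bround sends halfway cases to the even neighbor: -25 rounded to the
// nearest ten must become -20 (even), not -30 (away from zero).
BigDecimal(-25).setScale(-1, RoundingMode.HALF_EVEN).toInt // -20
```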
build
Signed-off-by: Niranjan Artal <[email protected]>
build
Signed-off-by: Niranjan Artal <[email protected]>
build
Signed-off-by: Niranjan Artal <[email protected]>
build
@andygrove you reviewed this before; are you okay with merging it?
LGTM
Signed-off-by: Niranjan Artal <[email protected]>