Support round and bround SQL functions #1244
Conversation
Signed-off-by: Niranjan Artal <[email protected]>
There is a bug in this which I am still figuring out. Not all test cases pass yet.
if (lhs.isInstanceOf[AutoCloseable]) {
  lhs.asInstanceOf[AutoCloseable].close()
}
if (rhs.isInstanceOf[AutoCloseable]) {
`rhs` could be null at this point, so it needs a null check here. Perhaps it would be better to introduce something similar to the `withResource` method that can work with `Any`s that are possibly also `AutoCloseable`?
I added one:
/** Executes the provided code block and then closes the value if it is AutoCloseable */
def withResourceIfAllowed[T, V](r: T)(block: T => V): V = {
  try {
    block(r)
  } finally {
    r match {
      case c: AutoCloseable => c.close()
      case _ => // NOOP
    }
  }
}
It turns out we already have a `withResource` that can work here.
withResourceIfAllowed(left.columnarEval(batch)) { lhs =>
withResourceIfAllowed(right.columnarEval(batch)) { rhs =>
(lhs, rhs) match {
case (l: GpuColumnVector, r) =>
withResource(GpuScalar.from(r, right.dataType)) { scalar =>
GpuColumnVector.from(doColumnar(l, scalar), dataType)
}
case _ => null
}
}
}
But further to the point, why is this not a `GpuBinaryExpression`? `BRound` and `Round` are both `BinaryExpression`s, which would clean up this code.
Thanks @andygrove and @revans2 for taking a look. Sorry, I missed this conversation. The reason I was not making it a `GpuBinaryExpression` is that there isn't a `BinaryOp` for round or bround in JNI or in cudf binary ops, and I was not sure what the overridden enum would be for these operators. But this also isn't working for all cases, so I will look into how to go about it as a `GpuBinaryExpression`.
`GpuBinaryExpression` does not assume an enum but `CudfBinaryExpression` does.
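To make the distinction concrete, here is a minimal sketch of the two shapes (trait names and signatures simplified for illustration; these are not the plugin's actual definitions):

```scala
import ai.rapids.cudf.{BinaryOp, BinaryOperable, ColumnVector, DType, RoundMode, Scalar}

// Constrained shape: the operator must map onto one cudf BinaryOp value.
trait CudfBinaryLike {
  def binaryOp: BinaryOp // e.g. BinaryOp.ADD
  def outputType: DType
  def doColumnar(lhs: ColumnVector, rhs: BinaryOperable): ColumnVector =
    lhs.binaryOp(binaryOp, rhs, outputType)
}

// Free shape: no enum required, so doColumnar can call any cudf kernel,
// which is what round/bround need since cudf has a dedicated round API.
object RoundSketch {
  def doColumnar(lhs: ColumnVector, scale: Scalar): ColumnVector =
    lhs.round(scale.getInt, RoundMode.HALF_UP) // bround would use HALF_EVEN
}
```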
Thanks! Changed it to use `GpuBinaryExpression`. Will be in next patch.
@@ -183,6 +183,22 @@ def test_shift_right_unsigned(data_gen):
            'shiftrightunsigned(a, cast(null as INT))',
            'shiftrightunsigned(a, b)'))

@pytest.mark.parametrize('data_gen', [decimal_gen_scale_precision], ids=idfn)
`round` and `bround` support all numeric types, not just decimal (float, int, byte, double, etc.). Do we have tests for these too?
Yes, will add tests for other numeric types.
Added tests for other numeric types.
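For reference, a quick spark-shell sanity check of that claim (illustrative only; the project's real coverage is in the pytest parametrizations above):

```scala
// Plain CPU Spark: round/bround accept every numeric type, including
// negative scales, which is what the new tests exercise.
spark.range(5).selectExpr(
  "round(cast(id as byte), -1)",
  "round(cast(id as float), 1)",
  "bround(cast(id as double), 1)",
  "bround(cast(id as decimal(10, 2)), 1)").show()
```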
Signed-off-by: Niranjan Artal <[email protected]>
Cleaned up the code to use `GpuBinaryExpression`.
I couldn't figure out the issue here as the code seems straightforward, so I switched to seeing if I could reproduce it in Java code. It is reproducible on the Java side as well, so it has to be debugged in cudf Java.
… no-op (#6975)

@nartal1 found a small bug while working on NVIDIA/spark-rapids#1244. The problem is that for `fixed_point`, when the column `scale = -decimal_places`, it should be a no-op. The fix is to make it a no-op.

Authors: Conor Hoekstra <[email protected]>
Approvers: David, Karthikeyan
URL: #6975
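To illustrate that no-op case in plain Scala (BigDecimal standing in for cudf's `fixed_point`, where `scale = -decimal_places`):

```scala
import scala.math.BigDecimal.RoundingMode

// A value stored at two decimal places (fixed_point scale -2) that is
// rounded to two decimal places must come back unchanged: a no-op.
val x = BigDecimal("1.23")
assert(x.setScale(2, RoundingMode.HALF_EVEN) == x)
```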
Signed-off-by: Niranjan Artal <[email protected]>
val scaleVal = val1.getInt
val scale = dataType match {
  case DecimalType.Fixed(p, s) => s
  case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => val1.getInt
@revans2 Could you please suggest how we handle overflow for each type? For example (considering short type), pyspark returns 0 for the round operation below:

>>> df2 = spark.createDataFrame(data=[32562], schema=ShortType())
>>> ret = df2.selectExpr('round(value, -5)')
>>> ret.show()
+----------------+
|round(value, -5)|
+----------------+
|               0|
+----------------+

But we see a different GPU result (-31072) because overflow results in undefined behavior in libcudf. Should we throw an exception whenever we detect an overflow for each type at this point?
The problem is entirely in how we implement round/bround vs. how Spark does it, and I am not 100% sure how to make them sync up without a lot of work on the cudf side for these corner cases.

cudf tries to do the round on the native type, which can result in an overflow. Spark will convert the native value to a decimal value (128 bits if needed), set the scale to do the rounding, and then convert the value back (with some special cases for NaN and Infinity in floating point). There can be no overflow in those cases because all of the processing happens on 128 bits. For integer types smaller than a long we could cast to a long first, do the round/bround, and then cast back, but we would still end up with issues for long because of overflow. It is similar with float/double.
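A minimal sketch of the Spark-style widening approach described above, in plain Scala (not the plugin's or Spark's actual code), using the short-type example from this thread:

```scala
import scala.math.BigDecimal.RoundingMode

// Widen to arbitrary precision, round by setting the scale, then narrow
// back. The rounding step itself can never overflow a native type.
def roundShort(v: Short, scale: Int): Short =
  BigDecimal(v.toInt).setScale(scale, RoundingMode.HALF_UP).toShort

roundShort(32562, -5) // 0: 32562 is nearer to 0 than to 100000
```

The narrowing `toShort` at the end is still a plain truncation; the sketch only illustrates why the rounding itself cannot overflow.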
Do we have tests for overflow to check if it is working correctly, or are we going to mark the operators as incompatible until we can figure out a way to make it work properly?
Signed-off-by: Niranjan Artal <[email protected]>
@revans2 I am waiting on another fix in cudf to get merged before starting the CI here. It would be great if you could take another look and see if it looks okay for this iteration. If further changes are required, I can make them and verify locally.
Found a small bug while working on NVIDIA/spark-rapids#1244: for negative integers, it was not rounding to the nearest even number.

Authors: Niranjan Artal <[email protected]>, Conor Hoekstra <[email protected]>
Approvers: Conor Hoekstra, Mark Harris
URL: #7014
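The half-even behavior that fix restores, in plain Scala (illustrative only):

```scala
import scala.math.BigDecimal.RoundingMode

// bround sends halfway cases to the even neighbor: -25 rounded to the
// nearest ten must become -20 (even), not -30 (away from zero).
BigDecimal(-25).setScale(-1, RoundingMode.HALF_EVEN).toInt // -20
```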
build
Signed-off-by: Niranjan Artal <[email protected]>
build
Signed-off-by: Niranjan Artal <[email protected]>
build
Signed-off-by: Niranjan Artal <[email protected]>
build
@andygrove you reviewed this before; are you okay with merging it?
LGTM
Signed-off-by: Niranjan Artal <[email protected]>