GPU sample exec #3789

Merged
merged 10 commits into from Oct 19, 2021

Conversation

res-life
Collaborator

GPU sample exec
This fixes #3419

Signed-off-by: Chong Gao [email protected]

Signed-off-by: Chong Gao <[email protected]>
res-life marked this pull request as draft October 11, 2021 13:18
@res-life
Collaborator Author

res-life commented Oct 11, 2021

It's a draft PR.

  1. [fixed] Need to check the lower bound and upper bound like BernoulliCellSampler does
  2. [fixed] Sometimes throws the following error:
    E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 4.0 failed 1 times, most recent failure: Lost task 3.0 in stage 4.0 (TID 63) (10.7.24.14 executor driver): java.lang.AssertionError: Input table cannot be empty
    E at ai.rapids.cudf.Table.assertForBounds(Table.java:1584)
    E at ai.rapids.cudf.Table.upperBound(Table.java:1553)
    E at ai.rapids.cudf.Table.upperBound(Table.java:1579)
    E at com.nvidia.spark.rapids.GpuSorter.$anonfun$upperBound$2(SortUtils.scala:168)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
    E at com.nvidia.spark.rapids.GpuSorter.$anonfun$upperBound$1(SortUtils.scala:167)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
    E at com.nvidia.spark.rapids.GpuSorter.upperBound(SortUtils.scala:166)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$4(GpuRangePartitioner.scala:189)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$3(GpuRangePartitioner.scala:188)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$2(GpuRangePartitioner.scala:186)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$1(GpuRangePartitioner.scala:184)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.computeBoundsAndClose(GpuRangePartitioner.scala:182)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.columnarEval(GpuRangePartitioner.scala:203)
    E at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$.$anonfun$prepareBatchShuffleDependency$3(GpuShuffleExchangeExecBase.scala:275)
    E at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:296)
    E at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:307)
    E at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    E at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    E at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    E at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    E at org.apache.spark.scheduler.Task.run(Task.scala:131)
    E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    E at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    E at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    E at java.lang.Thread.run(Thread.java:748)

@revans2 Could you help check whether GpuRangePartitioner.computeBoundsAndClose can throw an error when the columnar batch is empty?
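
For context, the assertion in the trace above fires inside cudf's Table.upperBound, which requires a non-empty input table. Below is a minimal Scala sketch of the kind of empty-batch guard that would avoid it; ColumnarBatch is Spark's class, while EmptyBatchGuard, boundsForBatch, and computeUpperBound are hypothetical names rather than the actual spark-rapids code:

    import org.apache.spark.sql.vectorized.ColumnarBatch

    object EmptyBatchGuard {
      // Placeholder standing in for the real call into ai.rapids.cudf.Table.upperBound,
      // which asserts "Input table cannot be empty".
      def computeUpperBound(batch: ColumnarBatch): Array[Int] = ???

      // Hypothetical guard: return no bounds for an empty batch instead of
      // letting the cudf assertion fire.
      def boundsForBatch(batch: ColumnarBatch): Option[Array[Int]] =
        if (batch.numRows() == 0) None
        else Some(computeUpperBound(batch))
    }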

@res-life
Collaborator Author

build

sameerz added the feature request (New feature or request) label on Oct 11, 2021
Signed-off-by: Chong Gao <[email protected]>
@res-life
Collaborator Author

build

@@ -0,0 +1,38 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
Collaborator

The file is new here, so the copyright year should be just 2021.

from pyspark.sql.types import *
from marks import *

_table_gen = [
Collaborator

Looks like this PR has enabled many data types for GpuSampleExec, but the tests don't seem to cover them?

builder =>
  (0 until numRows).foreach(_ => {
    val x = rng.nextDouble()
    val n = if ((x >= lowerBound) && (x < upperBound)) 1 else 0
Collaborator

You may need to use "BernoulliCellSampler" here.
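
For reference, Spark's org.apache.spark.util.random.BernoulliCellSampler already implements exactly this check (keep a row when lowerBound <= x < upperBound). A minimal sketch of reusing it per partition, assuming a Spark 3.x classpath where RandomSampler exposes a per-element sample(): Int; SamplerSketch and rowMask are illustrative names only:

    import org.apache.spark.util.random.BernoulliCellSampler

    object SamplerSketch {
      // Build a keep/drop mask for one partition by delegating the bound check
      // to Spark's own BernoulliCellSampler instead of re-implementing it.
      def rowMask(partitionIndex: Int, numRows: Int,
                  lowerBound: Double, upperBound: Double, seed: Long): Array[Boolean] = {
        val sampler = new BernoulliCellSampler[Any](lowerBound, upperBound)
        sampler.setSeed(seed + partitionIndex) // per-partition seed, as Spark's SampleExec does
        // sample() returns 1 when rng.nextDouble() falls in [lowerBound, upperBound), else 0
        Array.fill(numRows)(sampler.sample() > 0)
      }
    }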

@res-life
Collaborator Author

build

Signed-off-by: Chong Gao <[email protected]>
@res-life
Collaborator Author

build

Signed-off-by: Chong Gao <[email protected]>
@res-life
Collaborator Author

build

@res-life
Collaborator Author

res-life commented Oct 12, 2021

Addressed the comments.
Fixed the bug where GpuRangePartitioner throws an exception when the table is empty.
Still a draft; the nested-type cases failed and need investigation.

@res-life
Collaborator Author

build

@res-life
Collaborator Author

build

res-life marked this pull request as ready for review October 13, 2021 09:44
@res-life
Collaborator Author

build

@res-life
Collaborator Author

@revans2 Please help review.

Collaborator

@revans2 left a comment

Mostly just a few more nits. Looking good

@res-life
Collaborator Author

build

docs/supported_ops.md (resolved review thread)
docs/supported_ops.md (resolved review thread)
docs/supported_ops.md (resolved review thread)
Collaborator

@firestarman left a comment

There seem to be many unrelated diffs in the file supported_ops.md. Could you confirm?

@res-life
Collaborator Author

@firestarman Regarding the supported_ops.md diff issue you mentioned:
Confirmed, there is no problem; only the SampleExec section was added.
The SortExec and TakeOrderedAndProjectExec sections are not actually affected. I verified this with a GUI diff tool.

@res-life
Collaborator Author

res-life commented Oct 18, 2021

Created an issue against Spark about the sort order:
https://issues.apache.org/jira/browse/SPARK-37040

Labels
feature request (New feature or request)
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for org.apache.spark.sql.execution.SampleExec
5 participants