GPU sample exec #3789

Merged
merged 10 commits into from Oct 19, 2021

Conversation

res-life
Collaborator

GPU sample exec
This fixes #3419

Signed-off-by: Chong Gao [email protected]

Signed-off-by: Chong Gao <[email protected]>
res-life marked this pull request as draft October 11, 2021 13:18
@res-life
Collaborator Author

res-life commented Oct 11, 2021

It's a draft PR.

  1. [fixed] Need to check the lower bound and upper bound like BernoulliCellSampler does
  2. [fixed] Sometimes throws the following error:
    E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 4.0 failed 1 times, most recent failure: Lost task 3.0 in stage 4.0 (TID 63) (10.7.24.14 executor driver): java.lang.AssertionError: Input table cannot be empty
    E at ai.rapids.cudf.Table.assertForBounds(Table.java:1584)
    E at ai.rapids.cudf.Table.upperBound(Table.java:1553)
    E at ai.rapids.cudf.Table.upperBound(Table.java:1579)
    E at com.nvidia.spark.rapids.GpuSorter.$anonfun$upperBound$2(SortUtils.scala:168)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
    E at com.nvidia.spark.rapids.GpuSorter.$anonfun$upperBound$1(SortUtils.scala:167)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
    E at com.nvidia.spark.rapids.GpuSorter.upperBound(SortUtils.scala:166)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$4(GpuRangePartitioner.scala:189)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$3(GpuRangePartitioner.scala:188)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$2(GpuRangePartitioner.scala:186)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.$anonfun$computeBoundsAndClose$1(GpuRangePartitioner.scala:184)
    E at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
    E at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.withResource(GpuRangePartitioner.scala:169)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.computeBoundsAndClose(GpuRangePartitioner.scala:182)
    E at com.nvidia.spark.rapids.GpuRangePartitioner.columnarEval(GpuRangePartitioner.scala:203)
    E at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$.$anonfun$prepareBatchShuffleDependency$3(GpuShuffleExchangeExecBase.scala:275)
    E at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:296)
    E at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:307)
    E at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    E at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    E at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    E at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    E at org.apache.spark.scheduler.Task.run(Task.scala:131)
    E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    E at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    E at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    E at java.lang.Thread.run(Thread.java:748)

@revans2 Could you help check whether GpuRangePartitioner.computeBoundsAndClose can throw an error when the columnar batch is empty?
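
For context, the assertion in the trace above fires inside cudf's Table.upperBound, which requires a non-empty input table. Below is a minimal Scala sketch of the kind of empty-batch guard that would avoid it; ColumnarBatch is Spark's class, while EmptyBatchGuard, boundsForBatch, and computeUpperBound are hypothetical names rather than the actual spark-rapids code:

    import org.apache.spark.sql.vectorized.ColumnarBatch

    object EmptyBatchGuard {
      // Placeholder standing in for the real call into ai.rapids.cudf.Table.upperBound,
      // which asserts "Input table cannot be empty".
      def computeUpperBound(batch: ColumnarBatch): Array[Int] = ???

      // Hypothetical guard: return no bounds for an empty batch instead of
      // letting the cudf assertion fire.
      def boundsForBatch(batch: ColumnarBatch): Option[Array[Int]] =
        if (batch.numRows() == 0) None
        else Some(computeUpperBound(batch))
    }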

@res-life
Collaborator Author

build

sameerz added the feature request (New feature or request) label on Oct 11, 2021
Signed-off-by: Chong Gao <[email protected]>
@res-life
Collaborator Author

build

@@ -0,0 +1,38 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
Collaborator

The file is new here, so the copyright year should be just 2021.

from pyspark.sql.types import *
from marks import *

_table_gen = [
Collaborator

Looks like this PR has enabled many data types for GpuSampleExec, but the tests don't seem to cover them?

builder =>
  (0 until numRows).foreach(_ => {
    val x = rng.nextDouble()
    val n = if ((x >= lowerBound) && (x < upperBound)) 1 else 0
Collaborator

You may need to use "BernoulliCellSampler" here.
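
For reference, Spark's org.apache.spark.util.random.BernoulliCellSampler already implements exactly this check (keep a row when lowerBound <= x < upperBound). A minimal sketch of reusing it per partition, assuming a Spark 3.x classpath where RandomSampler exposes a per-element sample(): Int; SamplerSketch and rowMask are illustrative names only:

    import org.apache.spark.util.random.BernoulliCellSampler

    object SamplerSketch {
      // Build a keep/drop mask for one partition by delegating the bound check
      // to Spark's own BernoulliCellSampler instead of re-implementing it.
      def rowMask(partitionIndex: Int, numRows: Int,
                  lowerBound: Double, upperBound: Double, seed: Long): Array[Boolean] = {
        val sampler = new BernoulliCellSampler[Any](lowerBound, upperBound)
        sampler.setSeed(seed + partitionIndex) // per-partition seed, as Spark's SampleExec does
        // sample() returns 1 when rng.nextDouble() falls in [lowerBound, upperBound), else 0
        Array.fill(numRows)(sampler.sample() > 0)
      }
    }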

@res-life
Collaborator Author

build

Signed-off-by: Chong Gao <[email protected]>
@res-life
Collaborator Author

build

Signed-off-by: Chong Gao <[email protected]>
@res-life
Collaborator Author

build

@res-life
Collaborator Author

res-life commented Oct 12, 2021

Addressed the comments.
Fixed the bug where GpuRangePartitioner throws an exception when the table is empty.
Still a draft; the nested-type cases failed and need investigation.

@res-life
Collaborator Author

build

@res-life
Collaborator Author

build

res-life marked this pull request as ready for review October 13, 2021 09:44
@res-life
Collaborator Author

build

@res-life
Collaborator Author

@revans2 Please help review.

Collaborator

@revans2 left a comment

Mostly just a few more nits. Looking good

@res-life
Collaborator Author

build

docs/supported_ops.md (resolved review thread)
docs/supported_ops.md (resolved review thread)
docs/supported_ops.md (resolved review thread)
Collaborator

@firestarman left a comment

There seem to be many unrelated diffs in the file supported_ops.md. Could you confirm?

@res-life
Collaborator Author

@firestarman Regarding the supported_ops.md diff issue you mentioned:
Confirmed, there is no problem; only the SampleExec section was added.
The SortExec and TakeOrderedAndProjectExec sections are not actually affected. I verified this with a GUI diff tool.

@res-life
Collaborator Author

res-life commented Oct 18, 2021

Created an issue against Spark about the sort order:
https://issues.apache.org/jira/browse/SPARK-37040

Labels
feature request (New feature or request)
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for org.apache.spark.sql.execution.SampleExec
5 participants