
[BUG] NPE when enabling shuffle manager #532

Closed
rongou opened this issue Aug 8, 2020 · 1 comment · Fixed by #534
Labels
bug Something isn't working

Comments


rongou commented Aug 8, 2020

Describe the bug
Getting a java.lang.NullPointerException after enabling the shuffle manager.

Steps/Code to reproduce bug
After adding the configs from https://github.com/NVIDIA/spark-rapids/blob/branch-0.2/docs/get-started/getting-started.md#enabling-rapidsshufflemanager, run the following in spark-shell:

import com.nvidia.spark.rapids.tests.tpcxbb._
TpcxbbLikeSpark.setupAllParquet(spark, "/data/dir")
Q16Like(spark).collect

This produces the stack trace shown below.

Expected behavior
Should not crash.

Environment details (please complete the following information)

  • Environment location: Standalone
  • Spark configuration settings related to the issue (a programmatic sketch of the same settings follows the flag list):
 --conf spark.shuffle.manager=com.nvidia.spark.rapids.spark300.RapidsShuffleManager\
 --conf spark.shuffle.service.enabled=false\
 --conf spark.rapids.shuffle.transport.enabled=true\
 --conf spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp\
 --conf spark.executorEnv.UCX_ERROR_SIGNALS=\
 --conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1\
 --conf spark.executorEnv.UCX_MEMTYPE_CACHE=n\
 --conf spark.rapids.shuffle.ucx.bounceBuffers.device.count=16\
 --conf spark.rapids.shuffle.ucx.bounceBuffers.host.count=16\
 --conf spark.executor.extraClassPath=/opt/ucx/ucx_put_zcopy/lib:${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR}\
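For illustration only, the same settings could be supplied programmatically when building the session in a standalone driver application. This is a minimal sketch (the application and object names are made up for the example), and spark-shell users should keep the command-line flags above, since all of these settings must be in place before the SparkContext starts.

import org.apache.spark.sql.SparkSession

// Sketch only: the same configuration keys as the flags above, applied
// before the SparkContext is created. The UCX library path and jar
// locations come from the flags/environment variables above.
object RapidsShuffleSessionSketch {
  def main(args: Array[String]): Unit = {
    val cudfJar = sys.env.getOrElse("SPARK_CUDF_JAR", "<cudf-jar>")
    val pluginJar = sys.env.getOrElse("SPARK_RAPIDS_PLUGIN_JAR", "<rapids-plugin-jar>")
    val spark = SparkSession.builder()
      .appName("rapids-shuffle-repro")
      .config("spark.shuffle.manager",
        "com.nvidia.spark.rapids.spark300.RapidsShuffleManager")
      .config("spark.shuffle.service.enabled", "false")
      .config("spark.rapids.shuffle.transport.enabled", "true")
      .config("spark.executorEnv.UCX_TLS", "cuda_copy,cuda_ipc,rc,tcp")
      .config("spark.executorEnv.UCX_ERROR_SIGNALS", "")
      .config("spark.executorEnv.UCX_MAX_RNDV_RAILS", "1")
      .config("spark.executorEnv.UCX_MEMTYPE_CACHE", "n")
      .config("spark.rapids.shuffle.ucx.bounceBuffers.device.count", "16")
      .config("spark.rapids.shuffle.ucx.bounceBuffers.host.count", "16")
      .config("spark.executor.extraClassPath",
        s"/opt/ucx/ucx_put_zcopy/lib:$cudfJar:$pluginJar")
      .getOrCreate()
    spark.stop()
  }
}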

Additional context
Full stack trace:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(w_state#1370, i_item_id#725, 16), true, [id=#593]
+- *(2) HashAggregate(keys=[w_state#1370, i_item_id#725], functions=[partial_sum(CASE WHEN (unix_timestamp(d_date#861, yyyy-MM-dd, Some(Etc/UTC)) < 984700800) THEN (ws_sales_price#1102 - coalesce(wr_refunded_cash#1308, 0.0)) ELSE 0.0 END), partial_sum(CASE WHEN (unix_timestamp(d_date#861, yyyy-MM-dd, Some(Etc/UTC)) >= 984700800) THEN (ws_sales_price#1102 - coalesce(wr_refunded_cash#1308, 0.0)) ELSE 0.0 END)], output=[w_state#1370, i_item_id#725, sum#1561, sum#1562])
   +- *(2) GpuColumnarToRow false
      +- GpuProject [ws_sales_price#1102, wr_refunded_cash#1308, i_item_id#725, w_state#1370, d_date#861]
         +- GpuBroadcastHashJoin [ws_sold_date_sk#1081L], [d_date_sk#859L], Inner, BuildRight
            :- GpuProject [ws_sold_date_sk#1081L, ws_sales_price#1102, wr_refunded_cash#1308, i_item_id#725, w_state#1370]
            :  +- GpuBroadcastHashJoin [ws_warehouse_sk#1096L], [w_warehouse_sk#1360L], Inner, BuildRight
            :     :- GpuProject [ws_sold_date_sk#1081L, ws_warehouse_sk#1096L, ws_sales_price#1102, wr_refunded_cash#1308, i_item_id#725]
            :     :  +- GpuBroadcastHashJoin [ws_item_sk#1084L], [i_item_sk#724L], Inner, BuildRight
            :     :     :- GpuProject [ws_sold_date_sk#1081L, ws_item_sk#1084L, ws_warehouse_sk#1096L, ws_sales_price#1102, wr_refunded_cash#1308]
            :     :     :  +- GpuShuffledHashJoin [ws_order_number#1098L, ws_item_sk#1084L], [wr_order_number#1301L, wr_item_sk#1290L], LeftOuter, BuildRight
            :     :     :     :- GpuCoalesceBatches TargetSize(2147483648)
            :     :     :     :  +- GpuColumnarExchange gpuhashpartitioning(ws_order_number#1098L, ws_item_sk#1084L, 16), true, [id=#527]
            :     :     :     :     +- GpuProject [ws_sold_date_sk#1081L, ws_item_sk#1084L, ws_warehouse_sk#1096L, ws_order_number#1098L, ws_sales_price#1102]
            :     :     :     :        +- GpuCoalesceBatches TargetSize(2147483648)
            :     :     :     :           +- GpuFilter ((gpuisnotnull(ws_item_sk#1084L) AND gpuisnotnull(ws_warehouse_sk#1096L)) AND gpuisnotnull(ws_sold_date_sk#1081L))
            :     :     :     :              +- GpuFileGpuScan parquet [ws_sold_date_sk#1081L,ws_item_sk#1084L,ws_warehouse_sk#1096L,ws_order_number#1098L,ws_sales_price#1102] Batched: true, DataFilters: [isnotnull(ws_item_sk#1084L), isnotnull(ws_warehouse_sk#1096L), isnotnull(ws_sold_date_sk#1081L)], Format: Parquet, Location: InMemoryFileIndex[file:/raid/spark-team/tpcxbb-100GB/web_sales], PartitionFilters: [], PushedFilters: [IsNotNull(ws_item_sk), IsNotNull(ws_warehouse_sk), IsNotNull(ws_sold_date_sk)], ReadSchema: struct<ws_sold_date_sk:bigint,ws_item_sk:bigint,ws_warehouse_sk:bigint,ws_order_number:bigint,ws_...
            :     :     :     +- GpuCoalesceBatches RequireSingleBatch
            :     :     :        +- GpuColumnarExchange gpuhashpartitioning(wr_order_number#1301L, wr_item_sk#1290L, 16), true, [id=#531]
            :     :     :           +- GpuProject [wr_item_sk#1290L, wr_order_number#1301L, wr_refunded_cash#1308]
            :     :     :              +- GpuCoalesceBatches TargetSize(2147483648)
            :     :     :                 +- GpuFilter (gpuisnotnull(wr_order_number#1301L) AND gpuisnotnull(wr_item_sk#1290L))
            :     :     :                    +- GpuFileGpuScan parquet [wr_item_sk#1290L,wr_order_number#1301L,wr_refunded_cash#1308] Batched: true, DataFilters: [isnotnull(wr_order_number#1301L), isnotnull(wr_item_sk#1290L)], Format: Parquet, Location: InMemoryFileIndex[file:/raid/spark-team/tpcxbb-100GB/web_returns], PartitionFilters: [], PushedFilters: [IsNotNull(wr_order_number), IsNotNull(wr_item_sk)], ReadSchema: struct<wr_item_sk:bigint,wr_order_number:bigint,wr_refunded_cash:double>
            :     :     +- GpuBroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#538]
            :     :        +- GpuProject [i_item_sk#724L, i_item_id#725]
            :     :           +- GpuCoalesceBatches TargetSize(2147483648)
            :     :              +- GpuFilter gpuisnotnull(i_item_sk#724L)
            :     :                 +- GpuFileGpuScan parquet [i_item_sk#724L,i_item_id#725] Batched: true, DataFilters: [isnotnull(i_item_sk#724L)], Format: Parquet, Location: InMemoryFileIndex[file:/raid/spark-team/tpcxbb-100GB/item], PartitionFilters: [], PushedFilters: [IsNotNull(i_item_sk)], ReadSchema: struct<i_item_sk:bigint,i_item_id:string>
            :     +- GpuBroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#543]
            :        +- GpuProject [w_warehouse_sk#1360L, w_state#1370]
            :           +- GpuCoalesceBatches TargetSize(2147483648)
            :              +- GpuFilter gpuisnotnull(w_warehouse_sk#1360L)
            :                 +- GpuFileGpuScan parquet [w_warehouse_sk#1360L,w_state#1370] Batched: true, DataFilters: [isnotnull(w_warehouse_sk#1360L)], Format: Parquet, Location: InMemoryFileIndex[file:/raid/spark-team/tpcxbb-100GB/warehouse], PartitionFilters: [], PushedFilters: [IsNotNull(w_warehouse_sk)], ReadSchema: struct<w_warehouse_sk:bigint,w_state:string>
            +- GpuBroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#586]
               +- GpuProject [d_date_sk#859L, d_date#861]
                  +- GpuRowToColumnar TargetSize(2147483648)
                     +- *(1) Filter (((unix_timestamp(d_date#861, yyyy-MM-dd, Some(Etc/UTC)) >= 982108800) AND (unix_timestamp(d_date#861, yyyy-MM-dd, Some(Etc/UTC)) <= 987292800)) AND isnotnull(d_date_sk#859L))
                        +- *(1) GpuColumnarToRow false
                           +- GpuFileGpuScan parquet [d_date_sk#859L,d_date#861] Batched: true, DataFilters: [(unix_timestamp(d_date#861, yyyy-MM-dd, Some(Etc/UTC)) >= 982108800), (unix_timestamp(d_date#861..., Format: Parquet, Location: InMemoryFileIndex[file:/raid/spark-team/tpcxbb-100GB/date_dim], PartitionFilters: [], PushedFilters: [IsNotNull(d_date_sk)], ReadSchema: struct<d_date_sk:bigint,d_date:string>

  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:95)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
  at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
  at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
  at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:162)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
  at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:183)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2938)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2938)
  ... 69 elided
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
GpuColumnarExchange gpuhashpartitioning(ws_order_number#1098L, ws_item_sk#1084L, 16), true, [id=#527]
+- GpuProject [ws_sold_date_sk#1081L, ws_item_sk#1084L, ws_warehouse_sk#1096L, ws_order_number#1098L, ws_sales_price#1102]
   +- GpuCoalesceBatches TargetSize(2147483648)
      +- GpuFilter ((gpuisnotnull(ws_item_sk#1084L) AND gpuisnotnull(ws_warehouse_sk#1096L)) AND gpuisnotnull(ws_sold_date_sk#1081L))
         +- GpuFileGpuScan parquet [ws_sold_date_sk#1081L,ws_item_sk#1084L,ws_warehouse_sk#1096L,ws_order_number#1098L,ws_sales_price#1102] Batched: true, DataFilters: [isnotnull(ws_item_sk#1084L), isnotnull(ws_warehouse_sk#1096L), isnotnull(ws_sold_date_sk#1081L)], Format: Parquet, Location: InMemoryFileIndex[file:/raid/spark-team/tpcxbb-100GB/web_sales], PartitionFilters: [], PushedFilters: [IsNotNull(ws_item_sk), IsNotNull(ws_warehouse_sk), IsNotNull(ws_sold_date_sk)], ReadSchema: struct<ws_sold_date_sk:bigint,ws_item_sk:bigint,ws_warehouse_sk:bigint,ws_order_number:bigint,ws_...

  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
  at com.nvidia.spark.rapids.GpuShuffleExchangeExec.doExecuteColumnar(GpuShuffleExchangeExec.scala:109)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.GpuCoalesceBatches.doExecuteColumnar(GpuCoalesceBatches.scala:447)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.shims.spark300.GpuShuffledHashJoinExec.doExecuteColumnar(GpuShuffledHashJoinExec.scala:117)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.GpuProjectExec.doExecuteColumnar(basicPhysicalOperators.scala:84)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.shims.spark300.GpuBroadcastHashJoinExec.doExecuteColumnar(GpuBroadcastHashJoinExec.scala:145)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.GpuProjectExec.doExecuteColumnar(basicPhysicalOperators.scala:84)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.shims.spark300.GpuBroadcastHashJoinExec.doExecuteColumnar(GpuBroadcastHashJoinExec.scala:145)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.GpuProjectExec.doExecuteColumnar(basicPhysicalOperators.scala:84)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.shims.spark300.GpuBroadcastHashJoinExec.doExecuteColumnar(GpuBroadcastHashJoinExec.scala:145)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.GpuProjectExec.doExecuteColumnar(basicPhysicalOperators.scala:84)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:519)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:202)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:198)
  at com.nvidia.spark.rapids.GpuColumnarToRowExec.inputRDDs(GpuColumnarToRowExec.scala:59)
  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:162)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:64)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:64)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency$lzycompute(ShuffleExchangeExec.scala:83)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency(ShuffleExchangeExec.scala:81)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:98)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 97 more
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.rapids.GpuShuffleEnv$.isRapidsShuffleEnabled(GpuShuffleEnv.scala:121)
  at org.apache.spark.sql.rapids.RapidsShuffleInternalManagerBase.shouldFallThroughOnEverything$lzycompute(RapidsShuffleInternalManager.scala:203)
  at org.apache.spark.sql.rapids.RapidsShuffleInternalManagerBase.shouldFallThroughOnEverything(RapidsShuffleInternalManager.scala:201)
  at org.apache.spark.sql.rapids.RapidsShuffleInternalManagerBase.registerShuffle(RapidsShuffleInternalManager.scala:270)
  at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:96)
  at org.apache.spark.sql.rapids.GpuShuffleDependency.<init>(GpuShuffleDependency.scala:34)
  at com.nvidia.spark.rapids.GpuShuffleExchangeExec$.prepareBatchShuffleDependency(GpuShuffleExchangeExec.scala:212)
  at com.nvidia.spark.rapids.GpuShuffleExchangeExec.shuffleBatchDependency$lzycompute(GpuShuffleExchangeExec.scala:98)
  at com.nvidia.spark.rapids.GpuShuffleExchangeExec.shuffleBatchDependency(GpuShuffleExchangeExec.scala:91)
  at com.nvidia.spark.rapids.GpuShuffleExchangeExec.$anonfun$doExecuteColumnar$1(GpuShuffleExchangeExec.scala:112)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 177 more

@jlowe @abellina

rongou added the "bug (Something isn't working)" and "? - Needs Triage (Need team to review and classify)" labels on Aug 8, 2020

abellina commented Aug 8, 2020

Thanks @rongou, please take a look at #534. The issue is that the driver is calling into a function that expects certain state to be set, which only happens when Rmm is initialized (and that only happens in the executors). This is a new issue introduced in branch-0.2.
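For illustration only, a minimal sketch of the kind of guard that avoids this class of NPE; the class, object, and method names here are hypothetical and this is not the actual change made in #534.

// Hypothetical sketch: executor-only state is held behind an Option so that
// driver-side callers get a safe default instead of a NullPointerException.
class ShuffleEnvSketch(val transportEnabled: Boolean)

object ShuffleEnvSketch {
  // Populated on executors once RMM/shuffle state has been initialized;
  // stays None on the driver, where that initialization never runs.
  @volatile private var env: Option[ShuffleEnvSketch] = None

  def initialize(instance: ShuffleEnvSketch): Unit = {
    env = Some(instance)
  }

  // Safe to call from the driver (e.g. while registering a shuffle):
  // returns false when the environment was never initialized rather than
  // dereferencing an unset handle.
  def isRapidsShuffleEnabled: Boolean = env.exists(_.transportEnabled)
}

The point of the sketch is only that driver-side calls need either a None/null check or a code path that does not depend on executor-only initialization.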

sameerz removed the "? - Needs Triage (Need team to review and classify)" label on Aug 13, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
Signed-off-by: spark-rapids automation <[email protected]>