[BUG] Heartbeat from unknown executor when running with UCX shuffle in local mode #2725

Closed
jlowe opened this issue Jun 16, 2021 · 0 comments · Fixed by #2726
jlowe commented Jun 16, 2021

Describe the bug
Running with the RapidsShuffleManager while in local mode causes the following exceptions to be printed to the console every few seconds:

21/06/16 18:12:28 ERROR Inbox: Ignoring error
java.lang.IllegalStateException: Heartbeat from unknown executor BlockManagerId(driver, 10.28.9.126, 32931, None)
	at com.nvidia.spark.rapids.RapidsShuffleHeartbeatManager.executorHeartbeat(RapidsShuffleHeartbeatManager.scala:84)
	at com.nvidia.spark.rapids.RapidsDriverPlugin.receive(Plugin.scala:149)
	at org.apache.spark.internal.plugin.PluginEndpoint$$anonfun$receiveAndReply$1.applyOrElse(PluginEndpoint.scala:57)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/06/16 18:12:28 ERROR RapidsShuffleHeartbeatEndpoint: Error during heartbeat
java.lang.IllegalStateException: Heartbeat from unknown executor BlockManagerId(driver, 10.28.9.126, 32931, None)
	at com.nvidia.spark.rapids.RapidsShuffleHeartbeatManager.executorHeartbeat(RapidsShuffleHeartbeatManager.scala:84)
	at com.nvidia.spark.rapids.RapidsDriverPlugin.receive(Plugin.scala:149)
	at org.apache.spark.internal.plugin.PluginEndpoint$$anonfun$receiveAndReply$1.applyOrElse(PluginEndpoint.scala:57)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
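
To illustrate the failure mode the traces suggest, here is a minimal sketch, assuming the heartbeat manager only accepts heartbeats from executors that registered with it first. This is not the plugin's actual code: `ExecutorId`, `HeartbeatRegistry`, and `HeartbeatDemo` are hypothetical names, and `ExecutorId` stands in for Spark's `BlockManagerId` with only the fields visible in the log. Under that assumption, a heartbeat sent with the driver's ID in local mode is rejected until something registers it.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's BlockManagerId (only the fields seen in the log).
final case class ExecutorId(executorId: String, host: String, port: Int)

// Hypothetical registry; NOT the plugin's RapidsShuffleHeartbeatManager.
// Assumption: an executor must register before its heartbeats are accepted.
final class HeartbeatRegistry {
  private val lastSeen = mutable.Map.empty[ExecutorId, Long]

  // Record an executor so that later heartbeats from it are accepted.
  def register(id: ExecutorId, nowMs: Long): Unit = synchronized {
    lastSeen(id) = nowMs
  }

  // Reject heartbeats from IDs that never registered, mirroring the logged message.
  def heartbeat(id: ExecutorId, nowMs: Long): Unit = synchronized {
    if (!lastSeen.contains(id)) {
      throw new IllegalStateException(s"Heartbeat from unknown executor $id")
    }
    lastSeen(id) = nowMs
  }
}

object HeartbeatDemo {
  // Returns the error message if the heartbeat was rejected, otherwise None.
  def tryHeartbeat(reg: HeartbeatRegistry, id: ExecutorId): Option[String] =
    try {
      reg.heartbeat(id, System.currentTimeMillis())
      None
    } catch {
      case e: IllegalStateException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    val registry = new HeartbeatRegistry
    val driver = ExecutorId("driver", "10.28.9.126", 32931)

    // The ID was never registered, so the heartbeat is rejected.
    println(tryHeartbeat(registry, driver))

    // After registration the same heartbeat is accepted.
    registry.register(driver, System.currentTimeMillis())
    println(tryHeartbeat(registry, driver))
  }
}
```

Whether the right fix is to register the driver's ID or to skip the heartbeat in local mode is not something this sketch can answer; it only shows why an unregistered sender would fail on every heartbeat interval.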

Steps/Code to reproduce bug
Run spark-shell in local mode with UCX (e.g. --master local[*] --conf spark.shuffle.manager=com.nvidia.spark.rapids.spark301.RapidsShuffleManager).

Expected behavior
No exceptions or errors should be logged to the console.

Environment details
Spark 3.0.1
RAPIDS Accelerator 21.08.0-SNAPSHOT

@jlowe jlowe added bug Something isn't working ? - Needs Triage Need team to review and classify shuffle things that impact the shuffle plugin labels Jun 16, 2021
@abellina abellina self-assigned this Jun 16, 2021
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Jun 21, 2021