Each job run logging Java.IOException #577

Closed
guruvonline opened this issue Jul 1, 2020 · 6 comments


@guruvonline

I am using release 0.10 and running a job with a notebook that starts a .NET application. Each job runs on a new cluster.
For each job run, the log4j logs in the cluster logs show the exception below:

java.io.IOException: Connection reset by peer

Even though the job appears to complete processing, I am not sure why each job run shows this exception in the cluster logs.

Thanks

Logs.txt

@imback82
Contributor

imback82 commented Jul 1, 2020

These seem to be error messages specific to your job (related to your worker nodes):

20/06/25 04:20:08 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, 10.139.64.5, 46609, None)
20/06/25 04:20:08 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
20/06/25 04:20:08 INFO DAGScheduler: Shuffle files lost for executor: 2 (epoch 31)
20/06/25 04:20:08 INFO DAGScheduler: Shuffle files lost for worker worker-20200625041745-10.139.64.5-41163 on host 10.139.64.5
20/06/25 04:20:08 WARN TransportChannelHandler: Exception in connection from /10.139.64.8:52858
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1147)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)

Do you have a repro we can try out?
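
When a full repro is hard to isolate, one way to narrow things down is to look at which log lines immediately precede each exception, as the excerpt above does with the executor removal. The following is a minimal sketch, not something from this thread: the function name `exceptions_with_context` is hypothetical, and the regular expressions only assume the log4j line layout visible in the excerpt (`yy/MM/dd HH:mm:ss LEVEL Component: message`).

```python
import re
from collections import deque

# Assumed log4j layout, taken from the excerpt in this thread:
# "20/06/25 04:20:08 WARN TransportChannelHandler: message"
LOG_LINE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+ [^:]+: .*$")

# Assumed exception header: an unindented dotted class name ending in
# Exception or Error, e.g. "java.io.IOException: Connection reset by peer".
EXCEPTION_HEADER = re.compile(r"^[\w.$]+(Exception|Error)\b")


def exceptions_with_context(lines, context=3):
    """Yield (exception_line, preceding_log_lines) for each exception header.

    Only timestamped log lines are kept as context; indented stack frames
    ("\tat ...") match neither pattern and are skipped.
    """
    recent = deque(maxlen=context)
    for line in lines:
        line = line.rstrip("\n")
        if LOG_LINE.match(line):
            recent.append(line)
        elif EXCEPTION_HEADER.match(line):
            yield line, list(recent)
```

Iterating over an open log file (for example the attached Logs.txt, downloaded locally) and printing each pair shows what the driver logged just before every exception, which can help tell executor shutdown noise apart from a failure in the job itself.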

@guruvonline
Author

I am not sure which part of my application is throwing this error, so I am not sure how to create a trimmed-down version to reproduce it. Do you have any suggestions?

@MironAtHome

I have the same problem. I followed this tutorial to the letter: https://dotnet.microsoft.com/learn/data/spark-tutorial/run
Spark version: 2.4.4
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0-25)

Here is the error stack trace; I hope it helps:

java.io.IOException: Failed to delete: C:\Users\myuser\AppData\Local\Temp\spark-f9ad1f11-c307-4ef2-8f86-ee33cf8ae23a\userFiles-b848a705-0a78-491e-a1b6-9a643f0d7d5b\microsoft-spark-2.4.x-0.10.0.jar
	at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
	at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
	at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:103)
	at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1974)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)
	at org.apache.spark.sql.SparkSession.stop(SparkSession.scala:712)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:162)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.handleBackendRequest(DotnetBackendHandler.scala:102)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.channelRead0(DotnetBackendHandler.scala:29)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.channelRead0(DotnetBackendHandler.scala:24)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)

@imback82
Contributor

imback82 commented Jul 2, 2020

@MironAtHome Your issue is a different one, and it is a known issue. Please check this thread: #49 (comment)

@Niharikadutta
Collaborator

@guruvonline Are you still facing this issue?

@Niharikadutta
Collaborator

Hi, we are going to close this issue as it has been inactive for a while. Please feel free to re-open it if the issue persists and/or there are any new updates. Thank you!
