-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAPREDUCE-7329: HadoopPipes task has failed because of the ping timeout exception #2775
Conversation
💔 -1 overall
This message was automatically generated. |
@liuml07 @steveloughran Could you please help review the patch? It aims to fix Hadoop Pipes bug. Thanks. |
I did not touch this pipe class or package. I can take a look later this week, but @aajisaka and @wangdatan will have more context. Thanks, |
ok, thanks any way ! |
@aajisaka Could you please help review the patch? thanks. |
Never seen this code before so I'm not really in a position to review. Just trying to revise my sockets API knowledge, which dates from when I was writing Windows 3.1 code and hasn't been refreshed much, not since HTTP came along |
@liuml07 @steveloughran hi, how is it going now? can you review this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I said "I have never looked at this code", so am not in a position to it in. It wouldn't be safe
...t/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java
Outdated
Show resolved
Hide resolved
5dcb605
to
c4afa68
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
c4afa68
to
9fce5d1
Compare
💔 -1 overall
This message was automatically generated. |
Can we create a regression test for this issue? |
...t/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java
Outdated
Show resolved
Hide resolved
Can we set a shorter timeout than the default (2h) in the server socket to clear the queue? |
if do this, we need to modify kernel parameter |
Can |
Of course, but need to accept client socket first? In my cleaner implement, it will close socket correctly, and thus accept queue will also be cleared. |
No, this method must be called before accepting client sockets.
The default value is 60 seconds in Hadoop IPC client (ipc.ping.interval). Is it too long for Hadoop Pipes application? |
LGTM I thought only setting so_timeout fixes this problem. However, it is not true. The problem is that the Application does not call |
Yes, this is exactly what i am trying to fix. I will try to update with a regression test soon. Thanks! |
459bdcd
to
2012f30
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
92a8f1e
to
f43e450
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
cebcd85
to
9d4443f
Compare
💔 -1 overall
This message was automatically generated. |
@aajisaka Hi, would you please help to review some changes about yesterday's advice again ? And i don't know why there is 💔 -1 overall cause i have not changed that part code. |
💔 -1 overall
This message was automatically generated. |
The test failures look unrelated to the patch.
|
...educe-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java
Outdated
Show resolved
Hide resolved
...educe-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regression test looks good to me. I'm +1 if my comments are addressed.
…hange from 3.x to 4.x
9d4443f
to
08790b1
Compare
💔 -1 overall
This message was automatically generated. |
It seems like all comments are addressed, and the |
Merged. Thank you @lichaojacobs and @steveloughran |
…hange from 3.x to 4.x (apache#2775)
jira: https://issues.apache.org/jira/browse/MAPREDUCE-7329
Hadoop Pipes Ping implement has a bug. Recently, we upgrade linux kernel version from 3.x to 4.x. And we find hadoop pipe task exit with connect timeout which is implemented by PingThread in HadoopPipes.cc.
After a deep research, we finally find that current ping server won't accept ping client created socket, which may cause critical problem:
To fix this bug, we introduced a PingSocketCleaner thread, which will continuously accept ping socket connection from ping client. When socket close from client, cleaner thread will detecte closed inputStream through
read
method returns-1
, then finally close socket from sever side.Refrenced by linux kernel patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ea8ea2cb7