HDFS-16348. Mark slownode as badnode to recover pipeline #3704
Conversation
@ferhui @tasanuma @Hexiaoqiao Could you help to review this PR? |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@symious Sorry for reviewing it very late. I looked through it, and it mostly looks good to me. I have one question. This feature is similar to ECN. The configuration of ECN is |
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
@tasanuma Thanks for the review.
I went through the design of the code again. IMHO, "dfs.pipeline.ecn" behaves like "dfs.pipeline.ecn.enable": a value of "true" means the DataNode should add the ECN information to the PipelineAck, so this configuration controls whether the DataNode adds ECN or not. On the client side there is no config for choosing whether to use the ECN information, since the client uses it by default. In this ticket, I think the counterpart of "dfs.pipeline.ecn" would be something like "dfs.pipeline.reply_slownode_in_pipeline_ack.enable", based on which DataNodes would add or skip the slownode information in the PipelineAck. I think it depends on the subject of the operation: the subject of "dfs.pipeline.ecn" is the DataNode, while the subject of "dfs.client.mark.slownode.as.badnode" is the client. |
@symious Thanks for your explanation. |
@tasanuma Sorry for the late reply.
Since it's similar to ECN, I think it's reasonable to set it on the server side. Another issue I was considering is "dfs.client.mark.slownode.as.badnode.threshold": this configuration seems more tied to the client's choice, so setting a default value on the server side might be a little rigid. Should I leave this configuration on the client side? |
@symious Thanks for your reply.
I agreed. Thanks.
I also agreed that |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@tasanuma Updated the patch, could you help to check? |
@symious Thanks for updating PR. It mostly looks good. I left some minor comments.
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
...doop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
...doop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
...-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
@tasanuma Thanks for the detailed review. Updated as suggested, please have a check. |
Thanks for updating PR, @symious. +1 from me, pending Jenkins. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
LGTM. I will merge it in a week if there is no objection.
Merged it. Thanks for your great work, @symious! |
Description of PR
In HDFS-16320, the DataNode can retrieve the SLOW status from each NameNode.
This ticket sends this information back to clients that are writing blocks. If a client notices that the pipeline is built on a slownode, it can choose to mark the slownode as a badnode so that the node is excluded or the pipeline is rebuilt.
To avoid false positives, we added a "threshold" config: a node is marked as SLOW only when the client continuously receives slownode replies from that same node.
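The threshold rule can be illustrated with a minimal sketch. This is not the code from the patch: `SlowNodeTracker`, `onAck`, and the string node IDs are hypothetical names chosen for illustration, and the sketch assumes that "continuously" means consecutive slownode replies from the same node, with the count reset by any non-slow reply.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not a class added by this PR. */
class SlowNodeTracker {
  // Number of consecutive slownode replies required before a node is flagged.
  private final int threshold;
  // Consecutive slownode replies seen so far, keyed by an illustrative node ID.
  private final Map<String, Integer> consecutiveSlow = new HashMap<>();

  SlowNodeTracker(int threshold) {
    if (threshold < 1) {
      throw new IllegalArgumentException("threshold must be >= 1");
    }
    this.threshold = threshold;
  }

  /**
   * Records one ack from the given node.
   *
   * @return true once the node has reported slow in at least
   *         {@code threshold} consecutive acks
   */
  boolean onAck(String nodeId, boolean slow) {
    if (!slow) {
      // Assumption: a normal reply breaks the streak for this node.
      consecutiveSlow.remove(nodeId);
      return false;
    }
    int count = consecutiveSlow.merge(nodeId, 1, Integer::sum);
    return count >= threshold;
  }
}
```

With a threshold of 3, for example, a node that reports slow twice and then replies normally starts again from zero, so a short burst of slow replies would not by itself trigger pipeline recovery.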
Jira ticket: https://issues.apache.org/jira/browse/HDFS-16348
How was this patch tested?
unit test