HDFS-16303. Improve handling of datanode lost while decommissioning #3746
@@ -25,9 +25,11 @@
import java.util.Set;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.fs.Path;
Will remove the unused imports added to this class in the next revision.
💔 -1 overall
This message was automatically generated.
@sodonnel The existing test "TestDecommissioningStatus.testDecommissionStatusAfterDNRestart" will be problematic for this change. As previously stated, removing the dead DECOMMISSION_INPROGRESS node from the DatanodeAdminManager means that, when there are no LowRedundancy blocks, the dead node will remain in DECOMMISSION_INPROGRESS rather than transitioning to DECOMMISSIONED. This violates the expectation that the unit test enforces: a dead DECOMMISSION_INPROGRESS node should transition to DECOMMISSIONED when there are no LowRedundancy blocks.
Line 451 in 6342d5e
Therefore, I think this is a good argument in favor of the originally proposed change: #3675
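The conflict described in the comment above can be illustrated with a minimal sketch. This is not Hadoop code: the `AdminState` enum, the `tracked` flag, and the `nextState` method are hypothetical names, and the only assumption is the one stated in the comment, namely that a node no longer tracked by the DatanodeAdminManager is never re-evaluated.

```java
/** Toy model of the state transition discussed above. Hypothetical names, not Hadoop APIs. */
final class DeadNodeTransitionSketch {

    /** Simplified stand-in for the two admin states mentioned in the comment. */
    enum AdminState { DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    /**
     * Returns the state of a dead node after one monitor pass.
     * Assumption: only nodes still tracked by the admin manager are re-evaluated.
     */
    static AdminState nextState(AdminState current, boolean tracked, int lowRedundancyBlocks) {
        if (!tracked) {
            // Nothing looks at an untracked node again, so its state never changes.
            return current;
        }
        if (current == AdminState.DECOMMISSION_INPROGRESS && lowRedundancyBlocks == 0) {
            // This is the transition the unit test expects.
            return AdminState.DECOMMISSIONED;
        }
        return current;
    }
}
```

Under these assumptions, a tracked dead node with zero low-redundancy blocks reaches DECOMMISSIONED, while the same node removed from tracking stays in DECOMMISSION_INPROGRESS, which is the behavior the test would reject.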
Closing this PR in favor of this alternate solution: #3675
Description of PR
Fixes a bug in Hadoop HDFS: if more than "dfs.namenode.decommission.max.concurrent.tracked.nodes" datanodes are lost while in the decommissioning state, all forward progress towards decommissioning any datanodes (including healthy datanodes) is blocked.
JIRA: https://issues.apache.org/jira/browse/HDFS-16303
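The blocking behavior described above can be reproduced in a toy model. The sketch below is not the DatanodeAdminManager implementation: the class, the `Node` type, and the `simulate` method are hypothetical, `maxTracked` stands in for "dfs.namenode.decommission.max.concurrent.tracked.nodes", and it assumes a dead node never completes and never gives up its tracking slot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy model of a bounded tracked set fed from a pending queue. Not Hadoop code. */
final class DecommissionTrackingSketch {

    /** Hypothetical stand-in for a datanode; a dead node is assumed never to finish. */
    static final class Node {
        final String name;
        final boolean alive;
        Node(String name, boolean alive) { this.name = name; this.alive = alive; }
    }

    /**
     * Runs the given number of monitor ticks and returns how many nodes finished.
     * maxTracked plays the role of dfs.namenode.decommission.max.concurrent.tracked.nodes.
     */
    static int simulate(List<Node> queued, int maxTracked, int ticks) {
        Deque<Node> pending = new ArrayDeque<>(queued);
        List<Node> tracked = new ArrayList<>();
        int decommissioned = 0;
        for (int t = 0; t < ticks; t++) {
            // Fill any free tracking slots from the head of the pending queue.
            while (tracked.size() < maxTracked && !pending.isEmpty()) {
                tracked.add(pending.poll());
            }
            // Live nodes complete and free their slot; dead nodes keep theirs.
            Iterator<Node> it = tracked.iterator();
            while (it.hasNext()) {
                if (it.next().alive) {
                    it.remove();
                    decommissioned++;
                }
            }
        }
        return decommissioned;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("dead-1", false));
        nodes.add(new Node("dead-2", false));
        nodes.add(new Node("live-1", true));
        nodes.add(new Node("live-2", true));
        System.out.println(simulate(nodes, 2, 10)); // 0: dead nodes hold both slots
        System.out.println(simulate(nodes, 3, 10)); // 2: one free slot lets live nodes through
    }
}
```

In this model, once dead nodes occupy every tracking slot, the healthy nodes behind them in the queue are never picked up, no matter how many ticks elapse. Leaving even one slot free is enough for the healthy nodes to drain through it one at a time.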
Additional Details
To solve this HDFS bug, there are 2 different proposals:
These 2 different implementations will largely behave the same from a user perspective. There is however 1 key difference:
How was this patch tested?
3 new unit tests added to both "TestDecommission" & "TestDecommissionWithBackoffMonitor":
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?