HDFS-17093. Fix block report lease issue to avoid missing some storages report. #5855
Conversation
…read (apache#5706) Contributed by Moditha Hewasinghage
HDFS-17093. In the case where all datanodes send FBRs when the namenode restarts (large clusters), there is an issue with incomplete block reporting.
Left some comments inline. PFYI.
-      BlockReportContext context) throws IOException {
+      BlockReportContext context,
+      int totalReportNum,
+      int currentReportNum) throws IOException {
a. Please add some Javadoc for the added parameters.
b. Would these names be more readable?
totalReportNum -> totalStorageReportsNum,
currentReportNum -> storageReportIndex
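(For illustration only, a minimal sketch of the kind of Javadoc being requested, using the parameter names currently in the patch; the wording is hypothetical, not taken from the PR:)

  /**
   * @param totalReportNum   total number of storage reports contained in this
   *                         full block report from the datanode
   * @param currentReportNum 1-based index of the storage report currently
   *                         being processed
   */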
I think it's ok
@@ -1650,7 +1650,7 @@ public DatanodeCommand blockReport(final DatanodeRegistration nodeReg,
           final int index = r;
           noStaleStorages = bm.runBlockOp(() ->
               bm.processReport(nodeReg, reports[index].getStorage(),
-                  blocks, context));
+                  blocks, context, reports.length, index+1));
codestyle: index + 1
(leave one space here)
🎊 +1 overall
storageInfo.blockReportCount is updated in line 2932, which means that storageInfo.getBlockReportCount() > 0 is true only after the first report of this storage has been processed successfully in line 2924.
Lines 2916 to 2932 in 7ba2bd6

      if (!storageInfo.hasReceivedBlockReport()) {
        // The first block report can be processed a lot more efficiently than
        // ordinary block reports. This shortens restart times.
        blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing first "
            + "storage report for {} from datanode {}",
            strBlockReportId, fullBrLeaseId,
            storageInfo.getStorageID(),
            nodeID);
        processFirstBlockReport(storageInfo, newReport);
      } else {
        // Block reports for provided storage are not
        // maintained by DN heartbeats
        if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
          invalidatedBlocks = processReport(storageInfo, newReport);
        }
      }
      storageInfo.receivedBlockReport();
So I don't understand how the process reaches line 2912 without the first report having been processed. Would you mind adding a unit test to make the problem clearer?
Lines 2905 to 2914 in 7ba2bd6

    if (namesystem.isInStartupSafeMode()
        && !StorageType.PROVIDED.equals(storageInfo.getStorageType())
        && storageInfo.getBlockReportCount() > 0) {
      blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
          + "discarded non-initial block report from {}"
          + " because namenode still in startup phase",
          strBlockReportId, fullBrLeaseId, nodeID);
      blockReportLeaseManager.removeLease(node);
      return !node.hasStaleStorages();
    }
💔 -1 overall
@Hexiaoqiao This should be ready to merge. Do you see anything else to optimize?
Thanks @Tre2878. Please check the comments from reviewers and the report from Yetus; fix any concerns or give feedback before check-in. Thanks again.
💔 -1 overall
This unit test cannot prove that the root cause of the problem is as described in your PR, because you manually call BlockReportLeaseManager#removeLease in the UT code. In the BlockManager#processReport method, BlockReportLeaseManager#removeLease will only be called after the first block report of the corresponding storage has been processed successfully, which contradicts your description "The first FBR times out because the namenode is busy". Looking forward to your reply.
💔 -1 overall
💔 -1 overall
I don't know why the hadoop.hdfs.TestFileChecksum unit test failed; I didn't modify that code.
@zhangshuyan0 We rewrote the unit test; please see whether it explains the bug.
      // Remove full block report lease about dn
      spyBlockManager.getBlockReportLeaseManager()
          .removeLease(datanodeDescriptor);
The problem in this UT is the same as before: you still actively call removeLease in the test code, which doesn't seem to happen in the real code. It remains confusing why removeLease would be called when the first block report has not been successfully processed.
This removeLease operation should happen in the processReport method, so let me modify that; the current placement is misleading.
💔 -1 overall
@zhangshuyan0 Do the new unit tests demonstrate the bug?
Thank you very much for patiently revising this with your valuable time. It makes sense to me. I've just left some comments.
@@ -269,4 +272,84 @@ private StorageBlockReport[] createReports(DatanodeStorage[] dnStorages,
     }
     return storageBlockReports;
   }
+
+  @Test
Need to add a timeout here.
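(For example, following the common convention in Hadoop tests this would look something like @Test(timeout = 360000); the exact value here is only illustrative, not what the PR uses.)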
@@ -2904,7 +2908,8 @@ public boolean processReport(final DatanodeID nodeID,
     }
     if (namesystem.isInStartupSafeMode()
         && !StorageType.PROVIDED.equals(storageInfo.getStorageType())
-        && storageInfo.getBlockReportCount() > 0) {
+        && storageInfo.getBlockReportCount() > 0
+        && totalReportNum == currentReportNum) {
If a datanode reports twice while the namenode is in safe mode, the second report will be almost completely processed, which may extend startup time. How about modifying the code like this? It also avoids changing the method signature.
if (namesystem.isInStartupSafeMode()
&& !StorageType.PROVIDED.equals(storageInfo.getStorageType())
&& storageInfo.getBlockReportCount() > 0) {
blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
+ "discarded non-initial block report from datanode {} storage {} "
+ " because namenode still in startup phase",
strBlockReportId, fullBrLeaseId, nodeID, storageInfo.getStorageID());
boolean needRemoveLease = true;
for (DatanodeStorageInfo sInfo : node.getStorageInfos()) {
if (sInfo.getBlockReportCount() == 0) {
needRemoveLease = false;
}
}
if (needRemoveLease) {
blockReportLeaseManager.removeLease(node);
}
return !node.hasStaleStorages();
}
@zhangshuyan0 Thank you for the re-review. This change achieves the same effect, but I think node.hasStaleStorages() is also a datanode-level operation that should likewise be called only for the last disk; logically and functionally, though, it's not that different. Let's hear other opinions. @Hexiaoqiao, what do you think?
I like @zhangshuyan0's proposal better.
The following section of code could also be separated into a function of its own.
// Remove the lease when we have received block reports for all storages for a particular DN.
void removeLease(DatanodeDescriptor node) {
  boolean needRemoveLease = true;
  for (DatanodeStorageInfo sInfo : node.getStorageInfos()) {
    if (sInfo.getBlockReportCount() == 0) {
      needRemoveLease = false;
    }
  }
  if (needRemoveLease) {
    blockReportLeaseManager.removeLease(node);
  }
}
I am not sure this is a good solution with the condition blockReportCount == 0; consider the case where one disk has failed but has not been detected in time. Will that affect this logic here? Thanks.
Two cases can reach this logic:
- The namenode is restarting, receiving FBRs from all datanodes, and is in safe mode.
- The namenode has been running for a long time and enters safe mode for some reason.
In the first case, if the datanode has a failed disk, it will send FBRs for the healthy disks and the namenode will handle them normally.
In the second case, blockReportCount == 0 will always be false as long as no new disks are added to the datanode.
So I recommend keeping the code as it is and not using blockReportCount == 0.
💔 -1 overall
@@ -1650,7 +1650,7 @@ public DatanodeCommand blockReport(final DatanodeRegistration nodeReg,
           final int index = r;
           noStaleStorages = bm.runBlockOp(() ->
               bm.processReport(nodeReg, reports[index].getStorage(),
-                  blocks, context));
+                  blocks, context, reports.length, index + 1));
Thanks for reminding me of the shortcomings of my plan; I will try to improve it. However, the solution in this PR may not work either. If the datanode sends one storage report per RPC, reports.length will be 1 here, so your condition totalReportNum == currentReportNum will always be true and the block report lease will be removed just as before. The fix would be ineffective in that case.
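For context, the per-storage splitting referred to here looks roughly like the sketch below (simplified, with approximate names and signatures; the real logic lives in the datanode's BPServiceActor and is driven by dfs.blockreport.split.threshold):

    if (totalBlockCount < dnConf.blockReportSplitThreshold) {
      // Below the threshold: all storage reports go out in a single RPC, so
      // on the namenode side reports.length equals the number of storages.
      DatanodeCommand cmd = bpNamenode.blockReport(bpRegistration,
          bpos.getBlockPoolId(), reports,
          new BlockReportContext(1, 0, reportId, fullBrLeaseId));
    } else {
      // Above the threshold: one storage report per RPC. Every call the
      // namenode receives then has reports.length == 1 and index == 0, so the
      // proposed check totalReportNum == currentReportNum (1 == 0 + 1) is
      // always true and the lease is still removed after the first storage.
      for (int r = 0; r < reports.length; r++) {
        StorageBlockReport[] singleReport = { reports[r] };
        DatanodeCommand cmd = bpNamenode.blockReport(bpRegistration,
            bpos.getBlockPoolId(), singleReport,
            new BlockReportContext(reports.length, r, reportId, fullBrLeaseId));
      }
    }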
Great catch. +1. We have to handle this case.
Thanks for reminding me. I think we'll have to think of something else
@@ -2957,6 +2957,19 @@ public boolean processReport(final DatanodeID nodeID,
     return !node.hasStaleStorages();
   }
+
+  // Remove the lease when we have received block reports for all storages for a particular DN.
Remove the lease when we have received block reports for all storages for a particular DN.
->
Remove the DN lease only when we have received block reports for all storages for a particular DN.
+1. LGTM.
💔 -1 overall
💔 -1 overall
@Tre2878 Please check the code style according to Yetus.
This seems like a possible case which can cause the NN to be stuck in safe mode. It does seem incorrect to prematurely remove the lease without checking whether we have processed block reports for all storages of a DN. Thanks for identifying this corner case and contributing the fix!
💔 -1 overall
LGTM once the checkstyle issues are fixed.
@@ -2957,6 +2957,22 @@ public boolean processReport(final DatanodeID nodeID,
     return !node.hasStaleStorages();
   }
+
+  /**
+   * Remove the DN lease only when we have received block reports
+   * for all storages for a particular DN.
Fix checkstyle.
      FSNamesystem fsn = cluster.getNamesystem();

      NameNode nameNode = cluster.getNameNode();
      // pretend to be in safemode
The first letter needs to be uppercase, and the sentence should end with a period.
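(That is, the comment above would become something like "// Pretend to be in safemode." instead of "// pretend to be in safemode".)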
💔 -1 overall
@Hexiaoqiao Thank you for your patience. I have corrected it. Is anything else wrong?
Please check the blanks item and try to fix it as Yetus said. #5855 (comment) Thanks.
🎊 +1 overall
@Hexiaoqiao Thank you for your patient guidance. Now all checks have passed.
      boolean needRemoveLease = true;
      for (DatanodeStorageInfo sInfo : node.getStorageInfos()) {
        if (sInfo.getBlockReportCount() == 0) {
          needRemoveLease = false;
We should break out of the loop early here once we meet sInfo.getBlockReportCount() == 0.
Yes, agreed. Let me add it.
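(For illustration, a minimal sketch of the loop with the early break, reusing the names from the diff above:)

      boolean needRemoveLease = true;
      for (DatanodeStorageInfo sInfo : node.getStorageInfos()) {
        if (sInfo.getBlockReportCount() == 0) {
          // At least one storage of this DN has not reported yet:
          // keep the lease and stop scanning the remaining storages.
          needRemoveLease = false;
          break;
        }
      }
      if (needRemoveLease) {
        blockReportLeaseManager.removeLease(node);
      }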
💔 -1 overall
@Hexiaoqiao The unit test error should have nothing to do with this change.
🎊 +1 overall
LGTM. +1 from my side. Will check in if no more comments.
Committed to trunk. Thanks @Tre2878 for your contribution, and thanks @xinglin @zhangshuyan0 for your reviews!
…es report. (apache#5855). Contributed by Yanlei Yu.
Reviewed-by: Shuyan Zhang <[email protected]>
Reviewed-by: Xing Lin <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
In our cluster of 800+ nodes, after restarting the namenode we found that some datanodes did not report enough blocks, causing the namenode to stay in safe mode for a long time after the restart because of incomplete block reporting.
In the logs of a datanode with incomplete block reporting, I found that the first FBR attempt failed, possibly due to namenode load, and a second FBR attempt was then made, as follows:
....
2023-07-17 11:29:28,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x6237a52c1e817e, containing 12 storage report(s), of which we sent 1. The reports had 1099057 total blocks and used 1 RPC(s). This took 294 msec to generate and 101721 msecs for RPC and NN processing. Got back no commands.
2023-07-17 11:37:04,014 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x62382416f3f055, containing 12 storage report(s), of which we sent 12. The reports had 1099048 total blocks and used 12 RPC(s). This took 295 msec to generate and 11647 msecs for RPC and NN processing. Got back no commands.
There is nothing wrong with that; the datanode simply retries the send if it fails. But on the namenode side, the logic is:
if (namesystem.isInStartupSafeMode()
&& !StorageType.PROVIDED.equals(storageInfo.getStorageType())
&& storageInfo.getBlockReportCount() > 0) {
blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
+ "discarded non-initial block report from {}"
+ " because namenode still in startup phase",
strBlockReportId, fullBrLeaseId, nodeID);
blockReportLeaseManager.removeLease(node);
return !node.hasStaleStorages();
}
When a storage's report is identified as not being its first, that is, storageInfo.getBlockReportCount() > 0, the lease is removed for the whole datanode, so the second report fails because there is no longer a lease.
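Per the review discussion above, the eventual fix keeps the startup-safe-mode guard but only drops the lease once every storage of the DN has reported. A minimal sketch of that shape (removeDNLeaseIfNeeded is a hypothetical helper name wrapping the loop-with-early-break shown earlier, not the exact committed code):

    if (namesystem.isInStartupSafeMode()
        && !StorageType.PROVIDED.equals(storageInfo.getStorageType())
        && storageInfo.getBlockReportCount() > 0) {
      blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
          + "discarded non-initial block report from datanode {} storage {} "
          + "because namenode still in startup phase",
          strBlockReportId, fullBrLeaseId, nodeID, storageInfo.getStorageID());
      // Only remove the DN's block report lease after all of its storages
      // have reported, so a retried FBR is not rejected for lack of a lease.
      removeDNLeaseIfNeeded(node);  // hypothetical helper; see the loop sketch above
      return !node.hasStaleStorages();
    }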