-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-17646. Add Option to limit Balancer overUtilized nodes num in each iteration. #7120
Conversation
@haiyang1987 Could you review this PR? Thank you very much! |
Thanks for the reply. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Only one concerns, if give one small limitTopNodes is it possible to impact DataNode service when trigger balancer. Such as one corner case, given set limitTopNodes=1 and there are thousands underUtilized nodes, then the top one datanode will involve transfer flood, right? So IMO we should give some caution information to end user at least. FYI. Thanks.
@@ -452,6 +456,7 @@ private long init(List<DatanodeStorageReport> reports) { | |||
} | |||
} | |||
|
|||
int overUtilizedDiffNum = Math.max(overUtilized.size() - limitTopNodesNum, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get purpose of this changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimized code:
// limit the number of OverUtilized nodes
// If excludedOverUtilizedNum is greater than 0, The overUtilized is limited
int excludedOverUtilizedNum = Math.max(overUtilized.size() - limitOverUtilizedNum, 0);
@@ -461,27 +466,31 @@ private long init(List<DatanodeStorageReport> reports) { | |||
metrics.setNumOfUnderUtilizedNodes(underUtilized.size()); | |||
|
|||
Preconditions.checkState(dispatcher.getStorageGroupMap().size() | |||
== overUtilized.size() + underUtilized.size() + aboveAvgUtilized.size() | |||
+ belowAvgUtilized.size(), | |||
== overUtilizedDiffNum + overUtilized.size() + underUtilized.size() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a. overUtilized.size()
= overUtilizedDiffNum + overUtilized.size()
? so is it need to change here?
b. Please align the format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Yes, excluded (overUtilized.size() - limitOverUtilizedNum) OverUtilizedNodes number.
- Optimize the assertion logic as follows
Preconditions.checkState(dispatcher.getStorageGroupMap().size() - excludedOverUtilizedNum
== overUtilized.size() + underUtilized.size() + aboveAvgUtilized.size()
+ belowAvgUtilized.size(),
list.sort( | ||
(Source source1, Source source2) -> | ||
(Double.compare(overUtilizedPercentage.get(source2), | ||
overUtilizedPercentage.get(source1))) | ||
); | ||
int size = overUtilized.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just suggest to add another one separate function rather than located here which is sort
something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix
// Create under utilized nodes | ||
for (int i = 0; i < numOfUnderUtilizedDn; i++) { | ||
// Bring one node alive | ||
DataNodeTestUtils.triggerHeartbeat(dataNodes.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here write some data in dataNodes[0] and dataNode[1] again? Not sure if the capacityUsed percent is as expected. Would you mind to give double check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix
a2f967d
to
019aa31
Compare
Adds log info to end user. Similar to using the '-source' param, top one datanode will involve transfer flood, this corner case will continue to migrate data from the node until the conditions for Balance are met. |
019aa31
to
54a6488
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
54a6488
to
8281129
Compare
💔 -1 overall
This message was automatically generated. |
8281129
to
574a32a
Compare
@@ -110,6 +113,10 @@ boolean getSortTopNodes() { | |||
return this.sortTopNodes; | |||
} | |||
|
|||
int getlimitOverUtilizedNum() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getLimitOverUtilizedNum()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix
Preconditions.checkState(dispatcher.getStorageGroupMap().size() | ||
== overUtilized.size() + underUtilized.size() + aboveAvgUtilized.size() | ||
+ belowAvgUtilized.size(), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix checkstytle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, IDEA automatically removes spaces, resubmit again.
Thanks @huangzhaobo99 work. a small suggestion when you push new commits please just push without using force push so that the pull request's history is preserved. thanks~ |
@haiyang1987 Thanks for the review and suggestion. |
337dbe6
to
508cd62
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@Hexiaoqiao @haiyang1987 Hi, has been fixed. Can you please review it again when you have time? Thanks~~! |
@@ -120,12 +127,12 @@ public String toString() { | |||
+ " max idle iteration = %s," + " #excluded nodes = %s," | |||
+ " #included nodes = %s," + " #source nodes = %s," | |||
+ " #blockpools = %s," + " run during upgrade = %s," | |||
+ " sort top nodes = %s," | |||
+ " sort top nodes = %s," + " limit overUtilized nodes num = %s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ " sort top nodes = %s," + " limit overUtilized nodes num = %s,"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} else if ("-limitOverUtilizedNum".equalsIgnoreCase(args[i])) { | ||
Preconditions.checkArgument(++i < args.length, | ||
"limitOverUtilizedNum value is missing: args = " + Arrays.toString(args)); | ||
int limitNum = Integer.parseInt(args[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need to check if limitNum
is less than 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, 0 is legal, non-negative numbers are okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, restricting the list that is greater than the threshold, and in the future, we will also need to restrict the list that is greater than the average (aboveAvgUtilized), 0 is legal.
Leave some small comments ,and please fix checkstytle again, thanks~ |
This is consistent with the previous design, and there is a warning in the checkstyle of the pipeline. |
💔 -1 overall
This message was automatically generated. |
Please check and fix failed unit tests and checkstyle. Others make sense to me. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@Hexiaoqiao @haiyang1987 Hello sir, can you help merge it into the trunk branch? I want to continue optimizing other related functions of the balancer tool. Thanks~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1.
Committed to trunk. Thanks @huangzhaobo99 and @haiyang1987 ! |
Thanks~ |
Description of PR
Limit the maximum number of overutilized datanodes to avoid excessive nodes affecting cluster stability.
https://issues.apache.org/jira/browse/HDFS-17646
How was this patch tested?
Add UT