-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-25625 StochasticBalancer CostFunctions needs a better way to ev… #3011
Conversation
…aluate resource distribution
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
test updates
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm excited for working towards a balancer that works better for large clusters! Thanks for proposing changes in that direction.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Show resolved
Hide resolved
} | ||
} | ||
// recalculate stdev | ||
reComputeRegionStDevPerTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like here we should only be recomputing the stdev for the affected tableIndex
. I don't want to over-optimize if it makes the code less readable... but in clusters with large numbers of tables, we would be doing unnecessary work.
I'm thinking in the context of HBASE-22300 where large clusters no longer run through enough iterations of the balancer. In that case, the issue is number of regions, which will always be larger than number of tables... and likely much larger. So it may not be a major issue here, but I'm bringing it up just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing up this. reComputeRegionStDevPerTable() can be optimized to only compute the impacted table index to reduce computation cost.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
Outdated
Show resolved
Hide resolved
...est/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerLargeCluster.java
Show resolved
Hide resolved
testWithCluster(numNodes, numRegions, numRegionsPerServer, replication, numTables, true, true); | ||
|
||
// we need to capture the outlier and generate a move | ||
testWithCluster(numNodes, numRegions, numRegionsPerServer, replication, numTables, false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change of code suggests that the balancer will no longer fully balance the cluster in one run... why is that? Are we making the balancer less effective, even in this test with large cluster with one outlier node? I thought this is the type of case that the fix is supposed to improve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the side impact. the computation gets more complicated. I will work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out to be related to https://issues.apache.org/jira/browse/HBASE-25767. Will merge the fix.
🎊 +1 overall
This message was automatically generated. |
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
The new heuristic needs to be evaluated by simulation at different scales. so this pr is currently pending the validation. 3067 is a scaled back fix for table skew only as a sub-task https://issues.apache.org/jira/browse/HBASE-25739?filter=-2 |
Should this or the other PR have their own JIRA then @clarax Thanks. |
Close the pr for now since the other pr #3067 landed and relieved most of the problem. Will revisit when the problem arises again. |
…aluate resource distribution