-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16671. RBF: RouterRpcFairnessPolicyController supports configurable permit acquire timeout #4597
Conversation
💔 -1 overall
This message was automatically generated. |
…ble permit acquire timeout
💔 -1 overall
This message was automatically generated. |
public void init(Configuration conf) { | ||
this.permits = new HashMap<>(); | ||
long timeout = conf.getLong(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_KEY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the unit? We should do getTimeDuration()
int totalDedicatedHandlers = 0; | ||
for (String nsId : allConfiguredNS) { | ||
int dedicatedHandlers = | ||
conf.getInt(DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX + nsId, 0); | ||
conf.getInt(DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX + nsId, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it one line as we are touching this.
🎊 +1 overall
This message was automatically generated. |
public void init(Configuration conf) { | ||
this.permits = new HashMap<>(); | ||
long timeoutMs = conf.getTimeDuration(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do getTimeDuration() we don't need the prefix in the key DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS.
We should set the default in the XML to "1s"
@@ -723,6 +723,14 @@ | |||
</description> | |||
</property> | |||
|
|||
<property> | |||
<name>dfs.federation.router.fairness.acquire.timeout.ms</name> | |||
<value>1000</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1s and remove the ms
public void testAcquireTimeout() { | ||
Configuration conf = createConf(40); | ||
conf.setInt(DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX + "ns1", 30); | ||
conf.setLong(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS, 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setTimeDuration(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT, 1, TimeUnit.SECONDS)
@goiri Thanks for you review, I learned a lot. Please help me review this patch, thanks. |
🎊 +1 overall
This message was automatically generated. |
@@ -354,6 +354,10 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic { | |||
NoRouterRpcFairnessPolicyController.class; | |||
public static final String DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX = | |||
FEDERATION_ROUTER_FAIRNESS_PREFIX + "handler.count."; | |||
public static final String DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT = | |||
FEDERATION_ROUTER_FAIRNESS_PREFIX + "acquire.timeout"; | |||
public static final long DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS_DEFAULT = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name should be similar to the original config name + _DEFAULT
, remove the _MS_
in middle.
Should be DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_DEFAULT
if (timeoutMs >= 0) { | ||
acquireTimeoutMs = timeoutMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is an invalid entry configured and we are moving to using the default value. We should atleast have a warn log. Kind of Invalid value -1 configured for dfs.... should be greater than or equal to 0, Using default value of : 1s instead
something like this
|
||
// There are some other operations, so acquireTimeMs >= 100ms. | ||
assertTrue(acquireTimeMs >= 100); | ||
assertTrue(acquireTimeMs < 100 + 50); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this 50 added as safe margin, kind of that other operations won't take more than 50ms? In that case I doubt this test is gonna go flaky in future, Double check to confirm it isn't gonna cross the safe limit of 50 ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayushtkn Thanks for your review. About this, do you have some good ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we can keep this safe margin way above or remove this. @goiri do you have any suggestions?
🎊 +1 overall
This message was automatically generated. |
@ayushtkn I just deleted Please help me review it again and push it forward. Thanks |
🎊 +1 overall
This message was automatically generated. |
@@ -723,6 +723,14 @@ | |||
</description> | |||
</property> | |||
|
|||
<property> | |||
<name>dfs.federation.router.fairness.acquire.timeout</name> | |||
<value>1s</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this configuration confirmed to be like this?
I feel the following is correct
<value>1</value>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, your configuration is accurate.
Merged! @ZanderXu Thanks for your contribution. @goiri @ayushtkn @slfan1989 Thanks for your reviews! |
…ble permit acquire timeout (apache#4597)
…ble permit acquire timeout (apache#4597)
RouterRpcFairnessPolicyController supports configurable permit acquire timeout. Hardcode 1s is very long, and it has caused an incident in our prod environment when one nameserivce is busy.
And the optimal timeout maybe should be less than p50(avgTime).
And all handlers in RBF is waiting to acquire the permit of the busy ns.