HADOOP-17165. Implement service-user feature in DecayRPCScheduler. #2240
Conversation
💔 -1 overall
This message was automatically generated.
The failed test succeeded on my local machine.
@@ -483,6 +501,12 @@ private void recomputeScheduleCache() {
    for (Map.Entry<Object, List<AtomicLong>> entry : callCosts.entrySet()) {
      Object id = entry.getKey();
      // The priority for service users is always 0
      if (isServiceUser((String)id)) {
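For context, `isServiceUser` is presumably just a membership check against the configured set of service users; a minimal sketch under that assumption (field name and details are approximations, not the exact patch code):

```java
// Sketch only: assumes the check is a plain lookup in a Set holding the configured service users.
private boolean isServiceUser(String userName) {
  return serviceUserNames.contains(userName);
}
```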
@tasanuma Thanks for your proposal. My concern is a corner case where service users' requests always get the top priority and are never flow-controlled. For example, if the user hdfs is a service user and submits one big job that issues massive RPC requests to the NameNode, other normal users' requests may be seriously delayed.
Thanks for your comment, @Hexiaoqiao.
I think you can adjust the existing FairCallQueue settings to deal with that case to a certain extent, for example by increasing the weight of the highest-priority queue or extending its length. However, this simple feature cannot cover every case; if more fairness is needed among service users, then I believe HADOOP-15016 would be required.
The case where I think this feature would be useful is when you have service users who regularly run critical ETL jobs. A user who may issue huge bursts of RPC requests at any moment may not be suitable as a service user.
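For illustration, the kind of adjustment I mean is raising the multiplexer weight of the top queue. A rough sketch in programmatic form; the port number (8020) and the weight values are assumptions only, and the key name should be checked against the FairCallQueue documentation:

```java
import org.apache.hadoop.conf.Configuration;

public class FairCallQueueTuningSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Drain the highest-priority queue much more often than the lower ones.
    // Port 8020 and the weights "16,4,2,1" are illustrative values only.
    conf.set("ipc.8020.faircallqueue.multiplexer.weights", "16,4,2,1");
  }
}
```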
@tasanuma Curious why you chose to do it here and in cachedOrComputedPriorityLevel instead of getPriorityLevel as in the previous patch.
@sunchao Thanks for your review.
- When DecayRpcScheduler decays, it calls recomputeScheduleCache to recalculate the priorities of all users, including service users, and puts them in the cache. So we had to do this in recomputeScheduleCache as well. Lines 415-421 of the unit test cover this case.
- I moved the check from getPriorityLevel to cachedOrComputedPriorityLevel because I thought it would be a little more efficient to check whether a user is a service user only after consulting the cache (roughly as in the sketch below).
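A simplified sketch of what that ordering looks like (field and helper names here are approximations of the patch, not the exact code):

```java
// Simplified sketch: service users are cached at priority 0 by recomputeScheduleCache(),
// so the extra service-user lookup is only paid on a cache miss.
private int cachedOrComputedPriorityLevel(Object identity) {
  Map<Object, Integer> scheduleCache = scheduleCacheRef.get();
  if (scheduleCache != null) {
    Integer cached = scheduleCache.get(identity);
    if (cached != null) {
      return cached;
    }
  }
  // Cache miss: check the service-user set before recomputing from call costs.
  if (isServiceUser((String) identity)) {
    return 0;
  }
  long currentCost = getCurrentCost(identity); // assumed helper reading callCosts
  return computePriorityLevel(currentCost);
}
```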
Gotcha, thanks @tasanuma. This looks mostly good to me. One minor nit is that the logic for handling service users is now separated into two places. Another way would be to add it in computePriorityLevel, which is called by both recomputeScheduleCache and cachedOrComputedPriorityLevel. We would need to add an extra parameter to that method, though. What do you think?
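Something along these lines, with the identity passed in so both callers share the same handling. This is only a sketch of the suggestion, assuming the existing numLevels, thresholds, and totalDecayedCallCost fields of DecayRpcScheduler:

```java
// Sketch of the suggestion: take the identity as an extra parameter so that both
// recomputeScheduleCache() and cachedOrComputedPriorityLevel() go through one check.
private int computePriorityLevel(long cost, Object identity) {
  // The priority for service users is always 0.
  if (isServiceUser((String) identity)) {
    return 0;
  }
  long totalCallSnapshot = totalDecayedCallCost.get();
  double proportion = 0;
  if (totalCallSnapshot > 0) {
    proportion = (double) cost / totalCallSnapshot;
  }
  // Start from the lowest priority level, since most users fall there.
  for (int i = numLevels - 1; i > 0; i--) {
    if (proportion >= thresholds[i - 1]) {
      return i;
    }
  }
  return 0;
}
```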
Thanks for your review, @sunchao! Agreed. I updated the PR to address it.
I've read the design documentation for HADOOP-15016 and I believe that HADOOP-17165 could co-exist with HADOOP-15016.
@sunchao Could you review this PR?
Thanks @tasanuma . Sorry I just noticed this PR.
🎊 +1 overall
This message was automatically generated.
LGTM with one nit. Thanks @tasanuma ! @Hexiaoqiao let me know if you want to take another look.
@@ -178,6 +188,7 @@
  private static final double PRECISION = 0.0001;
  private MetricsProxy metricsProxy;
  private final CostProvider costProvider;
  private Set<String> serviceUsernames;
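For reference, the new set is populated from a comma-separated configuration property. A sketch of the parsing, assuming a per-port key along the lines of decay-scheduler.service-users (the exact key name is an assumption here):

```java
// Sketch (key name is an assumption): read the comma-separated service-user list
// from the scheduler's per-port namespace and keep it in a Set for O(1) lookups.
private Set<String> parseServiceUserNames(Configuration conf, String namespace) {
  Collection<String> users = conf.getTrimmedStringCollection(
      namespace + "." + "decay-scheduler.service-users");
  return new HashSet<>(users);
}
```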
nit: it is usually UserName instead of Username, so better to follow that pattern.
Thanks for your review. Updated the PR to follow UserName.
🎊 +1 overall
This message was automatically generated.
Merged. Thanks @tasanuma!
Thanks for reviewing and merging it, @sunchao and @Hexiaoqiao!
JIRA: https://issues.apache.org/jira/browse/HADOOP-17165
This PR is for HADOOP-17165.
Diff from HADOOP-17165.002.patch