HBASE-24765: Dynamic master discovery #2130
Conversation
Nice patch, left some comments. PR should point to splittable meta feature branch?
```java
.setServerName(ProtobufUtil.toServerName(name)).setIsActive(true).build()));
// Backup masters
try {
  // TODO: Cache the backup masters to avoid a ZK RPC for each getMasters() call.
```
We are planning to have a cache with a ZKWatcher for the backupMasters ZNode, right? I believe as of now we don't subscribe to any event.
Right.
```java
};
masterAddrRefresherThread = Threads.newDaemonThreadFactory(
    "MasterRegistry refresh end-points").newThread(masterEndPointRefresher);
masterAddrRefresherThread.start();
```
Don't want to use SingleThreadExecutor.submit()?
Ok switched. I didn't want to have extra layers on top of a simple thread, but I guess a pool is more readable.
```java
masterAddrRefresherThread = Threads.newDaemonThreadFactory(
    "MasterRegistry refresh end-points").newThread(masterEndPointRefresher);
```
Can you please use the Guava library's ThreadFactoryBuilder? So far the consensus on HBASE-24750 is to get rid of our internally maintained ThreadFactory :)
Ok switched. I think we should use our internal one, but fine.
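For reference, a minimal sketch of what the two suggestions combined (a single-thread executor plus Guava's ThreadFactoryBuilder) might look like; the names below are illustrative, not the final code:

```java
// Sketch only: a single-thread daemon executor replacing the hand-rolled refresher thread.
// ThreadFactoryBuilder is Guava's com.google.common.util.concurrent.ThreadFactoryBuilder
// (HBase would typically use the shaded hbase-thirdparty copy).
ExecutorService masterAddrRefresherPool = Executors.newSingleThreadExecutor(
    new ThreadFactoryBuilder()
        .setNameFormat("MasterRegistry-refresh-end-points") // no spaces in the thread name prefix
        .setDaemon(true)
        .build());
masterAddrRefresherPool.submit(masterEndPointRefresher);
```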
```java
  }
};
masterAddrRefresherThread = Threads.newDaemonThreadFactory(
    "MasterRegistry refresh end-points").newThread(masterEndPointRefresher);
```
nit: avoid space in Thread prefix name?
You mean between words?
```java
    registry.getParsedMasterServers().size() == 2);
final Set<ServerName> newMasters2 = registry.getParsedMasterServers();
assertEquals(2, newMasters2.size());
assertFalse(newMasters2.contains(activeMaster));
```
newMasters2.contains(activeMaster.getServerName())
```java
TEST_UTIL.waitFor(5000,
    (Waiter.Predicate<Exception>) () -> !registry.getParsedMasterServers().equals(masters));
```
nit: up to you if you want to use ExplainingPredicate to throw an Exception with a specific message.
Skipped it because it bloats up the code; we need to expand the lambda. I think the intent is pretty clear if the test fails. Let me know if you feel strongly and I can change it.
Sure no worries, not a strong point.
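For reference, the ExplainingPredicate variant would look roughly like this (a sketch, assuming HBase's Waiter.ExplainingPredicate interface; the failure message is illustrative):

```java
TEST_UTIL.waitFor(5000, new Waiter.ExplainingPredicate<Exception>() {
  @Override
  public boolean evaluate() {
    // Same condition as the lambda above.
    return !registry.getParsedMasterServers().equals(masters);
  }

  @Override
  public String explainFailure() {
    return "Master end points were not refreshed; registry still returns "
        + registry.getParsedMasterServers();
  }
});
```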
```java
TEST_UTIL.waitFor(100000, (Waiter.Predicate<Exception>) () ->
    registry.getParsedMasterServers().size() == 2);
final Set<ServerName> newMasters2 = registry.getParsedMasterServers();
assertEquals(2, newMasters2.size());
```
After stopping activeMaster, maybe add an extra check to confirm the list contains one active and one backup master?
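For illustration, the extra check might look something like this (a sketch; the getMiniHBaseCluster() lookup and variable names are assumptions based on the test snippet above):

```java
// After the old active master is stopped and the registry has refreshed, the list should
// contain the newly promoted active master and no longer contain the stopped one.
ServerName newActive = TEST_UTIL.getMiniHBaseCluster().getMaster().getServerName();
assertTrue(newMasters2.contains(newActive));
assertFalse(newMasters2.contains(activeMaster.getServerName()));
```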
```java
// RPC has failed, trigger a refresh of master end points. We can have some spurious
// refreshes, but that is okay since the RPC is not expensive and not in a hot path.
synchronized (refreshMasters) {
  refreshMasters.notify();
```
For any generic RPC failure, we want to expedite populating masters with another RPC call. Let's say we have the following sequence of events:
1. A getClusterId() RPC call fails.
2. The master refresher thread is in the waiting state, so we notify it and it triggers a getMasters() call.
3. That call fails again and we notify refreshMasters, but no one is waiting on it, so the notify is ignored.
4. The master refresher thread again waits for 5 min before populating masters.

Do we really want step 4 to wait for 5 min (assuming no other RPC call happens and the masters list is stale)? Maybe we can expedite populating masters with the help of an AtomicBoolean check (and also avoid the synchronized + wait calls, i.e. the 5 min wait)? Even if we have a network issue, we don't want to delay populating masters by 5 min, right?
> Maybe we can expedite populating masters with the help of AtomicBoolean check

Not sure I follow this, mind rephrasing?

> assuming no other RPC call happens and masters list is stale

If no other RPC call happens, it doesn't matter whether the list is stale or not?

> Even if we have network issue, we don't want to delay populate masters by 5 min right?

Not sure I follow; if we have a network issue, how can we populate?
> Maybe we can expedite populating masters with the help of AtomicBoolean check
>
> Not sure I follow this, mind rephrasing?

I meant to say that if the refresh thread misses this notify because it is already done waiting on refreshMasters, then for the next loop it should not again wait 5 min on refreshMasters, but rather quickly perform an RPC call to populate masters.

> Even if we have network issue, we don't want to delay populate masters by 5 min right?
>
> Not sure I follow, if we have a network issue, how can we populate?

I meant the same as above: even if a network issue causes a notify to the refresh thread when it was already past the waiting state, maybe next time the thread had better quickly make an RPC call rather than waiting 5 min on refreshMasters. But yes, for network issues, we will keep making RPC calls without any progress.

> If not other RPC call happens, it doesn't matter if the list is stale or not?

Hmm, that's true.
> I meant to say if refresh thread misses this notify because it is already done waiting on refreshMasters, for the next loop, it should not again wait 5 min on refreshMasters and rather quickly perform RPC call to populate masters.

I don't think that's needed. If the thread has just fetched the masters (in cases where it missed the notification), it is very unlikely that something new has been added or removed. Typically this is a very rare event, probably less rare in a K8s environment than in DC deployments, but even then I don't think things usually change for days, if not weeks.
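For context, here is a minimal sketch of the refresh loop under discussion, pieced together from the snippets in this review (method and constant names are assumptions, not the exact implementation):

```java
private void refreshMastersLoop() {
  while (!Thread.currentThread().isInterrupted()) {
    try {
      synchronized (refreshMasters) {
        // Wake up either after the periodic timeout (e.g. 5 minutes) or earlier, when a
        // failed RPC calls refreshMasters.notify().
        refreshMasters.wait(WAIT_TIME_OUT_MS);
      }
      LOG.debug("Attempting to refresh master address end points.");
      Set<ServerName> newMasters = new HashSet<>(getMasters().get());
      populateMasterStubs(newMasters);
    } catch (InterruptedException e) {
      LOG.debug("Interrupted during wait, aborting refresh-masters-thread.", e);
      break;
    } catch (Exception e) {
      // A failed refresh just means the stale list survives until the next wake-up.
      LOG.debug("Failed to refresh master end points.", e);
    }
  }
}
```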
> PR should point to splittable meta feature branch?

No, this has nothing to do with splittable meta. @saintstack just pointed out this optimization in the design doc, and I thought it would be generally useful to anyone who wants to use the master registry. This was also discussed in the parent jira HBASE-18095.
The goal of the patch is to keep the client's local master end points up to date without operators having to worry about refreshing the client configuration after adding/deleting master roles. This optimization would be useful generally.
I will address the other comments once we iron out the design and everyone is okay with the approach.
> @saintstack just pointed out this optimization in the design doc that I thought would be generally useful to any who wants to use master registry.

Yes, sounds good. Thanks @bharathv.
```java
// RPC client used to talk to the masters.
private final RpcClient rpcClient;
private final RpcControllerFactory rpcControllerFactory;
private final int rpcTimeoutMs;
// For synchronizing on refreshing the master end-points
private final Object refreshMasters = new Object();
```
nit: static final here? Anyway, MasterRegistry is a singleton, right? (If not by design, then by usage.)
No, it's loosely tied to a single connection (there are some places that create on-the-fly registries, which is something that can be fixed).
A single application can connect to multiple clusters, which means we cannot make it static.
Oops, yes. This better be just final.
Does anyone else have thoughts on the approach this patch takes? The implementation is pretty straightforward, but happy to add more detail if needed.
```
@@ -121,6 +121,11 @@ public boolean hasCellBlockSupport() {
  @Override
  public void callMethod(MethodDescriptor method, RpcController controller, Message request,
      Message responsePrototype, RpcCallback<Message> done) {
    if (!method.getName().equals("GetClusterId")) {
      // Master registry internally runs other RPCs to keep the master list up to date. This check
```
Can you say more here? Why is it that the internal RPCs that keep the master list up to date are sufficient to skip a call to "GetClusterId"? Can you provide a "see also" comment that points the reader off to the counting logic, or at least the counter that this condition protects?
Will add more detail. That is needed because of the way the test is written. This RpcChannel implementation intercepts all the mock RPCs from the unit tests and just counts the getClusterId() calls (depending on the index). With the patch, a single GetClusterId RPC failure can trigger an extra getMasters() call, and that is accounted for too.
```java
      refreshMasters.wait(WAIT_TIME_OUT_MS);
    }
    LOG.debug("Attempting to refresh master address end points.");
    Set<ServerName> newMasters = new HashSet<>(getMasters().get());
```
Where would I find metrics regarding calls to getMasters()? I suppose either client- or server-side would be good.
I haven't thought about this. Are you talking about this RPC specifically? If so, may I ask why?
Adding it on the server makes sense to me, like we want some metrics around where most of the time is spent (grouped by RPC), but I'm curious what purpose it serves on the client.
Someone who has the HBase client embedded in their application might be interested in observing this behavior, if they notice sudden spikes in RPC traffic not directly correlated with their application's data path.
Or maybe it's enough to track these calls on the server-side. We'd see the same spikes, though with less granularity.
Okay. Looks like we have all the plumbing for client metrics already in place via HBASE-12911. I can add metrics for this RPC.
Either client-side or server-side is fine with me. Just having something that an operator can expose would be useful.
Added metrics, some more unit test coverage and refactored the code slightly.
> I think this should be done in this patch, as now this method will replace the old getActiveMaster method, which makes it not only used in our internal refresh but also used by end users; we should not let users still have the ability to hammer zookeeper...

I have a separate patch for this and didn't want to scope-creep this one. Mind if I do a separate PR? I will back-port them together.
Ended up implementing caching too in the same patch, for completeness' sake.
@Apache9 Any more comments on this one? Thanks.
@Apache9 / @virajjasani Any more comments, or is this good to go? This has been open for a while; I would like to get it in asap unless you have any comments.
The only concern is about the IllegalStateException. We used to throw IOException for this case. And please fix the checkstyle issues?
Few nits, +1 overall.
```java
periodicRefreshMs = 1000 * conf.getLong(PERIODIC_REFRESH_INTERVAL_SECS,
    PERIODIC_REFRESH_INTERVAL_SECS_DEFAULT);
timeBetweenRefreshesMs = 1000 * conf.getLong(MIN_SECS_BETWEEN_REFRESHES,
    MIN_SECS_BETWEEN_REFRESHES_DEFAULT);
```
nit: TimeUnit.SECONDS.toMillis(conf.getLong(,))
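In other words, something along these lines (a sketch of the nit, directly based on the snippet above):

```java
periodicRefreshMs = TimeUnit.SECONDS.toMillis(
    conf.getLong(PERIODIC_REFRESH_INTERVAL_SECS, PERIODIC_REFRESH_INTERVAL_SECS_DEFAULT));
timeBetweenRefreshesMs = TimeUnit.SECONDS.toMillis(
    conf.getLong(MIN_SECS_BETWEEN_REFRESHES, MIN_SECS_BETWEEN_REFRESHES_DEFAULT));
```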
```java
private static final int PERIODIC_REFRESH_INTERVAL_SECS_DEFAULT = 300;
public static final String MIN_SECS_BETWEEN_REFRESHES =
    "hbase.client.master_registry.min_secs_between_refreshes";
private static final long MIN_SECS_BETWEEN_REFRESHES_DEFAULT = 60;
```
nit: keep this an int, similar to PERIODIC_REFRESH_INTERVAL_SECS_DEFAULT?
```java
registry.populateMasterStubs(newMasters);
LOG.debug("Finished refreshing master end points. {}", newMasters);
} catch (InterruptedException e) {
  LOG.debug("Interrupted during wait, aborting refresh-masters-thread.", e);
```
We are aborting the refresh by breaking out of the loop, and basically we are done refreshing master stubs. Better to log this at ERROR?
That also happens during a regular pool shutdown. Added another log later in the method, after the loop exits, for observability.
```java
    zkw.getZNodePaths().backupMasterAddressesZNode);
} catch (KeeperException e) {
  LOG.warn(zkw.prefix("Unable to list backup servers"), e);
  backupMasterStrings = null;
```
nit: return Collections.emptyList();
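That is, the catch block could return the empty list directly (a sketch based on the snippet above):

```java
} catch (KeeperException e) {
  LOG.warn(zkw.prefix("Unable to list backup servers"), e);
  return Collections.emptyList();
}
```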
Keeping the backupMasters volatile list up to date in ActiveMasterManager is a nice move with this PR.
I added a checked exception. TBH I don't fully understand the concern. Like I explained, it happens only if an invariant fails, which means a code bug, so an unchecked exception should be fine. Either way, it's not a big deal which exception we throw, so I made the change. Also fixed the checkstyle issues.
+1
This patch adds the ability to discover newly added masters dynamically on the master registry side. The trigger for the re-fetch is either periodic (5 mins) or any registry RPC failure. Master server information is cached in masters to avoid repeated ZK lookups. Updates the client side connection metrics to maintain a counter per RPC type so that clients have visibility into counts grouped by RPC method name. I didn't add the method to ZK registry interface since there is a design discussion going on in splittable meta doc. We can add it later if needed.

Signed-off-by: Nick Dimiduk <[email protected]>
Signed-off-by: Viraj Jasani <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
This patch adds the ability to discover newly added masters dynamically on the master registry side. The trigger for the re-fetch is either the 5-minute periodic refresh or any registry RPC failure.
I didn't add the method to the ZK registry interface since there is a design discussion going on in the splittable meta doc. We can add it later if needed.