-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16452. msync RPC should send to acitve namenode directly #3976
base: trunk
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
LOG.info("active proxy is choosed :{}", current); | ||
break; | ||
} else { | ||
changeProxy(current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, the current proxy in observer proxy provider in general should be pointing to Observer namenode, you seems to change the current proxy itself to active here? What would happen to subsequent calls, will they go to active or the concurrent calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @ayushtkn . the subsequent calls will not go to active.
@@ -342,7 +342,7 @@ private synchronized void initializeMsync() throws IOException { | |||
if (msynced) { | |||
return; // No need for an msync | |||
} | |||
getProxyAsClientProtocol(failoverProxy.getProxy().proxy).msync(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does failoverProxy.getProxy().proxy points to?
the same logic seems to be used for invocation as well
// Either all observers have failed, observer reads are disabled,
// or this is a write request. In any case, forward the request to
// the active NameNode.
LOG.debug("Using failoverProxy to service {}", method.getName());
ProxyInfo<T> activeProxy = failoverProxy.getProxy();
try {
retVal = method.invoke(activeProxy.proxy, args);
} catch (InvocationTargetException e) {
// This exception will be handled by higher layers
throw e.getCause();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi, @ayushtkn , thanks a lot for replying.
In class ObserverReadProxyProvider, there are two varaibles used to get NameNode proxy. One is failoverProxy, another is nameNodeProxies. For read requests, it will use nameNodeProxies to get a proxy of observer namenode, but for msync rpc, it will use varaible failoverProxy to get an active namenode proxy.
In practice, we found a lots of exceptions in the observer namenode log like following:
org.apache.hadoop.hdfs.protocol.ClientProtocol.msync from 10.xxx.xxx.xxx:59292 org.apache.hadoop.ipc.ObserverRetryOnActiveException: Operation category WRITE is not supported in state observer. Visit https://s.apache.org/sbnn-error
we found that failoverProxy contains proxys of all kinds of namenode, so we want to find active namenode proxy before invoke msync rpc. There has no effect on sending read requests to observer namenode, because read request use nameNodeProxies find observer namenode rather than failoverProxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayushtkn
failoverProxy point to a random namenode of all. But in fact, we don't want it to point to observer namenode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xkrogen , hi, erik, could you please help me look at this ? i saw the original code is yours
Quick disclaimer, it has been a while since I have looked at any proxy-provider code. This PR looks in the wrong direction to me. As you mentioned, You shared logs that some standby NNs are contacted before the active is found. I guess you are using Your new implementation is trying to scan through the NameNodes and check their status to find the active, but this seems to be breaking the contract with If you want to change the active/standby determination, you should change the behavior in your |
@xkrogen , very very thanks for replying me so carefully. public ObserverReadProxyProvider(
Configuration conf, URI uri, Class<T> xface, HAProxyFactory<T> factory) {
this(conf, uri, xface, factory,
new ConfiguredFailoverProxyProvider<>(conf, uri, xface, factory));
} In our implementation, we compute active namenode proxy to perform msync. But it does not effect the subsequent read rpc request to contact with observer namenode, because the code logic is below: if (observerReadEnabled && shouldFindObserver() && isRead(method)) {
// our code choose active nn to perform msync
if (!msynced) {
initializeMsync();
} else {
autoMsyncIfNecessary();
}
// .....
// code below will choose observer nn to perform read rpc
for (int i = 0; i < nameNodeProxies.size(); i++) {
NNProxyInfo<T> current = getCurrentProxy();
HAServiceState currState = current.getCachedState();
if (currState != HAServiceState.OBSERVER) {
if (currState == HAServiceState.ACTIVE) {
activeCount++;
} else if (currState == HAServiceState.STANDBY) {
standbyCount++;
} else if (currState == null) {
unreachableCount++;
}
changeProxy(current);
continue;
}
try {
retVal = method.invoke(current.proxy, args);
lastProxy = current;
return retVal;
} catch (InvocationTargetException ite) {
// ...
}
}
} as the code show above, if we enter the if condition statement,it proves that a read rpc request is invoked. first we confirm which namenode is active and use active namenode to perform msync. After msync rpc, we get observer namenode proxy from nameNodeProxies to perfom read rpc request. In actually, there is no need to send msync to observer namenode and then failover, it is a waste of network. |
I agree with you that the current logic has RPCs that are theoretically unnecessary. I would be very open to some approach that allows |
hi, @xkrogen . thanks a lot. I agree with you. I will figure out the better solution as i can. we can discuss it later. very thanks |
hi, @xkrogen . So sorry for disturbing you. I have a new idea about what we have discussed. |
hi,@xkrogen, could you please help me look at this problem? thanks a lot. |
Sorry for the delay in my response @hfutatzhanghb. I want to make sure I understand your proposal fully. I think what you are saying is this: Currently when using Do I have that right? If so, I think this is a reasonable proposal. I believe our installations at LinkedIn also assume that ONNs don't participate in SbNN/ANN failover transitions. |
f536254
to
69d8a07
Compare
💔 -1 overall
This message was automatically generated. |
69d8a07
to
a26f74d
Compare
💔 -1 overall
This message was automatically generated. |
172ca09
to
43503f2
Compare
💔 -1 overall
This message was automatically generated. |
43503f2
to
ff89675
Compare
💔 -1 overall
This message was automatically generated. |
57d8987
to
8d73c6d
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
8d73c6d
to
1c3755a
Compare
💔 -1 overall
This message was automatically generated. |
1c3755a
to
b329d97
Compare
@xkrogen Hi,Erik. Could you please help me to review the code. thanks a lot. |
💔 -1 overall
This message was automatically generated. |
b329d97
to
16c36d5
Compare
🎊 +1 overall
This message was automatically generated. |
hi, @ayushtkn . could you please help me to review the code? thanks a lot. |
Description of PR
In current ObserverReadProxyProvider implementation, we use the following code to invoke msync RPC.
But msync RPC maybe send to Observer NameNode in this way, and then failover to Active NameNode. This can be avoid by applying this patch.