HDFS-13274. RBF: Extend RouterRpcClient to use multiple sockets #4531
Conversation
💔 -1 overall
This message was automatically generated.
@@ -1760,7 +1760,7 @@ UserGroupInformation getTicket() {
    return ticket;
  }

  private int getRpcTimeout() {
Can we do the HADOOP changes in a separate PR with a HADOOP JIRA?
Something like "Allow socket factory for connections"
Copy that, I will do it.
Actually that's HADOOP-13144.
Can you do a PR there? I think we are safe merging that.
 * @return the protocol proxy
 * @throws IOException if the far end through a RemoteException
 */
public static <T> ProtocolProxy<T> getProtocolProxy(Class<T> protocol,
Fix the javadocs.
@@ -57,6 +57,11 @@ <T> ProtocolProxy<T> getProxy(Class<T> protocol,
      SocketFactory factory, int rpcTimeout,
      RetryPolicy connectionRetryPolicy) throws IOException;

  /** Construct a client-side proxy object using a ConnectionId. */
Let's add the full javadoc.
@goiri I have raised a new PR, #4542, to push HADOOP-13144 forward. After that lands, I will continue pushing this issue forward.
I left a few minor comments; it would be good for others to review this too.
@goiri I have updated this patch based on trunk, please help me review it again. Thanks~
🎊 +1 overall
This message was automatically generated.
@@ -304,7 +343,7 @@ private void testConnectionCleanup(float ratio, int totalConns,
    addConnectionsToPool(pool, totalConns - 1, activeConns - 1);

    // There are activeConn connections.
-   // We can cleanup the pool
+   // We can clean up the pool
"cleanup" is correct
-   ConnectionContext connection = new ConnectionContext(clientProxy);
-   return connection;
+   ProxyAndInfo<T> clientProxy = new ProxyAndInfo<T>(client, dtService, socket);
+   return new ConnectionContext(clientProxy, maxConcurrencyPerConn);
Could we make it more generic and pass the conf
instead of ad-hoc arguments?
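The suggestion to pass the conf instead of ad-hoc arguments might look like the following sketch. This is illustrative only: a plain `Map` stands in for Hadoop's `Configuration`, and the class shape is a hypothetical reduction of the real `ConnectionContext`.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for Hadoop's Configuration, and the
// class shape is a hypothetical reduction of the real ConnectionContext.
class ConnectionContext {
  private final Object proxy;
  private final int maxConcurrencyPerConn;

  // Ad-hoc style: every new tunable widens the constructor signature.
  ConnectionContext(Object proxy, int maxConcurrencyPerConn) {
    this.proxy = proxy;
    this.maxConcurrencyPerConn = maxConcurrencyPerConn;
  }

  // Conf-driven style: new tunables are read from the conf, so call
  // sites do not change when another knob is added later.
  ConnectionContext(Object proxy, Map<String, String> conf) {
    this(proxy, Integer.parseInt(conf.getOrDefault(
        "dfs.federation.router.max.concurrency.per.connection", "1")));
  }

  int getMaxConcurrencyPerConn() { return maxConcurrencyPerConn; }
}
```

The conf-driven constructor keeps the pool code generic: it forwards one object rather than a growing list of per-feature parameters.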
🎊 +1 overall
This message was automatically generated.
@ayushtkn @ferhui @Hexiaoqiao Can you help me review this patch?
@Override
public int hashCode() {
  return new HashCodeBuilder()
      .append(PRIME * super.hashCode())
I am not sure multiplying by PRIME is necessary for hashCode, because super.hashCode() has already done that and the collision probability is very low here.
Thanks, I have deleted the PRIME, please help me review it.
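For illustration, the pattern under discussion can be sketched as below. The class and field names are hypothetical stand-ins, and `java.util.Objects.hash` replaces commons-lang's `HashCodeBuilder` so the snippet runs without extra dependencies; the point is the same: folding in `super.hashCode()` directly is enough, since the parent hash already mixes with a prime.

```java
import java.util.Objects;

// Sketch only: hypothetical classes illustrating the review comment that
// an extra PRIME multiplication on super.hashCode() is unnecessary.
class ConnectionId {
  private final String address;

  ConnectionId(String address) { this.address = address; }

  @Override
  public int hashCode() {
    return Objects.hash(address);
  }
}

class MultiSocketConnectionId extends ConnectionId {
  private final int socketIndex; // hypothetical field added by the subclass

  MultiSocketConnectionId(String address, int socketIndex) {
    super(address);
    this.socketIndex = socketIndex;
  }

  // Fold in super.hashCode() directly; the extra PRIME multiplication is
  // dropped because the parent hash already mixes with a prime (31).
  @Override
  public int hashCode() {
    return Objects.hash(super.hashCode(), socketIndex);
  }
}
```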
Leave one nit comment inline. Others look good to me. Thanks.
LGTM. +1.
🎊 +1 overall
This message was automatically generated.
...p-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionPool.java
<name>dfs.federation.router.enable.multiple.socket</name>
<value>false</value>
<description>
  If enable multiple downstream socket or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should explain the relation between dfs.federation.router.enable.multiple.socket and dfs.federation.router.max.concurrency.per.connection.
In general it would be good to have this in some of the RBF md files, explaining why we are doing this.
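Reading the two keys together as described in this PR: with multiple sockets enabled, a connection is considered usable only while it carries fewer than `dfs.federation.router.max.concurrency.per.connection` in-flight requests, so the pool spreads load across several sockets. A possible `hdfs-site.xml` fragment wiring them together (the value 5 is purely illustrative, not a recommendation):

```xml
<property>
  <name>dfs.federation.router.enable.multiple.socket</name>
  <value>true</value>
</property>
<property>
  <name>dfs.federation.router.max.concurrency.per.connection</name>
  <value>5</value>
</property>
```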
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@@ -135,6 +135,13 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic {
      FEDERATION_ROUTER_PREFIX + "connection.clean.ms";
  public static final long DFS_ROUTER_NAMENODE_CONNECTION_CLEAN_MS_DEFAULT =
      TimeUnit.SECONDS.toMillis(10);
  public static final String DFS_ROUTER_NAMENODE_ENABLE_MULTIPLE_SOCKET_KEY =
      FEDERATION_ROUTER_PREFIX + "enable.multiple.socket";
  public static final boolean DFS_ROUTER_NAMENODE_ENABLE_MULTIPLE_SOCKET_DEFAULT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this fits in one line.
<value>1</value>
<description>
  The maximum number of requests that a connection can handle concurrently.
  It's best used with dfs.federation.router.enable.multiple.socket together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If dfs.federation.router.enable.multiple.socket=false does it even take effect?
Can we also give an example or a rule of thumb to what value would be good (moderate is a little ambiguous)?
@goiri Sir, thanks very much for your careful review. I have updated this patch, please help me review it again. Thanks
💔 -1 overall
This message was automatically generated.
LGTM. It would be good to get a little more feedback from people working on this.
Can we retrigger the build?
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@goiri Master, I rebased it on the latest branch and triggered a build.
@goiri @Hexiaoqiao Master, thank you very much for helping me review this patch and for the very nice suggestions.
Description of PR
Jira: HDFS-13274
HADOOP-13144 introduces the ability to create multiple connections for the same user using different sockets. The RouterRpcClient should use this approach to get better throughput.
In my practice, I configured dfs.federation.router.connection.pool-size=6 and found some abnormal cases. After tracing the code, I found that we can optimize the mechanism of round-robin connection obtaining:

- A ConnectionContext can support handling some concurrent requests at the same time, such as 5.
- A ConnectionContext is usable when its numThreads is less than maxConcurrencyPerConn.
- In getConnection, we can return the previous usable connection in order.

After using the above solution, we effectively controlled the number of connections between RBF and the NameNode, and there is almost no phenomenon of creating sockets while forwarding requests.