
feat: add a shared p2p rpc client session pool to reduce rpc setup time #3152

Merged

Conversation

@sdbondi (Member) commented Aug 1, 2021

Description

Adds an RPC client session pool to efficiently maintain multiple shared
RPC sessions on the p2p network.

Motivation and Context

The wallet currently establishes one or more RPC sessions per transaction. During stress tests, these
sessions are opened and closed many times. Each time an RPC session is established, protocol negotiation
and an RPC handshake must occur. Components can make use of the shared RPC client pool
to manage multiple sessions more efficiently and easily, without creating situations where the target peer is
thrashed with tens of thousands of sessions during a stress test.

Implementing this in the wallet is TODO.

Some RPC test utilities were moved around to make them useful to other tests, which accounts for some of the added lines of code.
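
For readers skimming the diff, the pooling behaviour described above boils down to the following minimal, self-contained sketch. This is an illustration of the idea only, not the code in this PR; the real pool works over async RPC sessions and peer connections.

```rust
use std::sync::Arc;

/// Illustration only: lazily creates clients up to `max_sessions` and always
/// hands out the least-used one, mirroring the behaviour described above.
struct LazyLeastUsedPool<T> {
    max_sessions: usize,
    make_client: Box<dyn Fn() -> T + Send + Sync>,
    clients: Vec<Arc<T>>,
}

impl<T> LazyLeastUsedPool<T> {
    fn new(max_sessions: usize, make_client: impl Fn() -> T + Send + Sync + 'static) -> Self {
        Self {
            max_sessions,
            make_client: Box::new(make_client),
            clients: Vec::new(),
        }
    }

    /// Returns a shared handle to a client, creating a new one only when needed.
    fn obtain(&mut self) -> Arc<T> {
        // Reuse an idle client if one exists (strong count 1 == only the pool holds it).
        if let Some(idle) = self.clients.iter().find(|c| Arc::strong_count(c) == 1) {
            return Arc::clone(idle);
        }
        // All existing clients are leased: lazily establish a new one while under the cap.
        if self.clients.len() < self.max_sessions {
            let client = Arc::new((self.make_client)());
            self.clients.push(Arc::clone(&client));
            return client;
        }
        // At the cap: hand out the client with the fewest outstanding leases.
        self.clients
            .iter()
            .min_by_key(|c| Arc::strong_count(c))
            .map(Arc::clone)
            .expect("max_sessions must be greater than zero")
    }
}
```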

How Has This Been Tested?

Extensive unit tests

Checklist:

  • I'm merging against the development branch.
  • I have squashed my commits into a single commit.

@sdbondi force-pushed the comms-rpc-connection-pool branch 3 times, most recently from a0b082b to 2c4e82c on August 2, 2021 05:36
```rust
/// Creates a new RpcClientPool that can be shared between tasks. The client pool will lazily establish up to
/// `max_sessions` sessions and provides the least-used client session.
#[cfg(feature = "rpc")]
pub fn create_rpc_client_pool<T>(&self, max_sessions: usize) -> RpcClientPool<T>
```
@hansieodendaal (Contributor) commented Aug 2, 2021

Please help me understand this:
I would have thought that calls to pub async fn connect_rpc and pub async fn connect_rpc_using_builder would call pub fn create_rpc_client_pool if a pool does not exist and then return connections from the pool only? Is that not the purpose of the pool?

@sdbondi (Member, Author) commented Aug 2, 2021

I understand the thinking, but unfortunately it's not that easy to do, given that all clients have different types.

Remember, a pool session is per peer, per protocol.

A pool of sessions has to be typed to an RPC protocol, i.e. RpcClientPool<WalletBaseNodeClient>. So storing a generic list containing different types requires RpcClientPool<Box<dyn RpcPoolClient>>, which in turn requires setting up a BoxedRpcClientPool type. So making this generic is not straightforward - I also don't think it's a particularly good solution, because:

  1. RPC is designed for single concerns - one concern needs one kind of RPC service. Services are small and focused. You shouldn't need to open more than one or two sessions to a particular protocol, i.e. you should design your software to not need more than this (though in reality you have quite a margin for abuse, as we've seen).
  2. It is MUCH easier for the caller to get a pool and manage it than to make a boxed dynamic list/hashmap. The domain knows about the pool it needs.
  3. Only the wallet needs a pool currently.
  4. How big a pool do you need? Is one size fits all a good decision?
  5. It has a memory and performance cost which is simply not necessary for most components to pay.
  6. Other minor concerns which add undue code complexity - when can we close the pooled sessions? Do we assume that we are never done using the pool? Do we add more calls to manage this pool state for everyone?

So, the wallet will get a pool and then use it to get the connections it needs (see the sketch below). In future, other components may need a pool, so they'll have that available. It really depends on the lifetime of your session and your requirements, so it's probably a mistake to try to implement something that covers all possible domain requirements.
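
To make the "the wallet will get a pool and use it" point concrete, a rough usage sketch. Only `create_rpc_client_pool(max_sessions)` and the typed `RpcClientPool<T>` come from this PR; the receiver being a peer connection, the `obtain()` lease method, the concrete `WalletBaseNodeClient` client type and the `anyhow::Error` error type are assumptions for illustration.

```rust
// Sketch only: `create_rpc_client_pool` and `RpcClientPool<T>` come from this PR;
// the `obtain()` lease method and the remaining types are illustrative assumptions.
async fn fetch_tip_info(conn: &PeerConnection) -> Result<(), anyhow::Error> {
    // One pool per peer per protocol, typed to the RPC client it serves.
    let pool: RpcClientPool<WalletBaseNodeClient> = conn.create_rpc_client_pool(5);

    // Leasing a session reuses an existing one where possible; protocol negotiation
    // and the RPC handshake only happen when a new session must be established.
    let _client = pool.obtain().await?;

    // ... call RPC methods on the leased client here ...
    Ok(())
}
```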

Contributor reply:

Understood, thanks, and agreed. The caller/client process can/should manage the sessions.

philipr-za previously approved these changes Aug 2, 2021
@philipr-za (Contributor) left a comment

Looks good.

Is the plan that the domain level clients will maintain one of these for a given peer connection and recreate it if the peer connection drops? So they will handle the re-dialing if required. I can see that the maintenance of the peer connection could become a bit tricky.

@sdbondi (Member, Author) commented Aug 2, 2021

@philipr-za Yeah, this PR takes care of the pooled management of a particular RPC session. I am implementing a domain-level component that manages the connection to a base node, which should be a matter of dialling it again if it drops and re-creating the shared pool (roughly as sketched below). Callers can use this to get a session from the pool without being exposed to managing the connection or needing to know about base node changes, etc. Maybe that component can be made generic, but that is not the goal at the moment.
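
A very rough sketch of what such a domain-level component could look like, assuming it owns the base node peer connection and the shared pool. Every name here (`ConnectivityRequester`, `dial_peer`, `NodeId`, the cloneable pool) is taken from context or assumed for illustration; this is not the code from this or any follow-up PR.

```rust
// Illustrative sketch only: owns the base node connection and the shared RPC pool,
// re-dialling and re-creating the pool when the connection drops.
struct BaseNodeConnectivity {
    connectivity: ConnectivityRequester,                // comms dialing handle (assumed)
    base_node: NodeId,                                  // currently selected base node
    pool: Option<RpcClientPool<WalletBaseNodeClient>>,  // lazily created shared pool
}

impl BaseNodeConnectivity {
    /// Returns the shared pool, (re-)dialling the base node and re-creating the pool if needed.
    async fn obtain_pool(&mut self) -> Result<RpcClientPool<WalletBaseNodeClient>, anyhow::Error> {
        if self.pool.is_none() {
            // Dial (or re-dial) the base node and create a fresh pool of up to 5 sessions.
            let conn = self.connectivity.dial_peer(self.base_node.clone()).await?;
            self.pool = Some(conn.create_rpc_client_pool(5));
        }
        // Assumes the pool is cheaply cloneable ("can be shared between tasks").
        Ok(self.pool.clone().expect("pool was just set"))
    }

    /// Called when the peer connection is known to have dropped; the next
    /// `obtain_pool` call will re-dial and rebuild the pool.
    fn on_connection_dropped(&mut self) {
        self.pool = None;
    }
}
```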

@philipr-za (Contributor) commented:

> Looks good.
>
> Is the plan that the domain level clients will maintain one of these for a given peer connection and recreate it if the peer connection drops? So they will handle the re-dialing if required. I can see that the maintenance of the peer connection could become a bit tricky.

Your response to @hansieodendaal answered this question to some extent. The wallet will need to manage these pools as they are required by tasks, so it can keep a client open for the tasks that need it and close it down once no more tasks need it.

@sdbondi
Copy link
Member Author

sdbondi commented Aug 2, 2021

Just pushed up another test that I wanted in here

hansieodendaal previously approved these changes Aug 2, 2021
@hansieodendaal (Contributor) left a comment

LGTM

Just some notes for implementing this further. We may want a domain-level client that creates an RPC connection pool to have options to specify (roughly as sketched below):

  • whether RPC, in general, should take care of reconnecting dropped client connections, and whether the client should be exclusive or may be selected from a peer list;
  • clearing out/dropping a percentage of RPC sessions if they have not been used for some time (e.g. pool size = 100, never drop sessions to below 10).
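
Purely as a discussion aid, those options could look something like the following configuration for a future domain-level pool. None of these fields exist in this PR; the names are made up.

```rust
use std::time::Duration;

// Hypothetical configuration for a future domain-level RPC pool; nothing here exists in this PR.
struct DomainRpcPoolOptions {
    /// Whether RPC should transparently re-establish sessions when a client connection drops.
    auto_reconnect: bool,
    /// `true` to pin sessions to a single exclusive peer; `false` to allow selection from a peer list.
    exclusive_peer: bool,
    /// Drop sessions that have been idle for longer than this...
    idle_timeout: Duration,
    /// ...but never shrink the pool below this floor (e.g. max 100 sessions, keep at least 10).
    min_sessions: usize,
}
```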

stringhandler previously approved these changes Aug 4, 2021
@stringhandler (Collaborator) left a comment

Looks good. I would ideally want the wallet to have something a bit different, but I think this can be extended to support it.

In order of preference I would like:
Option 1. The wallet to have only one connection pool that each protocol/service can use as a dependency, asking it for a connection of type T (e.g. BaseNodeRpc) to an address, i.e. pool.get_client<TRpcClient>(address) (a rough interface sketch follows after this comment). If the RPC dance has been done already, return a client; otherwise, negotiate.
Option 2. The wallet to have one connection pool per RPC client type. A protocol/service that needs both BaseNodeRpc and BaseNodeWalletRpc would have one dependency per client type it needs.
Option 3. This looks like what you have, although I might be misunderstanding: a pool per address and RPC type. This is useful, but I think the other two options are going to be more ergonomic.

Happy to merge this, and future improvements can be built upon it.
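
For concreteness, a hypothetical sketch of the Option 1 interface described above. It is not implemented in this PR, and every name below is illustrative (only the RpcPoolClient trait is mentioned in the discussion).

```rust
// Hypothetical only: a single wallet-wide pool keyed by (client type, peer), which
// negotiates a session on first use and reuses it afterwards.
struct SharedWalletRpcPool {
    // e.g. a map from (TypeId, NodeId) to an established, boxed session
}

impl SharedWalletRpcPool {
    /// Returns a `T` client for `peer`, negotiating a new session only if no pooled
    /// session of that type exists yet.
    async fn get_client<T: RpcPoolClient>(&self, peer: NodeId) -> Result<T, anyhow::Error> {
        // 1. Look up an existing (T, peer) session and return it if present.
        // 2. Otherwise dial `peer`, run protocol negotiation and the RPC handshake,
        //    cache the resulting session, and return it.
        unimplemented!("interface sketch only")
    }
}
```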

@sdbondi (Member, Author) commented Aug 4, 2021

@mikethetike Thanks for the review - looks like we're on the same page. This pool construct forms the basis for what you have outlined.

Option 1 has the same issues as raised here: #3152 (comment). We'll see, but this is something we can either do without or build non-generically (the types are known) in the domain layer where a pool is needed (currently only the wallet) - PR incoming.
Options 2 and 3 are coming in a subsequent domain-level PR (Update: #3159).

aviator-app (bot) commented Aug 4, 2021

PR queued successfully. Your position in queue is: 3

@stringhandler changed the title from "feat: shared p2p rpc client session pool" to "feat: add a shared p2p rpc client session pool to reduce rpc setup time" Aug 4, 2021

aviator-app (bot) commented Aug 4, 2021

PR is on top of the queue now

aviator-app (bot) added the mq-failed label Aug 4, 2021

aviator-app (bot) commented Aug 4, 2021

PR failed to merge with reason: Some CI status(es) failed
Failed CI(s): ci/circleci: run-integration-tests

aviator-app (bot) removed the mq-failed label Aug 4, 2021

aviator-app (bot) commented Aug 4, 2021

PR queued successfully. Your position in queue is: 8

aviator-app (bot) commented Aug 4, 2021

PR is on top of the queue now

aviator-app (bot) added the mq-failed label Aug 4, 2021

aviator-app (bot) commented Aug 4, 2021

PR failed to merge with reason: Some CI status(es) failed
Failed CI(s): ci/circleci: run-integration-tests

aviator-app (bot) removed the mq-failed label Aug 5, 2021

aviator-app (bot) commented Aug 5, 2021

PR queued successfully. Your position in queue is: 1

aviator-app (bot) added the mq-failed label Aug 5, 2021
@sdbondi force-pushed the comms-rpc-connection-pool branch 3 times, most recently from 2133424 to 6573ff3 on August 5, 2021 11:15
@sdbondi marked this pull request as draft August 6, 2021 07:26
@sdbondi changed the title to "DNM feat: add a shared p2p rpc client session pool to reduce rpc setup time" Aug 6, 2021
@sdbondi force-pushed the comms-rpc-connection-pool branch 7 times, most recently from 5b5a0c7 to bae7360 on August 6, 2021 11:04
@sdbondi changed the title back to "feat: add a shared p2p rpc client session pool to reduce rpc setup time" Aug 6, 2021
@sdbondi marked this pull request as ready for review August 6, 2021 12:59
@sdbondi force-pushed the comms-rpc-connection-pool branch 2 times, most recently from 5ae87d9 to 0f1dc29 on August 6, 2021 13:00
aviator-app (bot) removed the mq-failed label Aug 7, 2021

aviator-app (bot) commented Aug 7, 2021

PR queued successfully. Your position in queue is: 2

aviator-app (bot) commented Aug 7, 2021

PR is on top of the queue now

aviator-app (bot) merged commit 778f951 into tari-project:development Aug 7, 2021
@sdbondi deleted the comms-rpc-connection-pool branch August 8, 2021 06:18
aviator-app bot added a commit that referenced this pull request Aug 10, 2021
## Description
Adds a service responsible for wallet connectivity. This service
abstracts away the complexity of managing base node connections and
RPC sessions.

This PR makes use of this service in the base node monitoring service but
does not "plumb" the WalletConnectivityService into the protocols. This
is left as a TODO, but we should expect it to remove many lines of
code and greatly simplify these protocols by removing the burden of
connection management from the various wallet components.

It also includes a number of simplifications to the peer connection and substream code,
arguably reducing the places where bugs could hide.

## Motivation and Context
Follow-on from #3152, making use of the pool in the wallet's base node monitoring. The rest of the protocols should follow.

## How Has This Been Tested?
Extensive service unit tests
Existing tests pass

Washing machine test on a local wallet, with the broadcast protocol modified to use the wallet connectivity service.
Base node RPC sessions max out at 10 each (2 wallets connected to the same base node), even when pushing through 426 transactions.
![image](https://user-images.githubusercontent.com/1057902/128175630-52617736-f71f-4865-a915-6d6c9b00e8d5.png)

Manually tested wallet connectivity states (Connecting -> Online | Connecting -> Offline | Connecting -> Online -> Offline -> Connecting -> Online)

## Checklist:
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.