indy-vdr is unable to connect to a pool when some genesis nodes do not respond #106
@andrewwhitehead, @swcurran, I wish I had more information for you to go on, but I've been unable to recreate the issue except in this particular scenario I'm facing now.
Wade -- could you get around your current problem by creating a new Genesis file for your own use that doesn't have the no-longer-active node?
That was the next thing I was going to try.
Updating the pool genesis file to include additional transactions works around the pool connection issue in this scenario: sovrin-foundation/sovrin@stable...WadeBarnes:sovrin:test-pool-update
In fact, adding the transactions for just one more active node to the pool genesis file works: sovrin-foundation/sovrin@stable...WadeBarnes:sovrin:test-pool-update-2
To be clear, the issue is: there are 6 active genesis nodes on the network, and from my network only 1 of the 6 is not responding (5 of the 6 are available and responding), yet I am unable to connect to the pool using indy-vdr unless at least 1 additional active node is added to the genesis file.
One would expect to be able to successfully connect to the pool when 5 out of 6 active genesis nodes are available. |
Hi @WadeBarnes, from what I can see there are 16 verifiers in the initial pool transactions, after filtering out the ones without the VALIDATOR service. I don't get any response from the following nodes: "cynjanode", "EBPI-validation-node", "lab10", "SovrinNode", "Swisscom", "NodeTwinPeek", "dativa_validator", "VALIDATOR1", "trusted_you". I do get responses from: "anonyome", "regioit01", "NECValidator", "australia", "DigiCert-Node", "sovrin.sicpa.com", "Absa". You would get a consensus error if two of these are unreachable or return a different result. Do we really need that many matching responses in order to proceed with the catch-up? I'm not sure; it might be worth investigating, especially since the subsequent transactions are signed. I think you would need to wait for the timeout to expire on the unreachable nodes, though. The AllWeightsZero error is a bug; I can add a PR for that soon.
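For context, the threshold behind "a consensus error if two of these are unreachable" follows the usual BFT arithmetic. A simplified illustration (not indy-vdr's actual code; the function name is mine):

```python
# Simplified BFT consensus math, for illustration only -- not indy-vdr's
# actual implementation. With n validator nodes, the network tolerates
# f = (n - 1) // 3 faulty nodes, and a client needs f + 1 matching
# replies before it can trust a response.

def consensus_threshold(n: int) -> int:
    f = (n - 1) // 3          # maximum tolerated faulty nodes
    return f + 1              # matching replies a client requires

# 16 verifiers in the genesis transactions -> f = 5, so 6 matching replies.
print(consensus_threshold(16))

# Only 7 of those 16 respond at all; if 2 of the 7 drop out or disagree,
# just 5 matching replies remain -- one short of the 6 required.
print(consensus_threshold(16) > 5)
```

This matches the numbers in the comment above: with 7 reachable nodes out of 16 genesis verifiers, losing any 2 of the 7 leaves the client below the required 6 matching replies.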
@andrewwhitehead, I can confirm there are 16 validator nodes in the pool genesis file. It has not been updated since that version was created. There are currently 12 validator nodes on that network:
With 6 of those nodes being in the pool genesis file:
I had missed that. If I understand correctly, you are saying that indy-vdr requires consensus to perform the catch-up on the pool transactions in order to determine the current state of the network. Is that correct? If so, is that necessary? The scenario I encountered was that I was only able to connect to 5 of what indy-vdr thought were 7 active validators. The initial pool connection is obviously the most critical step in determining the current state of the network. Would it be possible to do that without requiring full consensus, and then validate the state of the pool transactions once they are fully loaded?
It's mainly the initial status request which is the bottleneck, as that currently requires consensus. That behaviour is inherited from Indy-SDK but I could see it needing updates to make the network more reachable. It would likely either need to wait for a timeout (failure) on the status request and have special handling to follow up on any/all of the responses, or possibly interleave the status requests and catch up requests. |
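The "interleave the status requests and catch-up requests" option could look roughly like this. This is a toy sketch with stubbed replies, not indy-vdr's real networking code; the function name and reply format are invented for illustration:

```python
# Toy sketch of an "optimistic" status phase: begin catch-up as soon as
# enough matching status replies arrive, instead of waiting for a reply
# (or a timeout) from every genesis node first. In practice each reply
# would arrive over zmq with its own timeout; here they are stubbed.

from collections import Counter

def connect_optimistically(status_replies, threshold):
    """Return the merkle root to catch up from once `threshold`
    matching status replies have been seen, or None if no root
    ever reaches the threshold."""
    seen = Counter()
    for node, merkle_root in status_replies:  # replies in arrival order
        if merkle_root is None:               # unreachable node
            continue
        seen[merkle_root] += 1
        if seen[merkle_root] >= threshold:
            return merkle_root                # start catch-up from here
    return None                              # never reached consensus

replies = [
    ("cynjanode", None),         # no response
    ("anonyome", "root-A"),
    ("regioit01", "root-A"),
    ("SovrinNode", None),        # no response
    ("australia", "root-A"),
]
print(connect_optimistically(replies, threshold=3))
```

The trade-off, as noted above, is security: a lower threshold means fewer nodes must agree before catch-up begins, so the subsequent signed transactions would need to carry the verification burden.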
What would the effort of such a change be? It would also be nice to avoid any timeouts in the first place. Using indy-vdr with indy-node-monitor, we've noticed pool connection timeouts are rather commonplace, and pool caching is critical to performance. The pool timeouts on initial connection can really hinder startup performance.
I think the hard part is ensuring that it's resilient and secure, that having one of the original node IPs taken over won't lead to failed connections or worse. It might be better to focus on an abbreviated genesis transaction format that doesn't have to list every transaction in order to provide the list of active nodes. |
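Purely as an illustration of that idea (no such abbreviated format exists in indy-vdr today; every field name and value below is hypothetical), an abbreviated genesis file might list only the currently active nodes with enough key material to verify them, rather than replaying every pool transaction:

```json
{
  "format": "abbreviated-genesis-example",
  "nodes": [
    { "alias": "anonyome",  "address": "203.0.113.10:9702", "verkey": "…" },
    { "alias": "regioit01", "address": "203.0.113.11:9702", "verkey": "…" }
  ]
}
```

The hard part, as the comment above says, is authenticating such a list: the full transaction log is self-verifying, whereas a bare node list would need some other trust anchor.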
In some cases indy-vdr will time out connecting to a pool when one or more of the pool's genesis nodes does not respond.
Scenario:
0.1.0 and 0.3.4
In the above case indy-vdr is unable to connect to the pool and continually times out.
A pool connection can be established and cached by the API by connecting to a different network via VPN and querying the nodes. Once the connection is cached and the VPN is disconnected (returning to the blocked IP), additional queries can be made; these indicate a node (DigiCert-Node in this case) is not responding. If the pool cache is cleared (the API restarted), indy-vdr is once again unable to connect to the pool.
I have tried to reproduce this issue with von-network with no success.
I have also tried excluding DigiCert-Node from the pool by using the `node_weights` parameter. However, that always results in the following error whenever any node weight is set to zero:
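The reported symptom (an AllWeightsZero error whenever *any* single weight is zero) can be reproduced in miniature like this. This is not indy-vdr's source code, just a minimal Python illustration of the described behaviour versus what the error's name suggests was intended; both function names are mine:

```python
# Minimal illustration of the reported node_weights symptom.
# Not indy-vdr source -- a sketch of the behaviour described above.

def check_weights_reported(weights: dict) -> None:
    """Mirrors the reported bug: errors if ANY weight is zero."""
    if 0.0 in weights.values():
        raise ValueError("AllWeightsZero")

def check_weights_expected(weights: dict) -> None:
    """What the error name suggests: error only if ALL weights are zero."""
    if weights and all(w == 0.0 for w in weights.values()):
        raise ValueError("AllWeightsZero")

w = {"DigiCert-Node": 0.0, "anonyome": 1.0}

try:
    check_weights_reported(w)
except ValueError as e:
    print(e)                  # raised despite other nonzero weights

check_weights_expected(w)     # passes: not all weights are zero
```

Per the comment above, this is acknowledged as a bug in indy-vdr with a PR to follow; zero should be a valid way to exclude a single node.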
This issue was reported and discussed at the 2022-10-25 Indy Contributors call.