Investigate initial timeouts on DHT / DAS #377
Comments
After extensive investigation, the cause of DAS content resolution timing out turned out to be relatively trivial, though not straightforward to fix.

Investigation
Key Takeaways
Solutions

Proper

The most valuable contributor to resolving the problem is potentially #378, where we store the DAHeader in IPFS and rely on a single DataHash instead of multiple roots. Furthermore, we should start synchronously providing the DataHash to meet the first takeaway, and the DataHash only to satisfy the second one. However, we may also keep asynchronously providing all the remaining nodes and leaves in the background, to contribute to subtree observability just in case. Optimistically, that providing operation should finish before the node's next turn to propose and start providing again. Fortunately, the recently discussed new DHT update comes into play here as well (more info here). It also contributes to the third takeaway by introducing a new DHT node type with full routing tables that can short-circuit long queries. However, I need to look more deeply into the implementation to understand all the features and possible tradeoffs before relying on it.

Quicker, for the MVP

The proper solution would take too much time for the MVP. Thus we need to come up with something short-term and, desirably, not wasteful of time:
I can now confirm that with manual synchronous providing only, and with all IPFS/Bitswap async providing disabled, I still get a similarly impractical ~3 min to announce 32 roots to the network.

For the MVP case, we can also rely on rows only. This workaround halves providing time (~1.5 min); I observed that in practice.
Here is more info on the new DHT client mentioned in the proper solution, and how it can help with this specific case of long-lasting providing. To understand why it helps, we should understand how it and the regular client work, starting with how kDHT searching works. Imagine a circle of dots grouped into buckets of size k (how the circle forms is out of scope), where each dot is a network node storing some part of the global key-to-value mappings. When any dot in the circle wants to find or put a value for a key, it iteratively queries the dots it knows that are closest to the key, learns about even closer dots from their responses, and hops onward until it reaches the k closest dots, which store or return the value.
So the basic DHT client suffers from the requirement to make these multiple hops toward the closest dots, and those hops are the main reason providing/putting something on the DHT took so long. The new DHT client, instead of keeping only a portion of the key/value space, periodically crawls and syncs the whole network. This allows zero hops: it can do set or get operations with the relevant dots directly. Comparable to blockchain state syncing, this DHT client also requires some time to instantiate and download the whole network state. Luckily, we can already rely on practical results showing providing times of <3 sec. Furthermore, the new client also helps in the case of disappearing and unreliable DHT nodes, as it remembers what they were providing, preserving content discoverability. However, keeping a copy of the full DHT network/routing table on the node is not cheap; still, a proposer's interest aligns with fast providing and solid content discoverability, so that is a valid tradeoff.
Setting this to done.

Thanks! Can you update your comment above to include a sentence or two about the result? Otherwise it is hard to see what the outcome of this was.
For our DHT case, we have Provide and GetProviders operations. On the IPFS network, a Provide operation can take up to 3 min, which was the main cause of the issue; GetProviders can take up to 1 min, but often takes less than 10 sec. For networks of fewer than 20 nodes, both operations should take less than 10 sec, as the bucket size is 20 and no hops are expected. @liamsi, those timings are mostly inevitable and apply in any case, so if used with consensus they would be present as well. Good for us, we decided to go with the push approach. Closing this; further work and info is now here: https://github.com/lazyledger/lazyledger-core/issues/395
Summary
We (I, and later @Wondertan confirmed my observation) observed the following behavior: when spinning up a lazyledger validator node on Digital Ocean and starting a light client locally, DAS for the light client times out.
We currently work around this by adding the fullnode's IPFS multiaddress to the light client's bootstrap nodes, but it is important: