Bitswap Improvement Plan #5723
Comments
Reasons we think @whyrusleeping's PR is not working with Wikipedia:
Stats gathered on a run of LS (incomplete, but a long way in):
@hannahhoward One point: the peers we are selecting from here should never be dead peers.
It may be worth the extra effort to select peers based on how many blocks they have sent us so far, instead of just random selection. The random selection works pretty well when there is a small number of peers in the potential set, but 2000 is absurd, and we need to be smarter (we should never broadcast to 2000 peers...)
Also, over 2000 peers feels sketchy. Do we even have 2000 connections? Maybe something is up there...
Yeah, I now think the problem is not dead peers but slow peers vs. fast peers. I'm going to post a PR to bitswap with a test that replicates the issue. 2279 is the length of session.activePeersArr... I dunno if maybe we're not checking for uniqueness or something? Seems unlikely though.
Yeah, 2279 feels very wrong. We shouldn't even have that many connections at all in general, let alone connections to peers that have the data we're interested in.
Supersedes #2111
Just want to say this issue is still open, and the things that are not checked off need to be done.
Would a possible workaround be to limit the wants so that each block is requested from a single node at a time? Initially, each node receives a want containing a unique block that is not sent to other nodes unless a given timeout period expires. When a node fails to provide the requested block, the block is added to the want list sent to a different node, with the previously failed node blacklisted for the duration of the operation, until all the nodes in the DHT have been tried. This would significantly reduce the overhead of asking every node that has a block to send it, where all but the first download to finish is wasted duplication, and would in effect stripe block requests across the available nodes.

The obvious downside is that if, for any reason, a large number of nodes advertise a block which they refuse, or fail, to provide, the client could end up waiting up to the timeout period multiplied by the number of bad nodes before it reaches a node that can and will satisfy the request. This could be mitigated by introducing a sliding window: after a configurable number of failed requests, the client includes the block in wants to multiple nodes, increasing that number over time. For example, the client could send the block's want to only one unique node for 3 attempts, then to 2 nodes for 3 attempts, then to 4 nodes, and so on, until it has either excluded all the nodes as unable to provide the block or is including the block in a want sent to every node.
That's almost exactly what we now do.
Closing as this sprint has finished. |
Goals
This is a meta issue to track and discuss improving the Bitswap implementation. Currently, what we have is:
Our goal is to measure the current behavior of the various implementations and to improve them. All of them suffer from duplicate blocks to differing degrees.
Out of scope:
Tasks
Things we want to do to improve performance:
(https://github.com/ipfs/go-ipfs/blob/feat/bitswap-benchmark/test/integration/bitswap_session_test.go)