Bitswap Improvement Plan #5723
Comments
Reasons we think @whyrusleeping's PR is not working with Wikipedia:
Stats gathered on a run of LS (incomplete, but a long way in):
@hannahhoward One point: the peers we are selecting from here should never be dead peers.
It may be worth the extra effort to select peers based on how many blocks they have sent us so far, instead of just random selection. The random selection works pretty well when there is a small number of peers in the potential set, but 2000 is absurd, and we need to be smarter (we should never broadcast to 2000 peers...)
Also, over 2000 peers feels sketchy. Do we even have 2000 connections? Maybe something is up there...
Yeah, I now think the problem is not dead peers but slow peers vs. fast peers. I'm going to post a PR to bitswap with a test that replicates the issue. 2279 is the length of session.activePeersArr... I dunno if maybe we're not checking for uniqueness or something? Seems unlikely though.
Yeah, 2279 feels very wrong. We shouldn't even have that many connections at all in general, let alone connections to peers that have the data we're interested in.
Supersedes #2111
Just want to say this issue is still open, and the things that are not checked off need to be done.
Would a possible workaround be to limit the wants so that each block is requested from a single node at a time? Initially, each node receives a want containing a unique block that is not sent to other nodes unless a given timeout period expires. When a node fails to provide the requested block, the block is added to the want list sent to a different node, with the previously failed node blacklisted for the duration of the operation, until all the nodes in the DHT have been tried. This would significantly reduce the overhead of asking every node that has a block to send it, where all but the first download to finish is wasted duplication, and would in effect stripe block requests across the available nodes.

The obvious downside is that if, for any reason, a large number of nodes advertise a block which they refuse, or fail, to provide, the client could end up waiting up to the timeout period multiplied by the number of bad nodes before it reaches a node that can and will satisfy the request. This could be mitigated by introducing a sliding window: after a configurable number of failed requests, the client includes the block in wants to multiple nodes, increasing that number over time. For example, the client could send the block's want to only one unique node for 3 attempts, then to 2 nodes for 3 attempts, then to 4 nodes, and so on, until it has either excluded all the nodes as unable to provide the block or is including the block in a want sent to every node.
That's almost exactly what we now do.
Closing as this sprint has finished. |
Goals
This is a meta issue to track and discuss improving the Bitswap implementation. Currently, what we have is:
Our goal is to measure the current behavior of the various implementations and to improve them. All of them suffer from duplicate blocks to differing degrees.
Out of scope:
Tasks
Things we want to do to improve performance:
(https://github.com/ipfs/go-ipfs/blob/feat/bitswap-benchmark/test/integration/bitswap_session_test.go)