
Peer preference #5792

Closed
markg85 opened this issue Nov 23, 2018 · 11 comments

Comments

@markg85
Contributor

markg85 commented Nov 23, 2018

Version information:

go-ipfs version: 0.4.17-
Repo version: 7
System version: amd64/linux
Golang version: go1.10.3

Type:

Feature, Enhancement

Description:

I'm playing a bit with IPFS at the moment and "uploaded" (ipfs add -r) a folder that contains a load of 4K images. Then I put IPFS on a VPS to see how fast any given image from that folder would load. The answer is, obviously, excruciatingly slow, since only my PC has the images and the instance on the VPS only becomes a mirror as I click on the images.
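
For reference, a minimal sketch of the setup described above, assuming a local PC and a VPS that both run the ipfs daemon; the folder path, the root CID, and the file name are placeholders, not values from the actual report:

```
# On the local PC: add the folder of images recursively (as mentioned above)
ipfs add -r ~/pictures/4k              # hypothetical path; the last line printed is the folder's root CID

# On the VPS: fetch one image by CID/path and see how long it takes to arrive
ipfs get /ipfs/<rootCID>/image01.jpg   # <rootCID> and image01.jpg are placeholders
```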

But this showed, I think, a flaw. The first image is slow because the peer that has it had to be found first, so that's probably to be expected. However, the second image was also very slow to load. I did not see any bandwidth spike going from my local PC (where I added the images) to the VPS for about 20 seconds or so; then there was a spike and the image was visible on the VPS. I tried this with a dozen more images and it's the same effect every time. That leads me to think that for each request IPFS probably looks for a peer "from a blank slate", as if it knew nothing of the peer it had just used.

Therefore I'd like to request a feature where IPFS first asks the last successful peer on subsequent requests before it starts asking a load of other peers. This would likely speed things up heavily in the situation where only one (or a few) peers have the requested content.

Be aware though that this request carries a small risk: IPFS could keep asking the same peer over and over again if that peer happens to have a lot of cached content, even if other peers might respond faster. So I think IPFS should also ask other peers in the background, maintain a list of peers that would have been successful for the last request, and randomly pick a different one on each request so as not to overload a single peer. That's a non-issue if there is only one peer, but it would be a real issue if there are more and you happen to connect to a peer that has the data but is itself dreadfully slow.

@magik6k
Member

magik6k commented Nov 23, 2018

This sounds much like the work that is already happening in bitswap - see #5723

@markg85
Contributor Author

markg85 commented Nov 23, 2018

This sounds much like the work that is already happening in bitswap - see #5723

That might partly be it; you probably know better though :)
I've been reading up on that a bit (no pun intended) and it seems more tailored towards removing duplicated responses (in other words, keeping bandwidth waste to a minimum).

What I'm asking about is being smart about how subsequent data is requested in the case where very few peers (or just one) have the data.

@markg85
Contributor Author

markg85 commented Nov 23, 2018

But it does seem to be a duplicate of: ipfs/go-bitswap#14
Funny how recent that message is as well :) (just months ago).

@eingenito
Contributor

@markg85 this behavior is pretty strange. I'm pretty sure that before ever going to the DHT to find providers for a block, bitswap just broadcasts the request for the block to all peers that it is connected to. This is like the behavior you describe, but even less specific. So if your two IPFS hosts are already connected to each other (via, say, ipfs swarm connect) there should be no delay in transmitting data.

And even if there is an initial lookup, once the two hosts have transferred at least one block they should be connected as peers, meaning any remaining blocks provided by one of the peers should just be quickly transferred by bitswap without any need for the time-consuming process of looking up providers.

You can use the ipfs swarm peers command to list the peers that an IPFS node is currently connected to. Can you verify that your nodes are connected to each other? Is there some reason your nodes would be unable to communicate directly?
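
For instance, a quick way to check this with the commands mentioned above; the multiaddr and peer ID below are placeholders, not real values:

```
# On the local PC: print this node's peer ID and its advertised multiaddrs
ipfs id

# On the VPS: dial the local node directly (address and peer ID are placeholders)
ipfs swarm connect /ip4/1.2.3.4/tcp/4001/ipfs/QmLocalPeerID

# On either node: confirm the other node shows up as a connected peer
ipfs swarm peers | grep QmLocalPeerID
```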

@markg85
Contributor Author

markg85 commented Nov 30, 2018

@eingenito Hmm, now that I think about it: my local IPFS was working just fine, but the port for external connections (towards the local node) might not have been open. Still, it easily had hundreds of peers in the IPFS WebUI, but I'm guessing those were peers that my local instance discovered. So a setup with one local node making outgoing connections to n remote peers.

I will have to try this out when I get home.
To be continued.

@markg85
Contributor Author

markg85 commented Nov 30, 2018

And to update this: it was an error on my side. The port (4001) was still closed. Opening it up made the remote much faster for files it didn't have yet. Still, the process of getting a connection when the swarm isn't connected to the node that has the files (and there is only one such node) is still very, VERY slow, but I guess that's as expected?

@eingenito
Contributor

Yah - it is a known issue with provider lookups in the DHT (and provider writes), and a complicated one. Speeding up this part of IPFS/libp2p is a top priority for both teams and there are a number of people working on the problem.

@markg85
Contributor Author

markg85 commented Nov 30, 2018

Yah - it is a known issue with provider lookups in the DHT (and provider writes), and a complicated one. Speeding up this part of IPFS/libp2p is a top priority for both teams and there are a number of people working on the problem.

How are torrent clients doing that, then? They are quite fast at finding seeds even for torrents that have only a few seeders, and that's the DHT at work. I'm not criticizing the DHT in IPFS, I just wonder where the performance difference comes from. And I'm not even talking about the massive size difference of the torrent DHT network compared to IPFS.

@eingenito
Contributor

Actually the DHT implementation in IPFS is similar to the one in BitTorrent. I think the biggest difference is the volume of reads and writes IPFS makes to the DHT. IPFS's design and usage patterns put much greater demands on the DHT on an ongoing basis. So we need to make more efficient use of the DHT (like in #5774), and we need to improve the quality/locality of the data that we put in the DHT.

@markg85
Contributor Author

markg85 commented Nov 30, 2018

Ah, okay. It's very interesting that the performance-related issues I find all apparently have someone who started working on them just a few weeks ago. Looks like I should just wait till the next release to see some improvements? :)

Anyhow, I'll close this one as it has drifted off the topic I started. Thank you very much for the explanation!

markg85 closed this as completed Nov 30, 2018
@eingenito
Contributor

Yeah - TBH there are a lot of performance improvements that have been in flight for a long while, at least as a thought process, but we're hoping to make significant progress on them over the next few months.
