Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: IPFS Content Providing #31
Proposal: IPFS Content Providing #31
Changes from all commits
c975ca8
7d4bb51
0954c24
65518e9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these projects are also useful for some byproducts they will have (worth counting):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be nice, but I'm shrinking the scope here so we don't necessarily have to tackle these together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember discussing this one time. Would be a huge improvement for most of real-world uses (package manages, wikipedia snapshots)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to add this in too, but it might be out of scope for this project. It's an extra feature which, while valuable, might not be as high value as the other ones here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would presumably also meet a specific ask from Pinata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These project will also likely further decouple content routing (and the complex caching algorithms it utilizes) from specific applications like bitswap and graphsync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus enabling higher app developer velocity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be true, but isn't necessarily the case in the MVP here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have more data on:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
myApp
) instead of data CIDs. However, this falls apart as the number of application users gets larger. For certain use cases ipfs-cluster could come in handy as well. Pinning services have a few different approaches that are basically 1) build a custom reprovider that tries to be a bit faster (although mostly by throwing more resources + parallelism at the problem and not tweaking the underlying DHT client usage) 2) have really high connection limits so they're connected to tons of people, and permanently connect to major gateways.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have any questions on this, @BigLep feel free to ask :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any new test scenarios that we'd need to develop? For example, as part of CI, should we have a test that asserts X advertisements can be made within Y seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to do in CI especially if those tests are publicly viewable. However, it wouldn't be so bad to just check in on our metrics since they report performance on go-ipfs master + the latest release and it already metrics it already has on provide speed. However, if we want to test some of the massive providing strategies (e.g. huge routing tables + many provides) we'll likely need some more testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I don't know the landscape to have more input. A couple of more thoughts:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR for this libp2p/go-libp2p-kad-dht#709
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having this framed as "do these things" rather than "get to these goals" will make this easier to scope / make it feel more concrete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to just
Make IPFS public DHT
puts take <3 seconds
or more of this section? The "take <3 seconds" part is mostly because we don't have to do all of them if we hit our target with just a few of the optimizations. I listed them in order from what seems easiest to what seems hardest.I can be more precise in this section, although I don't want to overly prescribe how this could be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. the 'puts take <3 seconds' seems like a 'how do we know we're done', rather than a 'plan for work'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news with some lessons learned from libp2p/go-libp2p-kad-dht#709 it turns out that we have a prototype that seems to do the job and already hits under 3s.
The big wins were:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a very small number of failures we were able to reach around 3 puts per second for 1k puts and 20 provides per second for 60k puts. The provides per second should increase the more we do at a time. This is as opposed to 1 provide per 30 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thumbs up for "continuous transparency": seeing the state of providing at all times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n00b question: Do any customers complain about bandwidth today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I've heard of (although @Stebalien might have more info), but providing is pretty heavily limited so the DHT provide bandwidth is unlikely to be a problem today.
The question is around what happens next, i.e. once putting data in the DHT is fast there will still be users who aren't really able to use it.
Some back of the envelope math here is:
A user with 100M provider records where each record is 100 bytes (this is a large overestimate, it's more like 40, but we may want to add some more data to the records) who puts each record to 20 nodes every 24hrs uses
200GiB/day
of upload bandwidth. AWS egress prices are around $0.09/GB, so around $20/month.Again this is an overestimate and might be dwarfed by the egress costs of serving the actual data or other associated costs, but it's not 0.
https://archive.org/ has 538B webpages. If every one of those webpages (the vast majority of which I assume are not normally accessed) was to be individually addressed and advertised in the DHT daily it would be quite expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation and back-of-envelope math; makes sense. Given this info, I'm assuming most (something like 99%?) of customers won't care. I assume huge dataset customers have other special requirements/needs/setup that we'll have other work to make their journey delightful anyways. Given the desire to make IPFS an exceptional tool for developers, the bandwidth increase seems acceptable to take given the benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 we should add this as a new
Reprovider.Strategy
(thinking..pinned+files-roots
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that would be nice. Maybe only announce a file if the node has the whole file in cache?
Maybe worth a discussion if that should be the default for example for browser integrations (like brave) and ipfs-desktop. If someone just wants to share some files, I don't see a reason to announce all chunks. Hunting for nodes which have just some single blocks of a file because of deduplication is probably not worth the effort of connecting to them.