Add Blogpost: DHT Refactoring work #619
Conversation
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 43%, saving 327.83 KB. 561 images did not require optimisation.

@damedoteth @2color I've created this PR for a blogpost we want to publish regarding the refactoring work on the DHT that the ProbeLab team is doing. Please have a look and let me know if it all looks good. We need to adjust the date depending on when this goes out. cc: @BigLep
@@ -0,0 +1,110 @@
---
title: Amino (the Public IPFS DHT) is getting a facelift and a lightning fast Reprovide strategy
Could we shorten this title somehow? It currently truncates on the blog feed. Maybe: "The public IPFS DHT is getting a facelift"
Currently this is where the truncation occurs: "[Amino (the Public IPFS DHT) is getting a facelift and a lightning fast..."
Or maybe "The Public IPFS DHT is getting a facelift, name, & big improvements"
I've changed it to this title: "Amino (the Public IPFS DHT) is getting a facelift". I think it's fine not to mention the reprovide part (and to indirectly treat it as part of the "facelift").
@yiannisbot I've made some small syntax and grammar adjustments throughout the post, but otherwise it looks great to me! Could you take a look at my comment regarding the title of the post and let me know your thoughts?

Excellent! Thanks @damedoteth - I've responded to the comment and changed the title too. If all looks good, feel free to merge.
This is a very insightful blog post!
I left some comments, but none are blocking.
The `go-libp2p-kad-dht` DHT implementation must keep track of the CIDs to be republished every `Interval` (let's assume that all Provider Records are republished at the same frequency). The Kademlia identifiers of the CIDs to republish are arranged in a [binary trie](https://github.com/guillaumemichel/py-binary-trie) to allow for faster access. As each Provider Record is replicated on 20 different DHT Servers, 20 DHT Servers in a close locality are expected to store the same Provider Records (this is not 100% accurate, but suffices for our high-level description here - we'll publish all the details in a subsequent post, when the solution is in production).
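To make the trie idea concrete, here is a minimal sketch of arranging Kademlia identifiers (shown as short bit strings) in a binary trie so that all identifiers under a common prefix can be retrieved quickly. The class and method names (`BinaryTrie`, `keys_with_prefix`) are illustrative only, not the actual `go-libp2p-kad-dht` or `py-binary-trie` API:

```python
class BinaryTrie:
    """Toy binary trie keyed by bit strings (e.g. Kademlia IDs)."""

    def __init__(self):
        self.children = {}   # "0"/"1" -> child BinaryTrie
        self.is_key = False  # True if a key terminates at this node

    def insert(self, bits: str) -> None:
        node = self
        for b in bits:
            node = node.children.setdefault(b, BinaryTrie())
        node.is_key = True

    def keys_with_prefix(self, prefix: str) -> list:
        # Walk down to the prefix node, then collect every key below it.
        node = self
        for b in prefix:
            if b not in node.children:
                return []
            node = node.children[b]
        out, stack = [], [(node, prefix)]
        while stack:
            n, path = stack.pop()
            if n.is_key:
                out.append(path)
            for b, child in n.children.items():
                stack.append((child, path + b))
        return out


trie = BinaryTrie()
for kad_id in ["0001", "0010", "0110", "1011"]:
    trie.insert(kad_id)

# All keys in the "0xxx" region of the keyspace in one query:
print(sorted(trie.keys_with_prefix("0")))  # → ['0001', '0010', '0110']
```

Because keys that share a prefix are close in the XOR metric, one trie walk yields exactly the group of records that a nearby set of DHT servers is expected to hold.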
In a nutshell, the Content Provider will continuously look up keys across the entire keyspace, hence "sweeping" the keyspace. For each key to be published, the Content Provider will find the 20 closest peers, and look up in its "CIDs Republish Binary Trie" all Provider Records that would belong to those specific 20 remote peers. By doing this match-making exercise, Content Providers will be able to reprovide all Provider Records that correspond to a particular peer at once. Based on this logic, Content Providers are only limited by network throughput.
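The sweep described above can be sketched as follows. This is a simplified illustration only: records are grouped by a 2-bit keyspace prefix, and `find_closest_peers` and the batch structure are hypothetical stand-ins for real DHT lookups and reprovide messages:

```python
from collections import defaultdict


def find_closest_peers(prefix: str) -> list:
    # Stand-in for a DHT lookup returning the peers closest to this
    # region of the keyspace (20 in the real protocol; 3 here).
    return ["peer-%s-%d" % (prefix, i) for i in range(3)]


def sweep_reprovide(records: list, prefix_len: int = 2) -> dict:
    """Group records by keyspace region and reprovide each region in one batch."""
    regions = defaultdict(list)
    for kad_id in records:
        regions[kad_id[:prefix_len]].append(kad_id)

    batches = {}
    for prefix in sorted(regions):          # "sweep" the keyspace in order
        peers = find_closest_peers(prefix)  # one lookup serves the whole region
        batches[prefix] = (peers, regions[prefix])
    return batches


batches = sweep_reprovide(["0001", "0010", "0110", "1011"])
# One lookup covered every record in region "00", instead of one
# lookup per record:
print(batches["00"][1])  # → ['0001', '0010']
```

The key point is the amortisation: a naive reprovide does one full DHT lookup per CID, whereas the sweep does one lookup per keyspace region and ships all matching records to the same 20 peers in a single batch.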
In a nutshell, the Content Provider will continuously lookup keys across the entire keyspace, hence “sweeping” the keyspace.
This first sentence needs to be clarified. Which keys is the Content Provider looking up in this instance? Are these just keys across the key space, or are these actual keys that will be "batch published" by then looking up the binary trie?
Also, if you already arranged the CIDs (which double as the keys) in a binary trie, do you use the binary trie to do this sweeping (with the goal of finding close peers)? Or do you maintain two representations of the CIDs to be published?
Finally, is there a reason why Content Providers is sometimes capitalised and sometimes isn't?
@2color apologies, I didn't get to this in time. I didn't mean to ignore it, but I left for a trip and didn't manage to address your comments.
I would address the first and third comments. The second gets a little too deep into the details TBH; those will go into a more detailed blogpost about Reprovide Sweep when we get to it.
If you feel we should definitely address the first and third, let me know and I'll open a separate PR.
Thanks and no problem. I think it would be great to clarify the first and third points even if the post is already published.
Co-authored-by: Daniel Norman <[email protected]>
Blogpost to raise awareness and inform the community about the work ProbeLab is doing on the DHT refactoring. The post also introduces the new name for the Public IPFS DHT, Amino, as well as a new feature under development that improves the speed of the Provide operation by several orders of magnitude.