
Add Blogpost: DHT Refactoring work #619

Merged: 18 commits from dht-amino-refactoring into main, Sep 27, 2023

Conversation

yiannisbot (Member)

Blogpost to raise awareness and inform the community about the work that ProbeLab is doing on the DHT refactoring. The post also introduces Amino, the new name for the Public IPFS DHT, as well as a new feature under development that improves the speed of the Provide operation by several orders of magnitude.

@yiannisbot yiannisbot changed the title DHT Refactoring work Add Blogpost: DHT Refactoring work Sep 21, 2023
@github-actions (Contributor)

Images automagically compressed by Calibre's image-actions

Compression reduced images by 43%, saving 327.83 KB.

| Filename | Before | After | Improvement |
| --- | --- | --- | --- |
| src/assets/2023-09-amino-refactoring.png | 761.87 KB | 434.04 KB | -43.0% |

561 images did not require optimisation.

@yiannisbot (Member, Author)

@damedoteth @2color I've created this PR for a blogpost we want to publish regarding the refactoring work on the DHT that the ProbeLab team is doing. Please have a look and let me know if it all looks good. We need to adjust the date depending on when this goes out.

cc: @BigLep

@@ -0,0 +1,110 @@
---
title: Amino (the Public IPFS DHT) is getting a facelift and a lightning fast Reprovide strategy
Collaborator

Could we shorten this title somehow? It currently truncates on the blog feed. Maybe: "The public IPFS DHT is getting a facelift"

Currently this is where the truncation occurs: "[Amino (the Public IPFS DHT) is getting a facelift and a lightning fast..."

Collaborator

Or maybe "The Public IPFS DHT is getting a facelift, name, & big improvements"

Member Author

I've changed the title to: "Amino (the Public IPFS DHT) is getting a facelift". I think it should be fine not to mention the reprovide part (and to indirectly assume it's part of the "facelift").

@damedoteth (Collaborator)

@yiannisbot I've made some small syntax and grammar adjustments throughout the post, but otherwise looks great to me! Could you take a look at my comment regarding the title of the post and let me know your thoughts?

@yiannisbot (Member, Author)

> @yiannisbot I've made some small syntax and grammar adjustments throughout the post, but otherwise looks great to me! Could you take a look at my comment regarding the title of the post and let me know your thoughts?

Excellent! Thanks @damedoteth - I've responded to the comment and changed the title too. If all looks good, feel free to merge.

@2color (Member) left a comment

This is a very insightful blog post!

I left some comments, but none are blocking.

src/_blog/2023-09-amino-refactoring.md (outdated; resolved)

The `go-libp2p-kad-dht` DHT implementation must keep track of the CIDs that must be republished every `Interval` (let’s assume that all Provider Records are republished at the same frequency). The Kademlia identifiers of the CIDs to republish must be arranged in a [binary trie](https://github.com/guillaumemichel/py-binary-trie) to allow for faster access. As each Provider Record is replicated on 20 different DHT Servers, 20 DHT Servers in a close locality are expected to store the same Provider Records (this is not 100% accurate, but suffices for our high-level description here - we’ll publish all the details in a subsequent post, when the solution is in production).
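
To make the trie idea concrete, here is a minimal Go sketch (hypothetical types and names, not the actual `go-libp2p-kad-dht` code) of indexing provider record keys by their Kademlia identifier in a binary trie, where each level branches on one bit of the identifier:

```go
package reprovider

import "crypto/sha256"

// kadID derives the Kademlia (XOR keyspace) identifier for a provider record
// key (e.g. a CID's multihash); in the public IPFS DHT this is a SHA-256 hash.
func kadID(key []byte) [32]byte {
	return sha256.Sum256(key)
}

// node is a minimal binary trie node: each level branches on one bit of the
// Kademlia identifier, so records whose identifiers share a long prefix (and
// therefore roughly the same closest DHT servers) cluster in the same subtree.
type node struct {
	children [2]*node
	keys     [][]byte // record keys stored at this leaf
}

// bit returns the i-th bit (0 or 1) of a Kademlia identifier.
func bit(id [32]byte, i int) int { return int(id[i/8]>>(7-uint(i%8))) & 1 }

// Insert stores a record key at a fixed depth. A production trie would split
// and merge leaves dynamically; the fixed depth keeps this sketch short.
func (n *node) Insert(key []byte, depth int) {
	id := kadID(key)
	cur := n
	for i := 0; i < depth; i++ {
		b := bit(id, i)
		if cur.children[b] == nil {
			cur.children[b] = &node{}
		}
		cur = cur.children[b]
	}
	cur.keys = append(cur.keys, key)
}
```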

In a nutshell, the Content Provider will continuously look up keys across the entire keyspace, hence “sweeping” the keyspace. For each key that is to be published, the Content Provider will find the 20 closest peers, and look up in its “CIDs Republish Binary Trie” all Provider Records that would belong to those specific 20 remote peers. Doing this match-making exercise, content providers will be able to reprovide all provider records that correspond to a particular peer at once. Based on this logic, Content Providers are only limited by network throughput.
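
Continuing the hypothetical sketch above (same assumed package and `node` type; `getClosestPeers` and `sendProviderRecords` are placeholders for the DHT lookup and the batched ADD_PROVIDER messages, not real `go-libp2p-kad-dht` APIs), the sweep might look roughly like this: one lookup per keyspace region, then one batched republish of every record in that region to the ~20 peers returned:

```go
// keysWithPrefix returns every record key stored under the subtree selected by
// the given bit prefix (a slice of 0/1 values), i.e. one keyspace region.
func (n *node) keysWithPrefix(prefix []int) [][]byte {
	cur := n
	for _, b := range prefix {
		if cur.children[b] == nil {
			return nil
		}
		cur = cur.children[b]
	}
	var out [][]byte
	var walk func(*node)
	walk = func(m *node) {
		if m == nil {
			return
		}
		out = append(out, m.keys...)
		walk(m.children[0])
		walk(m.children[1])
	}
	walk(cur)
	return out
}

// Sweep is an illustrative (not the shipped) reprovide loop: it visits
// keyspace regions in order, runs one DHT lookup per region to find the 20
// closest peers, and republishes every record key in that region to those
// peers in one batch, so the cost is bounded by network throughput rather
// than by one lookup per CID.
func Sweep(
	trie *node,
	regions [][]int, // bit prefixes covering the whole keyspace, in order
	getClosestPeers func(region []int, k int) []string,
	sendProviderRecords func(peer string, keys [][]byte) error,
) error {
	const replication = 20 // each record is held by the ~20 closest DHT servers

	for _, region := range regions {
		keys := trie.keysWithPrefix(region)
		if len(keys) == 0 {
			continue
		}
		// One lookup for the whole region instead of one lookup per CID.
		peers := getClosestPeers(region, replication)
		for _, p := range peers {
			if err := sendProviderRecords(p, keys); err != nil {
				return err
			}
		}
	}
	return nil
}
```
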
Member

> In a nutshell, the Content Provider will continuously look up keys across the entire keyspace, hence “sweeping” the keyspace.

This first sentence needs to be clarified. Which keys is the Content Provider looking up in this instance? Are these just keys across the key space, or are these actual keys that will be "batch published" by then looking up the binary trie?

Also, if you already arranged the CIDs (which double as the keys) in a binary trie, do you use the binary trie to do this sweeping (with the goal of finding close peers)? Or do you maintain two representations of the CIDs to be published?

Finally, is there a reason why Content Providers is sometimes capitalised and sometimes isn't?

Member Author

@2color apologies I didn't get to this in time. I didn't mean to ignore it, but I left for a trip and didn't manage to address your comments.

I would address the first and third comments. The second gets a little too much into the details, TBH; those details will go into a more detailed blogpost about Reprovide Sweep when we get to it.

If you feel we should definitely address the first and third, let me know and I'll do a separate PR.

Member

Thanks and no problem. I think it would be great to clarify the first and third points even if the post is already published.

src/_blog/2023-09-amino-refactoring.md (outdated; resolved)
src/_blog/2023-09-amino-refactoring.md (resolved)
src/_blog/2023-09-amino-refactoring.md (outdated; resolved)
@damedoteth merged commit 131866d into main on Sep 27, 2023
2 checks passed
@damedoteth deleted the dht-amino-refactoring branch on September 27, 2023 at 15:52