Add Blogpost: DHT Refactoring work #619
Conversation
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 43%, saving 327.83 KB. 561 images did not require optimisation.

@damedoteth @2color I've created this PR for a blogpost we want to publish regarding the refactoring work on the DHT that the ProbeLab team is doing. Please have a look and let me know if it all looks good. We need to adjust the date depending on when this goes out. cc: @BigLep
@@ -0,0 +1,110 @@
---
title: Amino (the Public IPFS DHT) is getting a facelift and a lightning fast Reprovide strategy
Could we shorten this title somehow? It currently truncates on the blog feed. Maybe: "The public IPFS DHT is getting a facelift"
Currently this is where the truncation occurs: "[Amino (the Public IPFS DHT) is getting a facelift and a lightning fast..."
Or maybe "The Public IPFS DHT is getting a facelift, name, & big improvements"
I've changed it to this title: "Amino (the Public IPFS DHT) is getting a facelift". I think it's fine not to mention the reprovide part (and to indirectly treat it as part of the "facelift").
@yiannisbot I've made some small syntax and grammar adjustments throughout the post, but otherwise it looks great to me! Could you take a look at my comment regarding the title of the post and let me know your thoughts?

Excellent! Thanks @damedoteth - I've responded to the comment and changed the title too. If all looks good, feel free to merge.
This is a very insightful blog post!
I left some comments, but none are blocking.
The `go-libp2p-kad-dht` DHT implementation must keep track of the CIDs to be republished every `Interval` (let's assume that all Provider Records are republished at the same frequency). The Kademlia identifiers of the CIDs to republish are arranged in a [binary trie](https://github.com/guillaumemichel/py-binary-trie) to allow for faster access. As each Provider Record is replicated on 20 different DHT Servers, 20 DHT Servers in a close locality are expected to store the same Provider Records (this is not 100% accurate, but suffices for our high-level description here - we'll publish all the details in a subsequent post, when the solution is in production).
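To make the trie idea concrete, here is a minimal sketch of arranging Kademlia identifiers (shown as short bit strings) in a binary trie so that all identifiers under a common prefix can be retrieved quickly. The class and method names (`BinaryTrie`, `keys_with_prefix`) are illustrative only, not the actual `go-libp2p-kad-dht` or `py-binary-trie` API:

```python
class BinaryTrie:
    """Toy binary trie keyed by bit strings (e.g. Kademlia IDs)."""

    def __init__(self):
        self.children = {}   # "0"/"1" -> child BinaryTrie
        self.is_key = False  # True if a key terminates at this node

    def insert(self, bits: str) -> None:
        node = self
        for b in bits:
            node = node.children.setdefault(b, BinaryTrie())
        node.is_key = True

    def keys_with_prefix(self, prefix: str) -> list:
        # Walk down to the prefix node, then collect every key below it.
        node = self
        for b in prefix:
            if b not in node.children:
                return []
            node = node.children[b]
        out, stack = [], [(node, prefix)]
        while stack:
            n, path = stack.pop()
            if n.is_key:
                out.append(path)
            for b, child in n.children.items():
                stack.append((child, path + b))
        return out


trie = BinaryTrie()
for kad_id in ["0001", "0010", "0110", "1011"]:
    trie.insert(kad_id)

# All keys in the "0xxx" region of the keyspace in one query:
print(sorted(trie.keys_with_prefix("0")))  # → ['0001', '0010', '0110']
```

Because keys that share a prefix are close in the XOR metric, one trie walk yields exactly the group of records that a nearby set of DHT servers is expected to hold.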
In a nutshell, the Content Provider will continuously look up keys across the entire keyspace, hence "sweeping" the keyspace. For each key to be published, the Content Provider will find the 20 closest peers, and look up in its "CIDs Republish Binary Trie" all Provider Records that would belong to those specific 20 remote peers. By doing this match-making exercise, Content Providers will be able to reprovide all Provider Records that correspond to a particular peer at once. Based on this logic, Content Providers are only limited by network throughput.
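The sweep described above can be sketched as follows. This is a simplified illustration only: records are grouped by a 2-bit keyspace prefix, and `find_closest_peers` and the batch structure are hypothetical stand-ins for real DHT lookups and reprovide messages:

```python
from collections import defaultdict


def find_closest_peers(prefix: str) -> list:
    # Stand-in for a DHT lookup returning the peers closest to this
    # region of the keyspace (20 in the real protocol; 3 here).
    return ["peer-%s-%d" % (prefix, i) for i in range(3)]


def sweep_reprovide(records: list, prefix_len: int = 2) -> dict:
    """Group records by keyspace region and reprovide each region in one batch."""
    regions = defaultdict(list)
    for kad_id in records:
        regions[kad_id[:prefix_len]].append(kad_id)

    batches = {}
    for prefix in sorted(regions):          # "sweep" the keyspace in order
        peers = find_closest_peers(prefix)  # one lookup serves the whole region
        batches[prefix] = (peers, regions[prefix])
    return batches


batches = sweep_reprovide(["0001", "0010", "0110", "1011"])
# One lookup covered every record in region "00", instead of one
# lookup per record:
print(batches["00"][1])  # → ['0001', '0010']
```

The key point is the amortisation: a naive reprovide does one full DHT lookup per CID, whereas the sweep does one lookup per keyspace region and ships all matching records to the same 20 peers in a single batch.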
In a nutshell, the Content Provider will continuously lookup keys across the entire keyspace, hence “sweeping” the keyspace.
This first sentence needs to be clarified. Which keys is the Content Provider looking up in this instance? Are these just keys across the key space, or are these actual keys that will be "batch published" by then looking up the binary trie?
Also, if you already arranged the CIDs (which double as the keys) in a binary trie, do you use the binary trie to do this sweeping (with the goal of finding close peers)? Or do you maintain two representations of the CIDs to be published?
Finally, is there a reason why Content Providers is sometimes capitalised and sometimes isn't?
@2color apologies, I didn't get to this in time. I didn't mean to ignore it, but I left for a trip and didn't manage to address your comments.
I would address the first and third comments. The second gets a little too deep into the details TBH; those will go into a more detailed blogpost about Reprovide Sweep when we get to it.
If you feel we should definitely address the first and third, let me know and I'll open a separate PR.
Thanks and no problem. I think it would be great to clarify the first and third points even if the post is already published.
Co-authored-by: Daniel Norman <[email protected]>
Blogpost to raise awareness and inform the community about the work ProbeLab is doing on the DHT refactoring. The post also introduces the new name for the Public IPFS DHT, Amino, as well as a new feature under development that improves the speed of the Provide operation by several orders of magnitude.