feat: altruistic sharding to ensure shard coverage #1892

Open
alrevuelta opened this issue Aug 8, 2023 · 2 comments


@alrevuelta
Contributor

Problem

With autosharding we need to ensure that every shard in the network has a similar number of nodes relaying traffic in it. For example, if there are 1000 nodes in the network and 10 shards, ideally each shard will have 100 nodes (assuming each node connects to 1 shard).

If we assume that content topics follow a uniform distribution, then we can consider this problem solved. But content topics can easily be biased to land in a specific shard, so we need an extra protection: "altruistic sharding TM" (open to other names).
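To illustrate how easily topics can be biased, here is a minimal sketch. It assumes a hypothetical mapping that hashes the content topic and reduces it modulo the shard count; `shard_for_topic`, `grind_topics` and the topic format are made up for the example and are not necessarily what autosharding does.

```python
import hashlib
from typing import List

NUM_SHARDS = 10  # assumed shard count, same as the 10-shard example above


def shard_for_topic(content_topic: str, num_shards: int = NUM_SHARDS) -> int:
    """Hypothetical mapping: hash the topic, reduce modulo the shard count."""
    digest = hashlib.sha256(content_topic.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % num_shards


def grind_topics(target_shard: int, count: int,
                 num_shards: int = NUM_SHARDS) -> List[str]:
    """Brute-force topic names until `count` of them land in `target_shard`."""
    found = []
    i = 0
    while len(found) < count:
        topic = f"/app-{i}/1/chat/proto"  # illustrative topic format
        if shard_for_topic(topic, num_shards) == target_shard:
            found.append(topic)
        i += 1
    return found


# Every topic in this list maps to shard 3, so its traffic is not spread out.
biased_topics = grind_topics(target_shard=3, count=5)
```

With 10 shards, roughly one candidate name in ten hits the target, so grinding costs almost nothing.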

Suggested solution

Make each node subscribe to 1 (or more?) extra shards, on top of the ones that the node is interested in (because of its content topics). This extra shard could be picked randomly and rotated every x amount of time?
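A rough sketch of what the random pick plus rotation could look like. The class name, the parameters and the choice to exclude the shards the node already relays are all assumptions for illustration, not a defined algorithm.

```python
import random
import time


class AltruisticShardPicker:
    """Picks one extra shard at random and re-picks it after a fixed interval."""

    def __init__(self, num_shards, own_shards, rotation_seconds,
                 rng=None, clock=time.monotonic):
        self.num_shards = num_shards
        self.own_shards = set(own_shards)  # shards needed for content topics
        self.rotation_seconds = rotation_seconds
        self._rng = rng or random.Random()
        self._clock = clock
        self._extra = None
        self._picked_at = None

    def current_extra_shard(self):
        """Return the current extra shard, rotating it if the interval elapsed."""
        now = self._clock()
        expired = (self._picked_at is None
                   or now - self._picked_at >= self.rotation_seconds)
        if expired:
            candidates = [s for s in range(self.num_shards)
                          if s not in self.own_shards]
            # None when the node already relays every shard.
            self._extra = self._rng.choice(candidates) if candidates else None
            self._picked_at = now
        return self._extra


# Example: a node interested in shard 2 that rotates its extra shard hourly.
picker = AltruisticShardPicker(num_shards=10, own_shards={2},
                               rotation_seconds=3600)
extra_shard = picker.current_extra_shard()
```

The pick depends only on local randomness, so it needs no view of the rest of the network.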

Unfinished; algorithm and research to be defined.

Alternatives considered

.

Additional context

.

Acceptance criteria

.

@SionoiS
Contributor

SionoiS commented Aug 8, 2023

Picking a shard at random would be the easiest and a good first step but maybe we could count the # of nodes for each shard we get from Discv5 and use the lowest?

I'm not sure it would be better since the nodes we get from Discv5 are kinda random too.
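For reference, the counting idea could be sketched like this. It assumes the discovered peers are already available as a list of shard sets; `least_populated_shard` and that input shape are hypothetical and do not correspond to any real Discv5 API.

```python
from collections import Counter


def least_populated_shard(discovered_peers, num_shards):
    """Return the shard advertised by the fewest discovered peers.

    `discovered_peers` is an iterable of sets of shard indices, one per peer.
    Ties are broken by the lowest shard index.
    """
    counts = Counter()
    for peer_shards in discovered_peers:
        for shard in peer_shards:
            if 0 <= shard < num_shards:  # ignore out-of-range shard indices
                counts[shard] += 1
    return min(range(num_shards), key=lambda s: (counts[s], s))


# Four discovered peers in a 4-shard network; nobody advertises shard 3.
peers = [{0, 1}, {1}, {2}, {0}]
pick = least_populated_shard(peers, num_shards=4)
```

Breaking ties by lowest index keeps the sketch deterministic; a real node would probably break ties randomly so that nodes do not all pile into the same shard.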

@alrevuelta
Contributor Author

but maybe we could count the # of nodes for each shard we get from Discv5 and use the lowest?

I see two problems with this:

  • this can be easily faked, which is an attack vector. imagine someone spinning up multiple nodes (with just discv5) advertising shard 1 but without actually running on that shard. then that shard looks well covered, no nodes will actually run in it, and that shard "won't work"
  • requires some general knowledge of the network, which takes time to acquire. let's say i have found 1000 nodes. can i take a decision based on that? well, maybe the network has 10000000 nodes and you are taking a decision too early.
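A toy illustration of the first problem, under made-up numbers: each discovered record is reduced to the single shard it advertises, and `least_advertised_shard` is a hypothetical stand-in for the "use the lowest" heuristic.

```python
from collections import Counter


def least_advertised_shard(adverts, num_shards):
    """Pick the shard with the fewest advertisements (lowest index on ties)."""
    counts = Counter(adverts)
    return min(range(num_shards), key=lambda s: (counts[s], s))


NUM_SHARDS = 4
# Honest records: shard 1 has no real relayers at all.
honest_adverts = [0] * 30 + [2] * 30 + [3] * 30
# Attacker records that only claim shard 1 and relay nothing.
fake_adverts = [1] * 100

choice_without_attack = least_advertised_shard(honest_adverts, NUM_SHARDS)
choice_under_attack = least_advertised_shard(honest_adverts + fake_adverts,
                                             NUM_SHARDS)
```

Without the fake records the heuristic picks shard 1, the empty one. With them, shard 1 looks like the most populated shard and is never picked, even though nobody relays in it.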

imho the best approach is a statistical one that ensures coverage without i) relying on the ENR or ii) having to crawl the whole network.
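As a back-of-the-envelope sketch of the statistical angle: assume every node independently picks one extra shard uniformly at random among all shards. Then the number of altruistic nodes in a given shard is Binomial(n_nodes, 1/n_shards), and the chance of a shard being under-covered can be computed without any network knowledge. The function names and the uniform-pick assumption are illustrative only.

```python
import math


def prob_shard_below(n_nodes, n_shards, min_nodes):
    """P(a given shard gets fewer than `min_nodes` altruistic subscribers).

    Assumes n_shards >= 2 and that each node picks one extra shard uniformly
    at random, so the per-shard count is Binomial(n_nodes, 1 / n_shards).
    Terms are computed in log space to avoid overflow for large n_nodes.
    """
    p = 1.0 / n_shards
    total = 0.0
    for k in range(min(min_nodes, n_nodes + 1)):
        log_term = (math.lgamma(n_nodes + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_nodes - k + 1)
                    + k * math.log(p) + (n_nodes - k) * math.log(1.0 - p))
        total += math.exp(log_term)
    return min(total, 1.0)


def prob_any_shard_below(n_nodes, n_shards, min_nodes):
    """Union bound on P(at least one shard is below `min_nodes`)."""
    return min(1.0, n_shards * prob_shard_below(n_nodes, n_shards, min_nodes))


# 1000 nodes, 10 shards: chance that a given shard has no altruistic node.
p_empty = prob_shard_below(1000, 10, 1)
```

For the 1000-node, 10-shard example from the issue description, `p_empty` equals 0.9 ** 1000, which is negligible, so random picks alone already give coverage at that scale. The interesting cases are small networks or large shard counts.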

@vpavlin vpavlin moved this to Icebox in Waku Aug 22, 2023