Scale Peer-Pad #180
Comments
@diasdavid could you review this ^^?
@diasdavid left that Easter egg for you :) — I couldn't find it anywhere in the literature, so you have prior art! ;)
@pgte With 6 fingers, the PubSub message will be capable of reaching 216 peers in three hops. However, there are some considerations to have here:
Yes, I have that concern too. I think we could easily batch the changes and use a gossip frequency heuristic that takes into account the number of peers and the urgency (amount of changes since last broadcast), and that is also random enough to avoid spikes.
This has been in place for a while now, and it's reflected in the specs docs in peer-star-app.
Scale Peer-Pad
This issue tracks the issues and requirements of this scaling effort.
Goal
In the short term, we should be able to have 150 concurrent users editing the same pad.
Here is an outline of the plan, for your comments and context:
Connection count throttling + app hashring building
Instead of announcing every peer that is discovered to the IPFS layer, we're going to do some filtering on the discovery.
We're going to wrap the transport object. This wrapper will listen to `peer:discovery` events and build a consistent hashring from the peer IDs that are part of the application. But how do we find out whether a peer is part of the application? For that, we need to know whether it's interested in a specific `<application>` pub-sub topic.

When the discovery at the transport level finds a peer, we check whether it is interested in that topic and, if so, add it to the app hashring.
When a peer from this app hashring disconnects, remove it from the hash ring.
Every time this app hashring changes, the wrapper recomputes the set of target peers and emits a `peer:discovery` event for each of them. This makes the peer keep connected only to the set of target peers (defined below) as the hashring changes.
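To make the mechanics concrete, here is a minimal sketch of what such a wrapper could look like, assuming a libp2p-style discovery source that emits `peer:discovery` events; the hashring object and the `isInterestedInApp` check (does the peer subscribe to the `<application>` topic?) are hypothetical helpers passed in by the application:

```js
const EventEmitter = require('events')

// Sketch only: filter transport-level discovery down to application peers,
// keep them in a consistent hashring, and re-announce just the target peers.
class AppDiscovery extends EventEmitter {
  constructor ({ discovery, ring, ownId, isInterestedInApp }) {
    super()
    this._ring = ring                           // consistent hashring of app peer IDs
    this._ownId = ownId
    this._isInterestedInApp = isInterestedInApp // hypothetical: checks <application> topic interest
    discovery.on('peer:discovery', (peer) => this._onDiscovery(peer))
  }

  async _onDiscovery (peer) {
    // ignore peers that are not part of this application
    if (!(await this._isInterestedInApp(peer))) { return }
    this._ring.add(peer.id)
    this._announceTargetPeers()
  }

  onPeerDisconnect (peer) {
    this._ring.remove(peer.id)
    this._announceTargetPeers()
  }

  _announceTargetPeers () {
    // only the current target peers (the Dias-Peer-Set, defined below) are
    // re-emitted, so the IPFS layer only keeps connections to those
    for (const peer of this._ring.targetPeersFor(this._ownId)) {
      this.emit('peer:discovery', peer)
    }
  }
}
```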
Computing the set of target peers
For a given hashring of peers, the set of target peers is composed of the union of:
These target peers will change as the hash ring constituency changes, since they are defined by relative positions in the hash ring.
This set of target peers is called the Dias-Peer-Set.
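The exact constituents of the Dias-Peer-Set are not enumerated in this excerpt, but purely as an illustration of "peers at relative positions in the ring", a computation over a sorted hashring could look like this (the positions passed in are made up):

```js
// Illustration only: pick the peers sitting closest to a set of relative
// positions around the ring, starting from our own position. The positions
// that actually make up the Dias-Peer-Set are defined by the protocol, not here.
function targetPeers (sortedRing, ownId, relativePositions) {
  const ownIndex = sortedRing.indexOf(ownId)
  const targets = new Set()
  for (const position of relativePositions) {
    const offset = Math.max(1, Math.round(position * sortedRing.length))
    const peer = sortedRing[(ownIndex + offset) % sortedRing.length]
    if (peer !== ownId) { targets.add(peer) }
  }
  return targets
}

// e.g. targetPeers(ring, myId, [1 / ring.length, 0.2, 0.5, 0.8])
```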
The collaboration membership gossip
This should give us an app-wide scalable pub-sub primitive when using floodsub.
Now, when participating in a collaboration, a peer needs to know which peers are part of the collaboration.
Each collaboration has a unique identifier. By using the app-wide pub-sub primitive, peers can register interest in the `<application>/<collaboration>/membership` topic. They can use this topic to gossip and disseminate the membership of a collaboration.

A peer collects the members in the collaboration membership gossip, accumulates them and forwards them to other peers, thus making every node find out about all the others.
Each collaboration node keeps a set of app peer ids.
Using the `<application>/<collaboration>/membership` topic:
- On a computed interval, we broadcast the set of known peers by using a delta-based Observed-Remove-Set CRDT (OR-Set).
- When we receive a membership message on this channel, we incorporate it as a delta on the set of known peers in this collaboration.
- All messages in this channel are encrypted and decrypted with a custom application function.
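A self-contained sketch of what this could look like, assuming the js-ipfs `ipfs.pubsub` API and treating `encrypt`/`decrypt` as the custom application functions mentioned above (the OR-Set below is deliberately simplified):

```js
const crypto = require('crypto')

// Minimal sketch of a delta-based Observed-Remove Set used for collaboration
// membership. The full state is itself a valid delta, so it can be broadcast
// periodically and merged on receipt.
class MembershipORSet {
  constructor () {
    this._adds = new Map()      // tag -> peer id (observed adds)
    this._removals = new Set()  // tags whose adds were removed
  }

  add (peerId) {
    this._adds.set(crypto.randomBytes(8).toString('hex'), peerId)
  }

  remove (peerId) {
    for (const [tag, id] of this._adds) {
      if (id === peerId) { this._removals.add(tag) }
    }
  }

  merge (delta) {
    for (const [tag, id] of delta.adds) { this._adds.set(tag, id) }
    for (const tag of delta.removals) { this._removals.add(tag) }
  }

  state () {
    return { adds: [...this._adds], removals: [...this._removals] }
  }

  values () {
    const out = new Set()
    for (const [tag, id] of this._adds) {
      if (!this._removals.has(tag)) { out.add(id) }
    }
    return out
  }
}

// Wiring it to pub-sub on the <application>/<collaboration>/membership topic.
// `encrypt` and `decrypt` stand in for the custom application functions.
async function setupMembershipGossip (ipfs, app, collaboration, ownPeerId, { encrypt, decrypt }) {
  const topic = `${app}/${collaboration}/membership`
  const members = new MembershipORSet()
  members.add(ownPeerId) // add ourselves before gossiping

  await ipfs.pubsub.subscribe(topic, async (msg) => {
    const delta = JSON.parse((await decrypt(msg.data)).toString())
    members.merge(delta)
  })

  const broadcast = async () => {
    const payload = await encrypt(Buffer.from(JSON.stringify(members.state())))
    await ipfs.pubsub.publish(topic, payload)
  }

  return { members, broadcast }
}
```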
Adaptive gossip dissemination frequency heuristic
In order to keep the gossip traffic from overwhelming the network and the peers, the frequency of gossip messages needs to be a random value that is inversely proportional to the size of the peer set and proportional to the urgency.
The urgency is defined by the number of changes to the app peer set that have occurred since the last gossip broadcast. This urgency (and thus the broadcast frequency) needs to be re-computed every time the app peer set changes.
(Note: the peer should add itself to the set before calculating the urgency).
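A possible shape for that heuristic, with made-up constants (the issue prescribes the proportionality, not the formula):

```js
// Sketch: time until the next gossip broadcast. More peers -> longer interval
// (lower frequency); more pending changes (urgency) -> shorter interval; the
// random factor spreads broadcasts out so peers don't all fire at once.
function nextGossipIntervalMs (peerCount, changesSinceLastBroadcast, baseMs = 1000) {
  const urgency = Math.max(1, changesSinceLastBroadcast)
  const interval = (baseMs * Math.max(1, peerCount)) / urgency
  return interval * (0.5 + Math.random()) // randomize within [0.5, 1.5) of the interval
}
```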
Collaboration-level messaging
Now that we have a way of maintaining the peer set for a given collaboration, we need to be able to cross-replicate the CRDT instances that the peers are collaborating on.
For this, each peer keeps a collaboration-level hashring, in which it places all the peer IDs it gets from the collaboration membership gossip.
Every time this hashring changes, the peer calculates the Dias-Peer-Set.
For each of the peers in this set:
For each of the remaining peers in the collaboration hashring:
Note: Remember that these connections are at the collaboration level. A peer may be running multiple collaborations at the same time, and it may also be connected to other peers because of app-level gossip. Be wise about this.
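The list items above were lost in this copy, but a plausible reading is that peers in the Dias-Peer-Set get a direct collaboration-level connection and the remaining ring members do not. Under that assumption, a sketch (with `diasPeerSet`, `dial` and `hangUp` supplied by the caller):

```js
// Sketch under the stated assumption: keep collaboration-level connections
// only to the peers in the Dias-Peer-Set for this collaboration's hashring.
function onCollaborationRingChange (ring, ownId, connections, { diasPeerSet, dial, hangUp }) {
  const targets = diasPeerSet(ring, ownId)

  // peers in the set: make sure we have a collaboration-level connection
  for (const peer of targets) {
    if (!connections.has(peer)) {
      connections.set(peer, dial(peer))
    }
  }

  // remaining peers in the collaboration hashring: drop the collaboration-level
  // connection (the peer may still be connected for other reasons, see note above)
  for (const [peer, conn] of connections) {
    if (!targets.has(peer)) {
      hangUp(conn)
      connections.delete(peer)
    }
  }
}
```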
Collaboration-level P2P replication protocol
The collaboration-level P2P protocol should be performed over a collaboration-specific protocol handler (a collaboration has a unique ID).
Once established, each connection starts in eager mode. In eager mode, the node replicates eagerly.
When receiving a duplicate operation from a node, that connection is downgraded to lazy by sending a `PRUNE` message.

When in lazy mode, the peer only sends `IHAVE` messages, declaring the current vector clock.

In order to create a cycle-less graph and to recover from faults, we introduce timers on missing operations. When an operation doesn't arrive before the timeout, we send a `GRAFT` message to the source peer, telling it to change that connection to eager mode.
Protocol:

Locals:
- `vc`: the latest local vector clock
- `peer connections`: the connection to each peer, with a vector clock associated with it
- `eager`: set of connections in eager-sending mode
- `lazy`: set of connections in lazy-sending mode

Protocol:
- When a connection is established, send an `(IHAVE, vc)` message
- When an `(IHAVE, vc)` message is received, save it in the `peer connections` map.
- When receiving an `(IHAVE, vc)` message for a message that has not been received yet: set a timer and, if it expires before the operation arrives, send a graft (`GRAFT`) message to the sender.
- When receiving a `GRAFT` message, the connection is turned into eager-sending mode (remove it from `lazy` and add it to `eager`) and all the missing operations are sent
- When receiving a duplicate operation, send a `PRUNE` message
- When receiving a `PRUNE` message, the connection is turned into lazy-sending mode (remove it from `eager` and add it to `lazy`)
- When there is a new operation to disseminate, go through the connections in `eager` and `lazy`: is the connection in `lazy`? Send only an `(IHAVE, new vector clock)` message; otherwise, send the operation itself
Note: `IHAVE` messages don't need to be sent immediately when there is a new message available. As an optimization, a node can throttle sending them and, at the end of a series of operations, send only one `(IHAVE, vc)` message with only the latest vector clock.
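Putting the rules above together, a compact sketch of the per-collaboration replication state machine could look like this; the transport (`send`), the vector-clock helpers and the operation store are passed in, and their shapes are assumptions rather than the actual peer-star code:

```js
// Sketch of the eager/lazy replication rules described above. `send(peer, msg)`,
// the vector-clock helpers and the operation store are assumed interfaces.
class CollaborationReplication {
  constructor ({ send, clocks, store, graftTimeoutMs = 5000 }) {
    this.vc = {}                    // latest local vector clock
    this.peerClocks = new Map()     // peer -> last vector clock advertised by it
    this.eager = new Set()          // connections in eager-sending mode
    this.lazy = new Set()           // connections in lazy-sending mode
    this._timers = new Map()        // peer -> timer waiting for missing operations
    this._send = send
    this._clocks = clocks           // assumed: { isAhead(a, b), merge(a, b) }
    this._store = store             // assumed: { has(op), add(op), missingFor(vc) }
    this._graftTimeoutMs = graftTimeoutMs
  }

  onConnectionEstablished (peer) {
    this.eager.add(peer)            // connections start in eager mode
    this._send(peer, ['IHAVE', this.vc])
  }

  localOperation (op) {
    this._receive(op)               // locally generated operations are broadcast too
  }

  onMessage (peer, [type, payload]) {
    if (type === 'IHAVE') {
      this.peerClocks.set(peer, payload)
      if (this._clocks.isAhead(payload, this.vc) && !this._timers.has(peer)) {
        // the sender has operations we don't; GRAFT if they don't arrive in time
        this._timers.set(peer, setTimeout(() => {
          this._timers.delete(peer)
          this._send(peer, ['GRAFT'])
        }, this._graftTimeoutMs))
      }
    } else if (type === 'GRAFT') {
      this.lazy.delete(peer)
      this.eager.add(peer)          // back to eager mode
      for (const op of this._store.missingFor(this.peerClocks.get(peer))) {
        this._send(peer, ['OP', op])
      }
    } else if (type === 'PRUNE') {
      this.eager.delete(peer)
      this.lazy.add(peer)           // downgrade to lazy mode
    } else if (type === 'OP') {
      if (this._store.has(payload)) {
        this._send(peer, ['PRUNE']) // duplicate operation: prune that connection
        return
      }
      this._receive(payload, peer)
    }
  }

  _receive (op, from) {
    this._store.add(op)
    this.vc = this._clocks.merge(this.vc, op.clock)
    // cancel pending GRAFT timers that this operation satisfied
    for (const [peer, timer] of this._timers) {
      const theirs = this.peerClocks.get(peer)
      if (!theirs || !this._clocks.isAhead(theirs, this.vc)) {
        clearTimeout(timer)
        this._timers.delete(peer)
      }
    }
    this._broadcast(op, from)
  }

  _broadcast (op, exclude) {
    for (const peer of this.eager) {
      if (peer !== exclude) { this._send(peer, ['OP', op]) }
    }
    for (const peer of this.lazy) {
      if (peer !== exclude) { this._send(peer, ['IHAVE', this.vc]) }
    }
  }
}
```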
Scaling the ws-star server
Use client-side sharding? Using several different ws-star servers, we can select a random ws-star server and connect to that. The problem here is that this hinders peer discovery. At the limit, if, for example, we have 2 ws-star servers and the app has 2 peers: if each peer uses a different server, each peer thinks it is alone.
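For reference, client-side sharding could be as simple as picking one of the configured ws-star servers at random when starting the node (the server addresses below are placeholders), which illustrates exactly the discovery problem above: two peers that pick different servers never discover each other.

```js
// Illustration only: connect to one of several ws-star rendezvous servers,
// chosen at random. Peers that land on different servers won't see each other.
const IPFS = require('ipfs')

const wsStarServers = [
  '/dns4/ws-star-1.example.com/tcp/443/wss/p2p-websocket-star',
  '/dns4/ws-star-2.example.com/tcp/443/wss/p2p-websocket-star'
]
const server = wsStarServers[Math.floor(Math.random() * wsStarServers.length)]

const ipfs = new IPFS({
  config: {
    Addresses: {
      Swarm: [server]
    }
  }
})
```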
Improving the js-ipfs performance