-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add a Statement Gossip Subsystem section to the implementors guide #1151
Conversation
@rphmeier What sort of content should go in the 'Gossip Protocols' section? I don't want to rewrite what's in the research documents so I think I'll just mention and link that we're using 'Bounded Gossip Protocols' and give a brief overview of a propagation pool. |
I think you might have mentioned this on the call on friday, but what is this again? |
OK, that seems reasonable. I'd recommend summarizing topics and the components of gossip protocols for a) validating messages and b) deciding when to propagate messages to other peers, and then describe all gossip protocols in terms of the messages they are in charge of, their neighbor packet description, and their implementations of those functionalities. More or less detail is fine, up to your judgement, as long as the guide doesn't leave ambiguity. For chain heads, we've currently used 5 as a maximum, although in practice it will not be more than 2 except in bizarre circumstances. |
cc @infinity0 What's your plan with these old networking documents? Can we expect the links to remain stable? |
@rphmeier I recently (~2 weeks ago) restructured some of the pages, so any links from before that would already have been broken. I'm not planning another restructure for a while, but I do intend to revamp the content at some point. |
@@ -1030,6 +1031,7 @@ The following conditions need to be met for an incoming statement to be accepted | |||
|
|||
* The sender of the statement has to be in the validator set for the relay block. | |||
* If the statement is `Valid` or `Invalid`, then the candidate that the hash refers to must be known. | |||
* As only `MAX_CHAIN_HEADS` (currently set to 5) competing parachain heads are accepted at a time, if the statement is `Seconded`, then there must be less than `MAX_CHAIN_HEADS` other candidate receipts for the parachain block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then there must be less than
MAX_CHAIN_HEADS
other candidate receipts for the parachain block.
This is not accurate - max chain heads is about how many relay chain heads a particular peer is allowed to say they are interested in, not about how many candidates there are for any specific relay chain head.
|
||
When the node wants to gossip a message: | ||
* The message is put into the propagation pool and stays there until cleaned up. | ||
* The message is sent out to the nodes that it is currently connected to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this is not accurate and does not capture all of the aspects of bounded gossip protocols. We only send out messages to peers that pass the filtration criterion for that peer. And then later on we may "unlock" those messages once we are aware of the filtration criterion for that peer changing.
* The message is sent out to the nodes that it is currently connected to. | ||
|
||
When a new node connects to our node: | ||
* All messages in the propagation pool are sent to it, and we receive all of its propagation pool messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - we only send messages to a peer that pass the filtration-criterion allowed_k
. At this level of detail, we can paraphrase that as a function is_message_allowed(peer, msg)
which gossip protocols need to implement
|
||
#### Message Types | ||
|
||
This subsystem communicates with the overseer via [two message types](#Candidate-Backing-Subsystem-Messages), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This subsystem communicates with the overseer via [two message types](#Candidate-Backing-Subsystem-Messages), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received. | |
This subsystem communicates with the overseer via [two message types](#candidate-backing-subsystem-message), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received. |
Did you meant to reference the singular?
|
||
#### Functionality | ||
|
||
The subsystem needs to contain a handle to a gossiping engine to gossip and recieve messages. This will be cloned into each job that is spawned. Apart from that, it also contains the general structures that all subsystems contain, e.g. channels to communicate with the overseer and handles to spawned jobs in order to shut them down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be cloned into each job that is spawned.
Why clone the GossipEngine
for each job? GossipEngine
does not implement Clone
by design, thus you would need to wrap it in an Arc<Mutex<>>
. The statement gossip subsystem will run as a future, correct? If so wrapping a GossipEngine
in a Mutex
will prevent you from using async/await
as (1) it is very difficult to make sure the lock is only held for short amount of time and definitely not while being preempted / sleeping and (2) a MutexGuard
is not Send
which would be required for async/await
.
Does the above make sense? I am happy to expand further on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, it seems reasonable for the gossip engine to have a single owner (the subsystem). A job for this subsystem would only entail stringing together gossip_messages_for(specific_parent_hash_topic)
and sending a message to the other subsystem.
|
||
On `OverseerSignal::StartWork` it should: | ||
* Spawn a job and pass in `relay_parent`, a clone of the gossiping engine and a handle to a message validator. | ||
* Send out a neighbour packet with the `relay_parent` added to the list of accepted chain heads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you agree with me that the neighbor packet should be sent via the GossipEngine
and not via the Validator
, would you mind adding a note here?
If there is no agreement I can expand on why this would help the architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we need to be aware of is that the validator needs to be aware of the most recent neighbor packets we have sent to peers. I suppose we just need to be careful when the GossipEngine
is polled when doing that update.
On `StatementGossipSubsystemMessageIn::StatementToGossip` it should: | ||
* Send the signed statement to the job running for the `relay_parent`. | ||
|
||
The statement gossip job needs to poll two seperate futures (as well as the exit signal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make the statement gossip job a pure state machine that gets events as input and outputs commands to either be forwarded to the GossipEngine
or to the Overseer
?
Thus business logic (Polkadot specific) would stay pure within a single state machine and I/O logic would stay within the Subsystem
.
* As only `MAX_CHAIN_HEADS` (currently set to 5) competing parachain heads are accepted at a time, if the statement is `Seconded`, then there must be less than `MAX_CHAIN_HEADS` other candidate receipts for the parachain block. | ||
* The statement signature must be valid. | ||
|
||
I have a basic implementation of this code on the [`ashley-test-statement-gossip-subsystem`](https://github.com/paritytech/polkadot/tree/ashley-test-statement-gossip-subsystem) branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you would like to have input on this, would you mind opening up a pull request? That would make discussions a lot easier.
Migrating from w3f/research#89.
Fixes #1142