Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add a Statement Gossip Subsystem section to the implementors guide #1151

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 85 additions & 1 deletion roadmap/implementors-guide/guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ There are a number of other documents describing the research in more detail. Al
* [Subsystems](#Subsystems)
* [Overseer](#Overseer)
* [Candidate Backing](#Candidate-Backing-Subsystem)
* [Statement Gossip](#Statement-Gossip-Subsystem)
* [Gossip Protocols](#Gossip-Protocols)
* [Data Structures and Types](#Data-Structures-and-Types)
* [Glossary / Jargon](#Glossary)

Expand Down Expand Up @@ -989,12 +991,78 @@ Dispatch a `PovFetchSubsystemMessage(relay_parent, candidate_hash, sender)` and

(TODO: send statements to Statement Distribution subsystem, handle shutdown signal from candidate backing subsystem)

### Statement Gossip Subsystem

The statement gossip subsystem is responsible for gossiping out and receiving [Signed Statements](#Signed-statement-type) from validators.

#### Message Types

This subsystem communicates with the overseer via [two message types](#Candidate-Backing-Subsystem-Messages), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This subsystem communicates with the overseer via [two message types](#Candidate-Backing-Subsystem-Messages), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received.
This subsystem communicates with the overseer via [two message types](#candidate-backing-subsystem-message), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received.

Did you meant to reference the singular?


#### Functionality

The subsystem needs to contain a handle to a gossiping engine to gossip and recieve messages. This will be cloned into each job that is spawned. Apart from that, it also contains the general structures that all subsystems contain, e.g. channels to communicate with the overseer and handles to spawned jobs in order to shut them down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be cloned into each job that is spawned.

Why clone the GossipEngine for each job? GossipEngine does not implement Clone by design, thus you would need to wrap it in an Arc<Mutex<>>. The statement gossip subsystem will run as a future, correct? If so wrapping a GossipEngine in a Mutex will prevent you from using async/await as (1) it is very difficult to make sure the lock is only held for short amount of time and definitely not while being preempted / sleeping and (2) a MutexGuard is not Send which would be required for async/await.

Does the above make sense? I am happy to expand further on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, it seems reasonable for the gossip engine to have a single owner (the subsystem). A job for this subsystem would only entail stringing together gossip_messages_for(specific_parent_hash_topic) and sending a message to the other subsystem.


On `OverseerSignal::StartWork` it should:
* Spawn a job and pass in `relay_parent`, a clone of the gossiping engine and a handle to a message validator.
* Send out a neighbour packet with the `relay_parent` added to the list of accepted chain heads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you agree with me that the neighbor packet should be sent via the GossipEngine and not via the Validator, would you mind adding a note here?

If there is no agreement I can expand on why this would help the architecture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we need to be aware of is that the validator needs to be aware of the most recent neighbor packets we have sent to peers. I suppose we just need to be careful when the GossipEngine is polled when doing that update.


On `OverseerSignal::StopWork` it should:
* Stop the job via the job handle.
* Send out a neighbour packet with the job's `relay_parent` removed.

On `StatementGossipSubsystemMessageIn::StatementToGossip` it should:
* Send the signed statement to the job running for the `relay_parent`.

The statement gossip job needs to poll two seperate futures (as well as the exit signal):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make the statement gossip job a pure state machine that gets events as input and outputs commands to either be forwarded to the GossipEngine or to the Overseer?

Thus business logic (Polkadot specific) would stay pure within a single state machine and I/O logic would stay within the Subsystem.


* A future that takes the passed-in statements, validates them and gossips them along with the `relay_parent` hash using the gossip engine.
* A future that takes the messages that the gossip engine receives for the `relay_parent`, validates them and sends to the subsystem.

#### Message Validation

The following conditions needs to be met for an outgoing statement to be sent:

* We must have 'peer knowledge' of what canidates the receiver knows about.
* If the statement is `Valid` or `Invalid`, the receiver of the statement must know about the candidate that the hash refers to.
* The statement signature must be valid.

The following conditions need to be met for an incoming statement to be accepted:

* The sender of the statement has to be in the validator set for the relay block.
* If the statement is `Valid` or `Invalid`, then the candidate that the hash refers to must be known.
* As only `MAX_CHAIN_HEADS` (currently set to 5) competing parachain heads are accepted at a time, if the statement is `Seconded`, then there must be less than `MAX_CHAIN_HEADS` other candidate receipts for the parachain block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then there must be less than MAX_CHAIN_HEADS other candidate receipts for the parachain block.

This is not accurate - max chain heads is about how many relay chain heads a particular peer is allowed to say they are interested in, not about how many candidates there are for any specific relay chain head.

* The statement signature must be valid.

I have a basic implementation of this code on the [`ashley-test-statement-gossip-subsystem`](https://github.com/paritytech/polkadot/tree/ashley-test-statement-gossip-subsystem) branch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you would like to have input on this, would you mind opening up a pull request? That would make discussions a lot easier.


(TODO: Do we need a message type for sending a statemen directly to a peer?)
(TODO: We probably need to account for backpressure)

---

[TODO: subsystems for gathering data necessary for block authorship, for networking, for misbehavior reporting, etc.]

----

## Gossip Protocols

A lot of a relay chain's communication happens via gossiping messages. To avoid this becoming an attack vector, we want to use [Bounded Gossip Protocols](https://github.com/w3f/research/blob/master/docs/polkadot/networking/0-overview.md#bounded-gossip-protocols).

### Functionality

Several components make up a gossiping system:

* A network implementation, that contains things like the actual connections to peers.
* A propagation pool of messages.

When the node wants to gossip a message:
* The message is put into the propagation pool and stays there until cleaned up.
* The message is sent out to the nodes that it is currently connected to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this is not accurate and does not capture all of the aspects of bounded gossip protocols. We only send out messages to peers that pass the filtration criterion for that peer. And then later on we may "unlock" those messages once we are aware of the filtration criterion for that peer changing.


When a new node connects to our node:
* All messages in the propagation pool are sent to it, and we receive all of its propagation pool messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - we only send messages to a peer that pass the filtration-criterion allowed_k. At this level of detail, we can paraphrase that as a function is_message_allowed(peer, msg) which gossip protocols need to implement


## Data Structures and Types

[TODO]
Expand Down Expand Up @@ -1072,7 +1140,7 @@ enum OverseerSignal {
```


#### Candidate Backing subsystem Message
#### Candidate Backing Subsystem Message

```rust
enum CandidateBackingSubsystemMessage {
Expand All @@ -1085,6 +1153,22 @@ enum CandidateBackingSubsystemMessage {
}
```

### Statement Gossip Subsystem Messages

```rust
enum StatementGossipSubsystemIn {
/// A statement to be gossiped out to validators.
StatementToGossip { relay_parent: Hash, statement: SignedStatement }
}
```

```rust
enum StatementGossipSubsystemOut {
/// A statement that we've received from a validator.
StatementReceived { relay_parent: Hash, statement: SignedStatement }
}
```

#### Host Configuration

The internal-to-runtime configuration of the parachain host. This is expected to be altered only by governance procedures.
Expand Down