
ADR: Block Propagation During Consensus #423

Closed
evan-forbes opened this issue Jun 18, 2021 · 10 comments
evan-forbes commented Jun 18, 2021

Summary

TL;DR: We have three things to decide/figure out, in order:

1) What should the PartSetHeader commit to?
2) Should we include erasure data while gossiping?
3) How do the answers to the first two decisions affect the spec?

Then we should write an ADR based on those decisions.

Details

Now that we are planning on propagating block data via the normal tendermint gossip mechanism and not via IPFS, from an implementation perspective, I think it would be unwise to get rid of the PartSet and PartSetHeader. Both play a crucial role in how the consensus and blockchain reactors gossip block data.

The question that I think we should be asking is, “what should the PartSet and PartSetHeader commit to”? Do we only need to commit to block data (transactions, evidence, ISRs, and messages) in the form of shares, or do we need to commit to the entire types.Block protobuf encoding? After we decide that, we should decide on whether or not erasure data should be propagated along with the necessary block data. Lastly, how should this affect the spec?

Decision 1: What should the PartSetHeader commit to?

Option A: We only need to commit to Block Data

If we only need to commit to the block data and not the entire block encoding, then this is equivalent to replacing the PartSetHeader with the DataAvailabilityHeader. Both the DataAvailabilityHeader and the PartSetHeader would commit to the same thing, the DataHash, and we would use shares for what used to be “parts”. The overall concept in the reactors would remain the same: we would gossip Parts, each carrying a proof to PartSetHeader.hash, except we would use individual shares with Merkle proofs to the DataHash instead of creating a hash specifically for some number of parts.

We would effectively be co-opting the PartSetHeader to commit to the block data in the form that is used in the DataAvailabilityHeader. Fwiw, option A is my preferred option, as using the DataHash as the same commitment for both the DataAvailabilityHeader and the PartSetHeader makes a lot of sense. I also feel like it aligns well with our past month of work.

Lastly I think it gives us a lot of options from a future design perspective. Since we’re still using shares and the DataHash, depending on decision 2, we could even stop collecting gossiped data after we have enough shares to repair the extended data square.

Option B: We need to commit to the entire protobuf encoding of the block

If we need the PartSetHeader to commit to the entire protobuf encoding of the Block, then this is even easier. AFAICT, we would simply change a few things in the Block proto types and methods, then convert the block.Data into a data square, flatten that square into a [][]byte, and we’re done.

It is simpler, but it would mean that we are committing to something other than the DataAvailabilityHeader during consensus, which could be confusing. This approach would offer complete protection, over everything in the block, against the DoS attacks mentioned by John, where a peer could feed another peer junk data. This might not be necessary in consensus, as the rest of the data needed to reconstruct the Block struct can come from the Proposal (mentioned by Ismail). However, in the blockchain reactor that might not be the case: when peers are catching up on blocks, we might need some commitment to the rest of the data in the block. It should be noted that I still need to do more investigation into this area, as I’m still unsure.

Decision 2: Should we propagate erasure data?

Option A: Include Erasure Data in Block Data

Include the erasure data in the data that is gossiped. More bandwidth, less compute.

Option B: Only include the original data in the square

Don’t include the erasure data in the data that is gossiped. Less bandwidth, more compute.

Decision 3: How should this affect the spec?

IMHO, we should not waste this opportunity to refine our spec -> implementation workflow. How do the decisions we made in decision 1 and 2 affect the spec? Should the ADR that we eventually open also include a PR to the spec? Does the spec even need to be changed?

Side note: It’s also worth noting that the data that we end up propagating will need to be in the data square format. Quoting John, “Note that "original block data" means the ODS. It still should be in that form (instead of a list of txs, messages, etc.) because some padding shares have namespace information that can't necessarily be inferred from just the list of txs, messages, etc.”

References and Context

Please feel free to edit and add refs and context here as we see fit.

@adlerjohn

> or do we need to commit to the entire types.Block struct?

One thing that's missing here is: what do you mean by "the entire types.Block struct"? Do you mean these fields?

https://github.com/celestiaorg/lazyledger-core/blob/7d4c1e966522425f35b4d980e9f0f746556a9768/proto/tendermint/types/block.proto#L9-L14

evan-forbes commented Jun 18, 2021

> One thing that's missing here is: what do you mean by "the entire types.Block struct"? Do you mean these fields?

Yeah, sorry that is unclear. I mean the entire protobuf representation of the block. Updated in the most recent edit.

liamsi commented Jun 18, 2021

Yeah, @adlerjohn, the whole Block (including the header and all other fields) is just serialized, then chunked and gossiped, currently.

In case we'd go with the DAHeader-only approach, this would need to change as Evan summarized. We could also change the gossiping as follows: for Block.Data, gossip shares (similar to parts currently); the Header, Commit, and DAHeader could either be merged into the Proposal itself, or be gossiped separately but before starting to gossip Block.Data (so validators could make a decision a tiny bit faster; i.e. they could reject a block without even caring about Block.Data if a proposer screws up somehow).

Also, wanted to add two notes:

> depending on decision 2, we could even stop collecting gossiped data after we have enough shares to repair the extended data square.

Just noting that, from a network bandwidth perspective, dropping the remaining messages won't change anything: since you do not request/collect them, they will still get sent/pushed to you, and you can only drop the superfluous messages. Changing to only sending a part of the EDS to peers might require further analysis.

Another consideration if we switched from Parts to Shares: currently the NMT is not optimized for providing a proof for single leaves quickly: https://github.com/celestiaorg/nmt/blob/6d5b99ddb71ae8f8c11d7a5fedc3b84532e949a7/nmt.go#L115-L129
That is because we didn't need this functionality yet: on IPFS, proof nodes get stored on the network and are downloaded on demand along with any data. Here, we are pushing out data (again, this is not just semantics). Hence, we have to provide a proof per share when gossiping. That said, we should at least verify that we can use this API (but I don't see why not) and then optimize later.

@adlerjohn

> Another consideration if we switched from Parts to Shares: currently the NMT is not optimized for providing a proof for single leaves quickly

An idea I got from @Wondertan just now: instead of splitting up into shares...split up into rows. No one said you have to replace parts with shares. Rows don't need any proof data, and the proofs can be re-computed locally. Is there anything stopping the use of rows here (e.g. maybe a maximum network message size)?

@Wondertan

@adlerjohn, max msg size in consensus reactor is set to 1MB, but it can be adjusted

@adlerjohn

If it can be adjusted, then it looks like there's no problem!

liamsi commented Jun 19, 2021

> An idea I got from @Wondertan just now: instead of splitting up into shares...split up into rows.

Yeah, that has already been part of this consideration here:
> granularity of the gossiped data (shares, rows)?

> Rows don't need any proof data, and the proofs can be re-computed locally.

Whether you send the whole row or only the first half (the ODS part), you would need to run the erasure coding for that row to verify that the row you got isn't garbage. This should be carefully considered: could it introduce a new DoS vector? Could we instead send a single proof node for the 2nd half, rather than re-computing the erasure coding locally?

> Is there anything stopping the use of rows here (e.g. maybe a maximum network message size)?

Not really. I could imagine that this kind of simple push-based gossiping works much better/more reliably with smaller network messages (not sure though). If we keep the chunk size roughly the same, we have empirical proof of how gossiping behaves in practice. Do we have the same assurance for larger messages (rows)?

If we go down the route of gossiping rows, we should spin up realistic networks to see if that changed anything substantially in terms of propagation time etc.

@Wondertan

> Yeah, that has already been part of this consideration here

Yeah, that's your idea indeed.

> could introduce a new dos vector?

How? The case with rows is quite similar to parts, except for the 2x size amplification on verification, but that should be negligible, even for the max block size John mentioned. (I do not consider the 1GB block size case, as I don't consider the current block propagation solution final.)

> Could we instead send a single proof node for the 2nd half instead of re-computing the erasure coding locally instead?

I think we can start by just sending half of each row and recomputing. Later on, sending a proof for the 2nd half can be added on top if considered valuable.

> If we go down the route of gossiping rows, we should spin up realistic networks to see if that changed anything substantially in terms of propagation time etc.

We would need realistic testing anyway if we change block propagation in any way. Even more so for broadcasting shares instead of rows, as shares differ from part broadcasting much more than rows do.

@Wondertan

Closing this, as the ADR is fully fleshed out in #434.
We can reopen it if necessary.
