ADR: Block Propagation During Consensus #423
Comments
One thing that's missing here is: what do you mean by "the entire
yeah, sorry that is unclear. The entire protobuf representation of the block. updated in most recent edit
Yeah, @adlerjohn, the whole Block (including the header and all other fields) is just serialized, then chunked and gossiped, currently. In case we'd go with the DAHeader only approach, this would need to change as Evan summarized. We could also change the gossiping as follows: For Block.Data gossip shares (similar to currently with parts); Header, Commit and DAHeader could be either merged into the Proposal itself or, they could be gossiped separately but even before starting to gossip Block.Data (so validators could make a decision a tiny bit faster; i.e. they could reject a block without even caring about Block.Data if a proposer screws up somehow). Also, wanted to add two notes:
Just noting that from a network bandwidth perspective dropping the remaining messages won't change anything (they will still get sent/pushed to you; as you do not request/collect them, they will still be broadcasted to you. You can only drop the superfluous messages). Changing to only sending a part of the EDS to peers might require further analysis.
Another consideration if we switched from Parts to Shares: currently the NMT is not optimized for providing a proof for single leaves quickly: https://github.com/celestiaorg/nmt/blob/6d5b99ddb71ae8f8c11d7a5fedc3b84532e949a7/nmt.go#L115-L129
An idea I got from @Wondertan just now: instead of splitting up into shares... split up into rows. No one said you have to replace parts with shares. Rows don't need any proof data, and the proofs can be re-computed locally. Is there anything stopping the use of rows here (e.g. maybe a maximum network message size)?
@adlerjohn, max msg size in the consensus reactor is set to 1MB, but it can be adjusted
If it can be adjusted, then it looks like there's no problem!
Yeah, that has already been part of this consideration here:
If you send the whole row, or only the first half (the ODS part), you would run the erasure coding for that row to verify that the row you got isn't garbage. This should be carefully considered: could it introduce a new DoS vector? Could we instead send a single proof node for the 2nd half instead of re-computing the erasure coding locally?
Not really. I could imagine that this kind of simple push-based gossiping works much better/more reliably with smaller network messages (not sure though). If we keep the chunk size roughly the same, we have empirical proof of how gossiping behaves in practice. Do we have the same assurance for larger messages (rows)? If we go down the route of gossiping rows, we should spin up realistic networks to see if that changes anything substantially in terms of propagation time etc.
Yeah, that's your idea indeed.
How? The case with rows is quite similar to parts, except for the 2x size amplification on verification, but that should be negligible, even for the max block size John mentioned.
I think we can start from just sending half of the rows and recomputing. Later on, sending a proof for the 2nd part can be applied on top if considered valuable.
We would need realistic testing anyway in case we change block propagation somehow. Even more so for the case of broadcasting shares instead of rows, as they differ from parts broadcasting much more than rows do.
Closing this, as the ADR is fully fleshed out in #434
Summary
TLDR; We have three things to decide/figure out, in order:
1. What should the `PartSetHeader` commit to?
2. Should we include erasure data while gossiping?
3. How do the answers to the first two decisions affect the spec?
Then we should write an ADR based on those decisions.
Details
Now that we are planning on propagating block data via the normal tendermint gossip mechanism and not via IPFS, from an implementation perspective, I think it would be unwise to get rid of the `PartSet` and `PartSetHeader`. Both play a crucial role in how the consensus and blockchain reactors gossip block data. The question that I think we should be asking is, “what should the `PartSet` and `PartSetHeader` commit to?” Do we only need to commit to block data (transactions, evidence, ISRs, and messages) in the form of shares, or do we need to commit to the entire `types.Block` protobuf encoding? After we decide that, we should decide whether or not erasure data should be propagated along with the necessary block data. Lastly, how should this affect the spec?
Decision 1: What should the PartSetHeader commit to?
Option A: We only need to commit to Block Data
If we only need to commit to the block data and not the entire block encoding, then this is the equivalent of replacing the `PartSetHeader` with the `DataAvailabilityHeader`. Both the `DataAvailabilityHeader` and the `PartSetHeader` would commit to the same thing, the `DataHash`, and we would use shares for what used to be “parts”. The overall concept would remain the same in the reactors: we would gossip `Part`s, which all have a proof to the `PartSetHeader.hash`, except we would use the individual shares with merkle proofs to the `DataHash` instead of making a hash specifically for some number of parts.
We would effectively be co-opting the `PartSetHeader` to commit to the block data in the form that is used in the `DataAvailabilityHeader`. Fwiw, option A is my preferred option, as using the `DataHash` as the same commitment for both the `DataAvailabilityHeader` and the `PartSetHeader` makes a lot of sense. I also feel like it aligns well with our past month of work. Lastly, I think it gives us a lot of options from a future design perspective. Since we’re still using shares and the `DataHash`, depending on decision 2, we could even stop collecting gossiped data after we have enough shares to repair the extended data square.
Option B: We need to commit to the entire protobuf encoding of the block
If we need the `PartSetHeader` to commit to the entire protobuf encoding of the `Block`, then this is even easier. AFAICT, we would simply change a few things in the `Block` proto types and methods, then convert the `block.Data` into a data square, flatten that square into a `[][]byte`, and we’re done.
It is simpler, but this would mean that we are committing to something other than the `DataAvailabilityHeader` during consensus, which could be confusing. This approach would offer complete protection against the DOS attacks mentioned by John, where a peer could just be feeding another peer junk data, over everything in the block. This might not be necessary in consensus, as the rest of the data needed to reconstruct the `Block` struct can come from the `Proposal` (mentioned by Ismail). However, in the blockchain reactor that might not be the case. When peers are catching up on blocks, we might need some commitment to the rest of the data in the block. It should be noted that I still need to do more investigation into this area, as I’m still unsure.
Decision 2: Should we propagate erasure data?
Option A: Include Erasure Data in Block Data
Include the erasure data in the data that is gossiped. More bandwidth, less compute.
Option B: Only include the original data in the square
Don’t include the erasure data in the data that is gossiped. Less bandwidth, more compute.
Decision 3: How should this affect the spec?
IMHO, we should not waste this opportunity to refine our spec -> implementation workflow. How do the choices made in decisions 1 and 2 affect the spec? Should the ADR that we eventually open also include a PR to the spec? Does the spec even need to be changed?
Side note: It’s also worth noting that the data that we end up propagating will need to be in the data square format. Quoting John, “Note that "original block data" means the ODS. It still should be in that form (instead of a list of txs, messages, etc.) because some padding shares have namespace information that can't necessarily be inferred from just the list of txs, messages, etc.”
References and Context
please feel free to edit and add refs and context here as we see fit