[DEPRECATED] split into smaller tasks: Need a proposal for rollback and rotation #2162
Question on how we should prioritize repairs. Let's say we have all the blocks for slot 1, something like slot 1: [block 0/0]. We then receive a blob for slot 6, block 0/3. It follows that:
It would also be good to flesh out how the guard zones fit into this. Does each validators local poh service generate the rest of the ticks in the guard zone for a block after the last tick before the guard zone is received?
Implies that we have a leader schedule that extends to the slot, if in the future. If not, the blob is dropped, stored until such time as we can chain it? Checked against gossip for current network tick height?
Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.
@carllin I think we need to experiment with repair, but the current slot that the validator is at with its own PoH+/- some range is probably what should be actively repaired for. see Repair above
@rob-solana a vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.
@rob-solana that works!
ok: slot is vote-able. will update the terminology to reflect. I think that block is still an ungood word, because we use it for so many other things, even in this proposal.
@rob-solana Yea, because a slot overlaps previous slots. It is a recursively defined structure. So the portion of a slot that is just the virtual PoH that overlaps a slot is what I am calling a block in this proposal.
@aeyakovenko, can you move this issue into a PR against the book? Pretty awful that these guys are having to use the quote mechanism to provide line-level feedback and that there have been 54 edits to the issue text.
@aeyakovenko, please review https://solana-labs.github.io/solana/terminology.html. If you want to redefine |
@garious a short justification is that we don't have a term for data transmitted by the leader in its slot that overlaps in PoH previous slots.... So our terminology fails to capture the problem. |
@garious this issue is cross component. Kind of hard to add it to the book. I would love some suggestions on where to place each part. |
@aeyakovenko, I can think through integration just before the PR is merged. Anywhere under proposals is fine for now. The important thing is that this discussion move to a WIP PR. |
@aeyakovenko, are people still using this issue? I can't navigate it at all. Can you move the remaining work to this project: https://github.com/solana-labs/solana/projects/11? |
@garious sounds good. I think a part is covered by entry tree |
@aeyakovenko break this up into a project! |
Problem
Need a proposal for the code changes required for rollback + rotation.
Proposed Solution
Block, slot, etc. are confusing terms for us because a leader transmits entries over a larger tick range than just the leader's slot. This proposes the following definitions for the terms.
A leader transmits blocks for their slot. The blocks may overlap previous slots. The last block is the only block that can contain data. For example, the following are all valid transmissions for a single leader:
Block 0/2 and block 1/2 are entries that only contain virtual ticks and span the previous 2 slots in tick height. Block 2/2 contains data.
Block 0/0 only contains data for this leader, and its PoH connects directly to the previous leader's slot data transmission. The leader knows exactly how many slots it is skipping, so it can transmit the expected number of blocks.
To keep it simple, a blob would specify the range of ticks the leader is transmitting for, as well as the blob number for the slot and how many erasure blobs are expected. The block number can be computed from the tick height and the range. A blob should include all the data necessary to place it into the db ledger, including whatever erasure bits we need. Placing a blob into the ledger should be completely context free. Because the blob headers are signed by the current leader, we don't really need to worry about malicious data. The validator will overrun the leader's slot with its PoH and start on the next slot if the data in the slot fails to produce a valid block of data.
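As a rough illustration of computing the block number from the tick height and the range, here is a minimal sketch. The `BlobHeader` struct, its field names, the half-open tick range, and the `ticks_per_slot` parameter are all assumptions made for this example; the proposal does not fix a header layout.

```rust
/// Hypothetical blob header; field names are illustrative only and do not
/// reflect an actual Solana wire format.
#[derive(Debug, Clone, PartialEq)]
pub struct BlobHeader {
    /// First tick height covered by the leader's transmission (inclusive).
    pub start_tick: u64,
    /// End of the tick range covered by the transmission (exclusive).
    pub end_tick: u64,
    /// Tick height that this blob's entries belong to.
    pub tick_height: u64,
    /// Index of this blob within the slot.
    pub blob_index: u64,
    /// Number of erasure blobs expected for the slot.
    pub num_erasure_blobs: u64,
}

/// Returns `(index, last)` so that the blob belongs to "block index/last",
/// e.g. `(0, 2)` for block 0/2. Returns `None` for a malformed header.
/// Assumes every block spans exactly `ticks_per_slot` ticks.
pub fn block_number(h: &BlobHeader, ticks_per_slot: u64) -> Option<(u64, u64)> {
    if ticks_per_slot == 0 || h.end_tick <= h.start_tick {
        return None;
    }
    if h.tick_height < h.start_tick || h.tick_height >= h.end_tick {
        return None;
    }
    let span = h.end_tick - h.start_tick;
    if span % ticks_per_slot != 0 {
        return None;
    }
    let last = span / ticks_per_slot - 1;
    let index = (h.tick_height - h.start_tick) / ticks_per_slot;
    Some((index, last))
}
```

Because the result depends only on fields carried in the header, a receiver could place the blob without any other ledger state, which is the "context free" property described above.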
Separate Ledger transmission/receiving from Ledger Processing
#2163 #2277
A validator should be able to receive ledger data from the network without processing transactions. Each leader slot has only one possible leader that can produce blocks for that slot. Upon receiving a blob, the ledger DB can quickly decide whether it is valid (signed by the leader) and where it is placed in the slot. This data structure can aggregate blobs until it sees a complete votable block and then signal the bank. Optimizing this for optimistic verification can be done later.
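The aggregation step could look roughly like the sketch below. Everything here is hypothetical: `SlotStore`, `Blob`, and the `is_last` flag are invented for illustration, and real signature verification is replaced by comparing a `signer` id against the leader schedule.

```rust
use std::collections::{BTreeMap, HashMap};

pub type Slot = u64;
/// Stand-in for a leader public key (assumption for this sketch).
pub type LeaderId = u8;

/// Hypothetical blob: just enough fields to place it in a slot.
pub struct Blob {
    pub slot: Slot,
    pub index: u64,
    /// Marks the final blob of the slot (assumed to be carried in the header).
    pub is_last: bool,
    /// Who signed the blob; stands in for real signature verification.
    pub signer: LeaderId,
    pub data: Vec<u8>,
}

pub struct SlotStore {
    schedule: HashMap<Slot, LeaderId>,
    blobs: HashMap<Slot, BTreeMap<u64, Blob>>,
}

impl SlotStore {
    pub fn new(schedule: &[(Slot, LeaderId)]) -> Self {
        SlotStore {
            schedule: schedule.iter().copied().collect(),
            blobs: HashMap::new(),
        }
    }

    /// Stores the blob if it comes from the slot's scheduled leader.
    /// Returns true if the slot is complete after the insert, which is the
    /// point at which the bank would be signaled.
    pub fn insert(&mut self, blob: Blob) -> bool {
        // Only one leader can produce blobs for a slot; drop everything else.
        if self.schedule.get(&blob.slot) != Some(&blob.signer) {
            return false;
        }
        let slot_blobs = self.blobs.entry(blob.slot).or_default();
        slot_blobs.insert(blob.index, blob);
        // Complete when the highest index is the last blob and the indexes
        // 0..=max are all present (keys are unique, so counting suffices).
        match slot_blobs.iter().next_back() {
            Some((max_index, last)) => {
                last.is_last && slot_blobs.len() as u64 == *max_index + 1
            }
            None => false,
        }
    }
}
```

No transaction is processed on this path: validity and placement depend only on the blob and the leader schedule, so receiving stays independent of replay.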
slot
and the validator should only repair/retransmit data for that slot.
Flat Checkpoint Store
#2499 #2497
A single table that contains all the checkpoints and is easy to follow. At any checkpoint, the updated accounts in that checkpoint overlay the previous checkpoint.
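One way to picture the overlay is the sketch below: a single map keyed by block id, where each checkpoint stores only the accounts it updated plus the id of the checkpoint it derives from. `CheckpointStore`, `BlockId` (a stand-in for a LastId), and the string account keys are assumptions for illustration, not the proposed schema.

```rust
use std::collections::HashMap;

/// Stand-in for the LastId of a block (assumption for this sketch).
pub type BlockId = u64;

struct Checkpoint {
    /// Checkpoint this one is derived from, if any.
    parent: Option<BlockId>,
    /// Only the accounts updated in this checkpoint.
    accounts: HashMap<String, u64>,
}

/// A single flat table holding every checkpoint.
#[derive(Default)]
pub struct CheckpointStore {
    table: HashMap<BlockId, Checkpoint>,
}

impl CheckpointStore {
    pub fn insert(&mut self, id: BlockId, parent: Option<BlockId>, updates: &[(&str, u64)]) {
        let accounts = updates.iter().map(|(k, v)| (k.to_string(), *v)).collect();
        self.table.insert(id, Checkpoint { parent, accounts });
    }

    /// Looks up a balance as seen from checkpoint `id`: the newest overlay
    /// that touched the account wins. Assumes the parent links are acyclic.
    pub fn balance(&self, id: BlockId, account: &str) -> Option<u64> {
        let mut current = Some(id);
        while let Some(block) = current {
            let checkpoint = self.table.get(&block)?;
            if let Some(balance) = checkpoint.accounts.get(account) {
                return Some(*balance);
            }
            current = checkpoint.parent;
        }
        None
    }
}
```

Two forks that share a parent simply become two rows pointing at the same parent id, so rolling back to a checkpoint amounts to reading from a different row.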
block
which this checkpoint is for. This lastid is the id of the block for a specific data transmission by a leader, and not a virtual last id. The triplet contains the LastId of the previous block that this checkpoint is derived from.
Repair
#2442
If the majority of the network fails to observe the data from the leader before the slot expires, it is unlikely to ever be accepted.
Vote Data Consistency
Once a vote is signed and transmitted, it cannot be "undone". If the data is lost locally the local validator is at risk of being slashed. The vote stack is the sequence of votes that have been made by the validator.
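Since a transmitted vote cannot be taken back, one plausible discipline is to write each vote to durable storage before it is sent. The sketch below assumes that approach; `VoteStack`, `Vote`, and the line-per-vote log format are invented for illustration and are not part of the proposal.

```rust
use std::io::{self, Write};

/// Hypothetical vote record; a real vote would also carry a signature.
#[derive(Debug, Clone, PartialEq)]
pub struct Vote {
    pub slot: u64,
}

/// The sequence of votes made by this validator, mirrored to a log.
pub struct VoteStack<W: Write> {
    log: W,
    votes: Vec<Vote>,
}

impl<W: Write> VoteStack<W> {
    pub fn new(log: W) -> Self {
        VoteStack { log, votes: Vec::new() }
    }

    /// Writes the vote to the log and flushes before adding it to the
    /// stack. The caller should transmit only after this returns `Ok`.
    pub fn record(&mut self, vote: Vote) -> io::Result<&Vote> {
        writeln!(self.log, "{}", vote.slot)?;
        self.log.flush()?;
        self.votes.push(vote);
        Ok(self.votes.last().expect("vote was just pushed"))
    }

    pub fn votes(&self) -> &[Vote] {
        &self.votes
    }

    /// Returns the underlying log, e.g. to inspect it in a test.
    pub fn into_log(self) -> W {
        self.log
    }
}
```

With a file as the `Write` target, a validator that restarts could rebuild its vote stack from the log instead of risking a conflicting vote.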
Fork Selection
#2289
See fork selection. The replay stage asks db ledger for full complete blocks and processes all the entries at once in parallel. In doing so, it can produce multiple checkpoints, pick the one that maximizes network reward and vote on that one. This operation can keep track of HashMap<Block, BankHash>, as a mapping of blocks to bank hashes that identify a checkpoint in the bank.
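A minimal sketch of the bookkeeping described here follows. The `Candidate` struct and its precomputed `reward` field are assumptions (how network reward is computed is not specified in this issue), and `Block` and `BankHash` are reduced to integers to keep the example short.

```rust
use std::collections::HashMap;

/// Stand-ins for the real block id and bank hash types (assumptions).
pub type Block = u64;
pub type BankHash = u64;

/// Result of replaying one complete block into its own checkpoint.
pub struct Candidate {
    pub block: Block,
    pub bank_hash: BankHash,
    /// Hypothetical precomputed network reward for voting on this block.
    pub reward: u64,
}

#[derive(Default)]
pub struct ForkSelector {
    /// Maps each replayed block to the bank hash identifying its checkpoint.
    bank_hashes: HashMap<Block, BankHash>,
}

impl ForkSelector {
    /// Records every candidate's checkpoint and returns the block with the
    /// highest reward, i.e. the one to vote on. Returns `None` when there
    /// are no candidates.
    pub fn select(&mut self, candidates: &[Candidate]) -> Option<Block> {
        for c in candidates {
            self.bank_hashes.insert(c.block, c.bank_hash);
        }
        candidates.iter().max_by_key(|c| c.reward).map(|c| c.block)
    }

    pub fn bank_hash(&self, block: Block) -> Option<BankHash> {
        self.bank_hashes.get(&block).copied()
    }
}
```

Keeping the hashes for the blocks that were not chosen means a later switch to another fork can find its checkpoint without replaying the entries again.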
Switch to Leader
The PoH recorder runs continuously and signals the validator when it reaches the height at which this validator should become a leader. At that point, the PoH stream that has been produced is derived from the last block voted on by this validator. That voted block has a checkpoint that was created in Fork Selection above. That is the checkpoint that should be used by the leader.
blocks 0,1...
tag: @rob-solana @carllin @mvines @garious