[DEPRECATED] split into smaller tasks: Need a proposal for rollback and rotation #2162

aeyakovenko · 2018-12-14T17:37:53Z

Problem

Need a proposal for the code changes required for rollback + rotation.

Proposed Solution

Block, slot, etc. are confusing terms for us because a leader transmits entries over a larger tick range than just the leader's slot. This proposal uses the following definitions for the terms.

slot: a range of ticks in which a leader can transmit entries with data. A leader can transmit virtual ticks at a tick height lower than the start of the slot. A vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.
block: a group of entries that spans a full slot in tick height.
blob: a UDP packet containing any number of full entries (current definition).

A leader transmits blocks for their slot. The blocks may overlap previous slots. The last block is the only block that can contain data. For example, the following are all valid transmissions for a single leader:

[[block 0/2, slot 50], [block 1/2, slot 50], [block 2/2, slot 50]]

Block 0/2 and block 1/2 are entries that only contain virtual ticks and span the previous 2 slots in tick height. Block 2/2 contains data.

[block 0/0]

Block 0/0 only contains data for this leader, and its PoH connects directly to the previous leader's slot data transmission. The leader knows exactly how many slots it is skipping, so it can transmit the expected number of blocks.
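The two examples above can be sketched as a small function. This is only an illustration: `BlockLabel`, `blocks_for_slot` and `may_contain_data` are hypothetical names, and the sketch assumes that "block i/N" means block index i where N is the number of slots being skipped.

```rust
/// Hypothetical label for one block in a leader's transmission:
/// block `index` out of `last`, sent for `slot`.
#[derive(Debug, PartialEq, Clone, Copy)]
struct BlockLabel {
    index: u64,
    last: u64,
    slot: u64,
}

/// Lists the blocks a leader would transmit for `slot` when it skips
/// `skipped` earlier slots. Blocks `0..skipped` carry only virtual ticks;
/// block `skipped` is the only one that may carry data.
fn blocks_for_slot(slot: u64, skipped: u64) -> Vec<BlockLabel> {
    (0..=skipped)
        .map(|index| BlockLabel { index, last: skipped, slot })
        .collect()
}

/// Only the last block of a transmission may contain data.
fn may_contain_data(block: &BlockLabel) -> bool {
    block.index == block.last
}
```

Under these assumptions, `blocks_for_slot(50, 2)` yields the three blocks of the first example, and `blocks_for_slot(slot, 0)` yields the single block 0/0 of the second.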

To keep it simple, a blob would specify the range of ticks the leader is transmitting for, the blob number within the slot, and how many erasure blobs are expected. The block number can be computed from the tick height and the range. A blob should include all the data necessary to place it into the db ledger, including whatever erasure bits we need. Placing a blob into the ledger should be completely context free. Because the blob headers are signed by the current leader, we don't really need to worry about malicious data. The validator will overrun the leader's slot with its own PoH and start on the next slot if the data in the slot fails to produce a valid block of data.
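A rough sketch of such a header, and of deriving the block number from the tick height and the range, might look like the following. Everything here is an assumption made for illustration: the `BlobHeader` layout, its field names, the `TICKS_PER_SLOT` value, and the idea that the range starts on a slot boundary. Signature checking is left out.

```rust
/// Assumed slot length in ticks; the proposal does not fix a value.
const TICKS_PER_SLOT: u64 = 8;

/// Hypothetical blob header; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct BlobHeader {
    /// First tick of the range the leader is transmitting for.
    range_start: u64,
    /// One past the last tick of that range.
    range_end: u64,
    /// Tick height of the entries carried by this blob.
    tick_height: u64,
    /// Position of this blob within the slot.
    blob_index: u64,
    /// Number of erasure blobs expected.
    erasure_blobs: u64,
}

impl BlobHeader {
    /// Slot the transmission is for: the slot holding the last tick of the range.
    fn slot(&self) -> u64 {
        self.range_end.saturating_sub(1) / TICKS_PER_SLOT
    }

    /// Block number within the transmission, or `None` if the tick height
    /// falls outside the advertised range.
    fn block(&self) -> Option<u64> {
        if self.tick_height < self.range_start || self.tick_height >= self.range_end {
            return None;
        }
        Some((self.tick_height - self.range_start) / TICKS_PER_SLOT)
    }
}
```

With 8 ticks per slot, a header covering ticks 384..408 (slots 48 through 50) and a tick height of 400 maps to slot 50, block 2, using nothing but the header itself.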

Separate Ledger transmission/receiving from Ledger Processing

#2163 #2277
A validator should be able to receive ledger from the network without processing transactions. Each leader slot only has 1 possible leader that can produce blocks for that slot. Upon receiving a blob, the ledger DB can quickly decide if it's valid or not (signed by the leader), and where it is placed in the slot. This data structure can aggregate blobs until it sees a full, complete, votable block and then signal the bank. Optimizing this for optimistic verification can be done later.

db ledger is indexed by slot; each slot is an array of blocks made up of blobs: slot 1 -> [block 0, block 1, ...]. The last block is the only one with data. All the blocks leading up to the last block contain only ticks. Because the likelihood of rollback decreases exponentially, we shouldn't expect more than 32 blocks. Leaders can be scheduled for consecutive slots; the pipeline is agnostic to this.
blobs contain enough info to derive the slot, block, tick height and erasure bit.
network layer just dumps blobs into the right spot in the db ledger; add enough redundant data to blobs that adding a blob to the ledger is completely context free.
db ledger sends a signal to the replay stage whenever the last block with data in a slot is ready; PoH verification happens here.
validator is concurrently generating a PoH from its own last vote. This height indicates the current network slot and the validator should only repair/retransmit data for that slot.
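The context-free insert and the "last block is ready" signal described in the list above could be sketched roughly as below. This is not the db_ledger API: `Blob`, `DbLedger`, their fields, and the use of a return value instead of a channel are all assumptions for illustration, and the leader-signature check is assumed to have happened before `insert` is called.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical self-describing blob: everything needed to place it
/// in the ledger travels with the blob.
#[derive(Debug, Clone)]
struct Blob {
    slot: u64,
    block: u64,
    last_block: u64,
    index: u64,
    blobs_in_block: u64,
    payload: Vec<u8>,
}

#[derive(Default)]
struct DbLedger {
    /// slot -> (block, blob index) -> payload
    slots: HashMap<u64, BTreeMap<(u64, u64), Vec<u8>>>,
}

impl DbLedger {
    /// Stores the blob using only the blob's own fields. Returns `Some(slot)`
    /// when this insert completes the last (data) block of the slot, which is
    /// the point at which the replay stage would be signalled.
    fn insert(&mut self, blob: Blob) -> Option<u64> {
        let blobs = self.slots.entry(blob.slot).or_default();
        // A duplicate replaces the stored payload and never signals again.
        if blobs.insert((blob.block, blob.index), blob.payload).is_some() {
            return None;
        }
        if blob.block != blob.last_block {
            return None; // virtual-tick block, nothing to replay yet
        }
        let have = blobs
            .range((blob.block, 0)..=(blob.block, u64::MAX))
            .count() as u64;
        (have == blob.blobs_in_block).then_some(blob.slot)
    }
}
```

Because placement depends only on the blob, blobs can arrive in any order; the signal fires on whichever insert happens to complete the data block.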

Flat Checkpoint Store

#2499 #2497
A single table that contains all the checkpoints and is easy to follow. At any checkpoint, the updated accounts in that checkpoint overlay the previous checkpoint.

bank is organized as HashMap<LastId, (LastId, Accounts, LastIds)>. The LastId key identifies the block that this checkpoint is for. This LastId is the id of the block for a specific data transmission by a leader, and not a virtual last id. The triplet contains the LastId of the previous block that this checkpoint is derived from.
LastIds can be a bloom filter initialized with the previous LastId, designed to drop 1 in 1 billion. (Stretch goal)
gossip can be used to get approximate network height, and blobs that are within the gossiped height can be placed into the ledger.
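The overlay behaviour of the flat table can be illustrated with a lookup that walks parent links. The type aliases below are stand-ins chosen for this sketch (the real `LastId` is a hash, `Accounts` is reduced to a balance map, and `LastIds` is a plain vector rather than a bloom filter); `CheckpointStore` and `load` are hypothetical names.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only.
type LastId = u64;
type Pubkey = u32;
/// Only the accounts updated in this checkpoint.
type Accounts = HashMap<Pubkey, u64>;
type LastIds = Vec<LastId>;

struct CheckpointStore {
    /// checkpoint LastId -> (parent LastId, updated accounts, recent last ids)
    table: HashMap<LastId, (LastId, Accounts, LastIds)>,
}

impl CheckpointStore {
    /// Reads an account as seen from `checkpoint`: the newest checkpoint on
    /// the parent chain that updated the account wins.
    fn load(&self, checkpoint: LastId, key: Pubkey) -> Option<u64> {
        let mut current = checkpoint;
        while let Some((parent, accounts, _)) = self.table.get(&current) {
            if let Some(balance) = accounts.get(&key) {
                return Some(*balance);
            }
            if *parent == current {
                break; // assumed convention: a root may point at itself
            }
            current = *parent;
        }
        None
    }
}
```

Two checkpoints derived from the same parent simply share that parent's entry in the table, so forks need no copying of account state in this sketch.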

Repair

#2442
If the majority of the network fails to observe the data from the leader before the slot expires, it is unlikely to ever be accepted.

Exponential backoff for erasure code percentage. Every leader should double the erasure code % for every partially observed slot.
nodes that are in the minority are basically just trying to catch up to the height that is advertised in gossip. So they should be repairing blobs between their last checkpoint up to the gossip height.
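The doubling rule for the erasure code percentage might be sketched as follows. The base rate and the cap are assumptions picked for the example; the proposal only says the percentage doubles for every partially observed slot.

```rust
/// Assumed starting erasure percentage; not specified in the proposal.
const BASE_ERASURE_PERCENT: u32 = 5;
/// Assumed upper bound on the erasure percentage.
const MAX_ERASURE_PERCENT: u32 = 100;

/// Doubles the erasure percentage once per partially observed slot,
/// saturating at the assumed maximum.
fn erasure_percent(partially_observed_slots: u32) -> u32 {
    let factor = 1u32
        .checked_shl(partially_observed_slots)
        .unwrap_or(u32::MAX);
    BASE_ERASURE_PERCENT
        .saturating_mul(factor)
        .min(MAX_ERASURE_PERCENT)
}
```

A leader would presumably reset the count once a slot is fully observed again, but that policy is not described above and is left out of the sketch.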

Vote Data Consistency

Once a vote is signed and transmitted, it cannot be "undone". If the data is lost locally the local validator is at risk of being slashed. The vote stack is the sequence of votes that have been made by the validator.

Replay stage decides to vote. The vote is checked into DB for the slot. Upon boot, the validator can examine the bank account state and reconcile it against the DB. Any votes missing from the ledger can be retransmitted. @carllin
Vote program maintains the state of the vote stack. See fork selection. This means this program is forked in every checkpoint. Even so, the stacks in each checkpoint will remain consistent, because each subsequent fork occupies a different height and would force the same vote to expire. Before voting, the replay stage iterates through the vote stack instances in all the checkpoints and makes sure that the vote doesn't violate any of them. Checking only the latest one should be sufficient, but we should write some tests to make sure that is true.
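The boot-time reconciliation step can be pictured as a set difference between votes recorded in the local DB and votes visible in the ledger. This is a deliberately simplified sketch: votes are reduced to the slot they were cast for, and `votes_to_retransmit` is a hypothetical helper, not an existing function.

```rust
use std::collections::BTreeSet;

/// Returns, in ascending order, the slots that the local DB says were voted
/// on but that do not appear in the ledger, i.e. the votes to retransmit.
fn votes_to_retransmit(local_db: &BTreeSet<u64>, in_ledger: &BTreeSet<u64>) -> Vec<u64> {
    local_db.difference(in_ledger).copied().collect()
}
```

The key property is that the vote is written to the DB before it is transmitted, so a crash between the two steps leaves a record that this check can find.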

Fork Selection

#2289
See fork selection. The replay stage asks db ledger for full complete blocks and processes all the entries at once in parallel. In doing so, it can produce multiple checkpoints, pick the one that maximizes network reward and vote on that one. This operation can keep track of HashMap<Block, BankHash>, as a mapping of blocks to bank hashes that identify a checkpoint in the bank.

replay stage iterates through its checkpoints and asks the ledger for any new blocks for that checkpoint and creates a new checkpoint
replay stage picks the "best" checkpoint out of the new checkpoints (see fork-selection.md) and votes
vote is gossiped, and sent to the next leader
voted bank is checkpointed
vote triggers restart of PoH recorder. The recorder is reset to generate from the voted LastId
expired checkpoints are unrolled, or dequeued (GC of the bank occurs here)
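The "record the bank hash, then pick the best checkpoint" part of the steps above could look roughly like this. The actual selection rule lives in fork-selection.md and is not reproduced here; the `weight` field is a placeholder for whatever that rule computes, and `Candidate`, `pick_and_record`, and the integer stand-ins for `Block` and `BankHash` are assumptions.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only.
type Block = u64;
type BankHash = u64;

/// A checkpoint produced by replaying one new block.
struct Candidate {
    block: Block,
    bank_hash: BankHash,
    /// Placeholder score; the real rule is defined in fork-selection.md.
    weight: u64,
}

/// Records every candidate in the block -> bank hash map and returns the
/// block to vote on: highest weight first, ties broken by highest block.
fn pick_and_record(
    candidates: &[Candidate],
    seen: &mut HashMap<Block, BankHash>,
) -> Option<Block> {
    for c in candidates {
        seen.insert(c.block, c.bank_hash);
    }
    candidates
        .iter()
        .max_by_key(|c| (c.weight, c.block))
        .map(|c| c.block)
}
```

The tie-break on block number is only there to make the sketch deterministic; it is not part of the proposal.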

Switch to Leader

PoH recorder runs continuously and signals the validator when it reaches the height at which this validator should become a leader. At that point, the PoH stream that has been produced is derived from the last block voted on by this validator. That block has a checkpoint that was created in Fork Selection above. That is the checkpoint the leader should use.

when this validator is the next leader, it transmits all the generated virtual ticks from the last voted block as its blocks 0,1...
leader looks at the crds for any votes that are valid for the above checkpoint that haven't been registered yet.

tag: @rob-solana @carllin @mvines @garious

carllin · 2018-12-17T21:27:00Z

Question on how we should prioritize repairs. Let's say we have all the blocks for slot 1, something like slot 1: [block 0/0]. We then receive a blob for slot 6, block 0/3. It follows that:

1. We know based on mapping the tick height to slots, that this means the leader for slot 6 is transmitting virtual ticks for blocks 4 and 5. This also means this leader for slot 6 observed real data for slot 3. Thus do we then prioritize repairs for slot 3? If so, do we start by asking for everything from slot 3, block 0, blob index 0, until we receive the last tick for slot 3, at which point we mark slot 3 as complete?
2. If we then receive slot 3 block 0/0, we now know that there must be real data for slot 2 as well. Do we then prioritize repairing slot 2 instead (this essentially forms a chain of slots where we have to keep track of a “head” of this chain that needs to be repaired)?
3. We then receive slot 7, block 0/4. This means that slot 7 is directly chaining to the slot 2 b/c he’s transmitting virtual ticks for slots 3-6. This is also a separate fork from what we’re observing in 1) b/c they do not chain. How do we pick which of these two forks to prioritize repairing (maybe whichever one can chain back to slot 1 first)? Is this fork just a separate head, one of many chains we keep around in the db_ledger?

carllin · 2018-12-17T21:29:15Z

It would also be good to flesh out how the guard zones fit into this. Does each validator's local PoH service generate the rest of the ticks in the guard zone for a block after the last tick before the guard zone is received?

rob-solana · 2018-12-17T22:46:35Z

> A validator should be able to receive ledger from the network without processing transactions. Each leader slot only has 1 possible leader that can produce blocks for that slot. Upon receiving a blob the ledger DB can quickly decide if it's valid or not (signed by the leader), and where it is placed in the slot.

Implies that we have a leader schedule that extends to the slot, if in the future. If not, the blob is dropped, stored until such time as we can chain it? Checked against gossip for current network tick height?

rob-solana · 2018-12-17T22:48:03Z

> This data structure can aggregate blobs until it sees a full complete votable block and then signal the bank.

Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.

aeyakovenko · 2018-12-17T23:09:43Z

> It would also be good to flesh out how the guard zones fit into this. Does each validator's local PoH service generate the rest of the ticks in the guard zone for a block after the last tick before the guard zone is received?

@carllin I think we need to experiment with repair, but the current slot that the validator is at with its own PoH, plus or minus some range, is probably what should be actively repaired.

see Repair above

aeyakovenko · 2018-12-17T23:17:00Z

> Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.

@rob-solana a vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.

aeyakovenko · 2018-12-17T23:18:40Z

> Implies that we have a leader schedule that extends to the slot, if in the future. If not, the blob is dropped, stored until such time as we can chain it? Checked against gossip for current network tick height?

@rob-solana that works!

rob-solana · 2018-12-17T23:23:21Z

> > Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.
>
> @rob-solana a vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.

ok: slot is vote-able. will update the terminology to reflect. I think that block is still an ungood word, because we use it for so many other things, even in this proposal.

aeyakovenko · 2018-12-17T23:29:46Z

> ok: slot is vote-able. will update the terminology to reflect. I think that block is still an ungood word, because we use it for so many other things, even in this proposal.

@rob-solana Yea, because a slot overlaps previous slots. It is a recursively defined structure. So the portion of a slot that is just the virtual PoH that overlaps a slot is what I am calling a block in this proposal.

garious · 2018-12-17T23:51:45Z

@aeyakovenko, can you move this issue into a PR against the book? Pretty awful that these guys are having to use the quote mechanism to provide line-level feedback and that there have been 54 edits to the issue text.

garious · 2018-12-17T23:54:25Z

@aeyakovenko, please review https://solana-labs.github.io/solana/terminology.html. If you want to redefine block, I'd appreciate a lengthy justification. The justification for the current definition is here: https://solana-labs.github.io/solana/synchronization.html

aeyakovenko · 2018-12-18T00:00:04Z

@garious a short justification is that we don't have a term for data transmitted by the leader in its slot that overlaps in PoH previous slots.... So our terminology fails to capture the problem.

aeyakovenko · 2018-12-18T03:28:13Z

@garious this issue is cross component. Kind of hard to add it to the book. I would love some suggestions on where to place each part.

garious · 2018-12-18T13:56:09Z

@aeyakovenko, I can think through integration just before the PR is merged. Anywhere under proposals is fine for now. The important thing is that this discussion move to a WIP PR.

garious · 2019-01-17T21:13:40Z

@aeyakovenko, are people still using this issue? I can't navigate it at all. Can you move the remaining work to this project: https://github.com/solana-labs/solana/projects/11?

aeyakovenko · 2019-01-18T02:50:49Z

@garious sounds good. I think a part is covered by entry tree

aeyakovenko · 2019-01-24T00:57:11Z

@aeyakovenko break this up into a project!

aeyakovenko · 2019-01-25T14:36:28Z

Covered in
#2163 #2277 #2289 #2442
https://github.com/solana-labs/solana/projects/11

aeyakovenko changed the title ~~path to rollback + rotation~~ Need a proposal for rollback and rotation Dec 17, 2018

carllin mentioned this issue Dec 20, 2018

Add proposed design for db_ledger #2253

Merged

aeyakovenko mentioned this issue Jan 14, 2019

Need a design of how we deal with multiple checkpoints in the system #2415

Closed

8 tasks

aeyakovenko closed this as completed Jan 25, 2019

aeyakovenko changed the title ~~Need a proposal for rollback and rotation~~ split into smaller tasks: Need a proposal for rollback and rotation Jan 25, 2019

aeyakovenko changed the title ~~split into smaller tasks: Need a proposal for rollback and rotation~~ [DEPRECATED] split into smaller tasks: Need a proposal for rollback and rotation Jan 25, 2019

ruuda pushed a commit to ChorusOne/solana that referenced this issue Jul 24, 2024

v2.0: ci: update docs pipeline (backport of solana-labs#2162) (solana…

a9c5cbd

…-labs#2170) ci: update docs pipeline (solana-labs#2162) Co-authored-by: yihau <[email protected]>
