
Unbond/Deactivate inactive validators #1360

Closed
Tracked by #2006
adrianbrink opened this issue May 12, 2023 · 8 comments · Fixed by #2174

adrianbrink (Member) commented May 12, 2023

Implement liveness protection.

In particular, the current idea would be to jail a consensus validator who has not voted on enough blocks within some past interval.

Started implementation here: #981

@karbyshev karbyshev added the PoS label Jul 4, 2023
@tzemanovic tzemanovic added the post-mainnet label Oct 23, 2023
@cwgoes cwgoes removed the post-mainnet label Nov 11, 2023
cwgoes (Collaborator) commented Nov 11, 2023

@grarco grarco self-assigned this Nov 11, 2023
brentstone (Collaborator) commented:

Some initial notes to spec out how to do this specifically in Namada. Please comment.

  • In finalize_block, expose two functions from the PoS crate: record_liveness_data, which takes the req.votes as input and is called every block, and jail_for_liveness, which is called upon a new epoch.
    • Upon a new epoch, jail_for_liveness should be called before record_liveness_data.
    • QQ: in this scheme, jailing should prob occur at the following epoch, otherwise we cannot communicate the validator set change to CometBFT in time. Is this concerning?
    • QQ: is it preferable to call jail_for_liveness every block too?
  • Use a lazy data structure in storage of the form LazyMap<Address, LazyMap<BlockHeight, bool>> that records a bool indicating if a validator Address signed the block in some BlockHeight.
    • Use some parameter perhaps in PosParams called max_num_liveness_blocks: u64 (or similar) to remove map elements for which the height < current_height - max_num_liveness_blocks.
    • At the start of a new epoch, clean the data in storage. Remove data for validators that were previously in the consensus set but have now fallen out of it. Persist the data for those that remain in the consensus set. For new validators entering the consensus set, the call to record_liveness_data will instantiate their data.
  • record_liveness_data(votes)
    • for each validator in votes,
      • insert (height, vote.signed_last_block) into validator_liveness_data
      • get the earliest height in the map; if height < current_height - max_num_liveness_blocks, remove the key-value pair
  • jail_for_liveness
    • for each validator in validator_liveness_data,
      • count the number of true values. Use another PosParam called liveness_protection_threshold: Dec such that if sum(true) / max_num_liveness_blocks < liveness_protection_threshold, then the validator is jailed and the validator set is appropriately updated.
      • if the validator is no longer in the consensus set for the new epoch, remove its data from validator_liveness_data

Not positive if jailing or deactivating is the proper thing to do, or if it even matters. Also please comment on whether we think it is better to call jail_for_liveness upon a new epoch or every block. The downside of calling it upon a new epoch is that we cannot jail/deactivate misbehaving validators until the following epoch, letting them remain for one more whole epoch if they are in consensus.

adrianbrink (Member, author) commented:

jail_for_liveness should be a cheap call so I think we can do it every block.

Validators should be jailed but without being slashed and they can unjail themselves.

adrianbrink (Member, author) commented:

The threshold for liveness should be something like: missed more than 10% of the last 10,000 blocks. That means the sliding window is ~14 hours with 5-second blocks, so a validator can be down for about 1.4 hours before being jailed.

cwgoes (Collaborator) commented Nov 11, 2023

  • It would be preferable to call jail_for_liveness every block; this would allow the network to respond more quickly.
  • Your proposed measurement method requires iteration, which I think may be too expensive. We can either change the method to "no signatures in last N blocks" (vs "less than k signatures in last N blocks"), which is less precise but easier to implement, or we can implement the bitfield technique described in the Cosmos SDK code, which basically just keeps a running counter (thereby avoiding the need to iterate over all of the blocks and count the number of true values).

adrianbrink (Member, author) commented Nov 11, 2023

bitfield is good.

we can port the code over from the sdk.

brentstone (Collaborator) commented Nov 11, 2023

Cool, this sounds good. Chatted with @grarco about what we will do too.

cwgoes (Collaborator) commented Nov 11, 2023

Is this really completed already? 😮
