-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[narwhal] Use CertificateV2 to speed up narwhal catchup #13985
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
4 Ignored Deployments
|
3800a58
to
d43802f
Compare
78100e2
to
e1435e7
Compare
add new fetch mechanism to only verify tips of cert chain and only serve verified certificates
e1435e7
to
1138954
Compare
@@ -1533,6 +1534,7 @@ impl ProtocolConfig { | |||
if chain != Chain::Mainnet && chain != Chain::Testnet { | |||
cfg.feature_flags.enable_effects_v2 = true; | |||
} | |||
cfg.feature_flags.narwhal_certificate_v2 = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's guard this feature to not launch on mainnet or testnet first. We can launch on a subsequent PR after more testing.
narwhal/primary/src/primary.rs
Outdated
@@ -962,6 +979,23 @@ impl PrimaryToPrimary for PrimaryReceiverHandler { | |||
.map_err(|e| anemo::rpc::Status::from_error(Box::new(e)))? | |||
{ | |||
Some(cert) => { | |||
if self.protocol_config.narwhal_certificate_v2() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping serving a certificate may make subsequent certificates miss parents. We need to ensure all certificates served as fetch result can be verified directly or indirectly. And the optimization is to skip direct verifications when possible on the fetcher side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed switching to serve ALL certificates in store and leave it up to the fetcher to verify.
// fail verification will cancel processing for all fetched certs. | ||
let mut leaf_certs = Vec::new(); | ||
for (idx, c) in all_certificates.iter_mut().enumerate() { | ||
if !all_parents.contains(&c.digest()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not enough to make sure all certificates have parents available in store or among other fetched certificates. e.g. a fetch certificate can exist in all_parents, but itself is missing some parents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that is all checked in try_accept_fetched_certificate
so it is okay, right? Only thing we are skipping is the signature verification.
- Only deploy to devnet for now and allow time to soak - Serve all certificates that are written to store (Directly or Indirectly verified) - Store signature bytes of IndirectlyVerified certs - Add assertion to only accept verified certs
c2488ab
to
a7ca0d6
Compare
Description
PR#13777 introduces new
SignatureVerificationState
that can now be used in certificate fetching to only verify the tip of certificate chains and ensure that the verification state is reflected in storage. Doing so should save us time in signature verification for a node that is trying to catchup.Test Plan
Added unit tests. Catchup tests in private-testnet.
Results
NW Catchup Rate @ 200 TPS w/ 2 hours of downtime
NW Catchup Rate @ 5K TPS w/ 1 hour of downtime
Known Issues to be investigated/fixed in follow up PRs
Narwhal catchup only hits full potential after state sync completes
Execution bottleneck at high TPS preventing higher NW catchup rate
If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.
Type of Change (Check all that apply)
Release notes
Protocol upgrade to version 28 which will enable the use of CertificateV2 in narwhal that will be used to speed up processing of certificates during certificate fetching/catchup.