Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[narwhal] Use CertificateV2 to speed up narwhal catchup #13985

Merged
merged 2 commits into from
Oct 12, 2023

Conversation

arun-koshy
Copy link
Contributor

@arun-koshy arun-koshy commented Sep 26, 2023

Description

PR#13777 introduces new SignatureVerificationState that can now be used in certificate fetching to only verify the tip of certificate chains and ensure that the verification state is reflected in storage. Doing so should save us time in signature verification for a node that is trying to catchup.

Test Plan

Added unit tests. Catchup tests in private-testnet.

Results

NW Catchup Rate @ 200 TPS w/ 2 hours of downtime

Screenshot 2023-10-10 at 4 25 53 PM

NW Catchup Rate @ 5K TPS w/ 1 hour of downtime

Screenshot 2023-10-10 at 4 26 24 PM

Known Issues to be investigated/fixed in follow up PRs

Narwhal catchup only hits full potential after state sync completes

Screenshot 2023-10-10 at 4 31 21 PM

Execution bottleneck at high TPS preventing higher NW catchup rate

Screenshot 2023-10-10 at 4 33 39 PM


If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.

Type of Change (Check all that apply)

  • protocol change
  • user-visible impact
  • breaking change for a client SDKs
  • breaking change for FNs (FN binary must upgrade)
  • breaking change for validators or node operators (must upgrade binaries)
  • breaking change for on-chain data layout
  • necessitate either a data wipe or data migration

Release notes

Protocol upgrade to version 28 which will enable the use of CertificateV2 in narwhal that will be used to speed up processing of certificates during certificate fetching/catchup.

@vercel
Copy link

vercel bot commented Sep 26, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-typescript-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 12, 2023 6:14am
4 Ignored Deployments
Name Status Preview Comments Updated (UTC)
explorer ⬜️ Ignored (Inspect) Visit Preview Oct 12, 2023 6:14am
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Oct 12, 2023 6:14am
mysten-ui ⬜️ Ignored (Inspect) Visit Preview Oct 12, 2023 6:14am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Oct 12, 2023 6:14am

add new fetch mechanism to only verify tips of cert chain and only serve verified certificates
@@ -1533,6 +1534,7 @@ impl ProtocolConfig {
if chain != Chain::Mainnet && chain != Chain::Testnet {
cfg.feature_flags.enable_effects_v2 = true;
}
cfg.feature_flags.narwhal_certificate_v2 = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's guard this feature to not launch on mainnet or testnet first. We can launch on a subsequent PR after more testing.

narwhal/primary/src/aggregators.rs Show resolved Hide resolved
narwhal/primary/src/aggregators.rs Show resolved Hide resolved
@@ -962,6 +979,23 @@ impl PrimaryToPrimary for PrimaryReceiverHandler {
.map_err(|e| anemo::rpc::Status::from_error(Box::new(e)))?
{
Some(cert) => {
if self.protocol_config.narwhal_certificate_v2() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping serving a certificate may make subsequent certificates miss parents. We need to ensure all certificates served as fetch result can be verified directly or indirectly. And the optimization is to skip direct verifications when possible on the fetcher side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed switching to serve ALL certificates in store and leave it up to the fetcher to verify.

// fail verification will cancel processing for all fetched certs.
let mut leaf_certs = Vec::new();
for (idx, c) in all_certificates.iter_mut().enumerate() {
if !all_parents.contains(&c.digest()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enough to make sure all certificates have parents available in store or among other fetched certificates. e.g. a fetch certificate can exist in all_parents, but itself is missing some parents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is all checked in try_accept_fetched_certificate so it is okay, right? Only thing we are skipping is the signature verification.

- Only deploy to devnet for now and allow time to soak
- Serve all certificates that are written to store (Directly or Indirectly verified)
- Store signature bytes of IndirectlyVerified certs
- Add assertion to only accept verified certs
@arun-koshy arun-koshy force-pushed the ak/use_certificate_v2 branch from c2488ab to a7ca0d6 Compare October 12, 2023 06:13
@arun-koshy arun-koshy changed the title Use CertificateV2 to speed up narwhal catchup [narwhal] Use CertificateV2 to speed up narwhal catchup Oct 12, 2023
@arun-koshy arun-koshy merged commit 966ad9a into main Oct 12, 2023
31 checks passed
@arun-koshy arun-koshy deleted the ak/use_certificate_v2 branch October 12, 2023 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants