-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[narwhal] Add CertificateV2 #13777
[narwhal] Add CertificateV2 #13777
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Ignored Deployments
|
f61e882
to
b905d20
Compare
narwhal/types/src/primary.rs
Outdated
} | ||
|
||
/// Verifies the validity of the certificate. | ||
/// TODO: Output a different type, similar to Sui VerifiedCertificate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mwtian was thinking about removing this TODO as will have this distinction with AggregateSignatureState
. Lmk what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this TODO. I was hoping using an enum to represent signature and verification state would make migration easier. But there seem to be other issues. In particular, I think we still should not make verify()
take a &mut self
. How about:
- From
verify()
and alsosanitize()
, return aCertificate
where the verification state isVerifiedDirectly
forCertificateV2
- Add a
verify_indirectly()
function, that takes in a child certificate, checks the child is in a verified state and this certificate is one of its parent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, verify()
and sanitize()
will take self
by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the code so that the cert will not be modified in place and create a new cert. What is the reason why this is better than modifying the certificate in place? Is it to protect against data races due to concurrent access of the referenced certs?
I will leave out adding verify_indirectly() for now, I verify certs indirectly in the follow up PR where this method does not seem needed. But after you review that PR we can revisit if this is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it is very hard to migrate to returning a different Certificate
type from sanitize()
and verify()
, but we can use a similar semantics for these functions. Usually functions named sanitize()
and verify()
should not mutate the underlying data.
narwhal/types/src/primary.rs
Outdated
VerifiedViaLeader(AggregateSignatureBytes), | ||
VerifiedDirectly(AggregateSignatureBytes), | ||
Unverified(AggregateSignatureBytes), | ||
Unsigned(AggregateSignatureBytes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add some comments to explain what every state means here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I see the AggregateSignatureState
actually representing the certificate's verification state, rather than the aggregate signature it self. So maybe (just a proposal) something like VerificationState / CertificateVerificationState
would be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided on AggregateSignatureVerificationState
, let me know what you think
From a first pass this looks great @arun-koshy (referring to the follow up PR as well) ! Do you have any metrics that show how much we verify/save compared to before from your tests? Just to confirm the expectation /theory here |
702937f
to
36404cf
Compare
a24915b
to
ed1d4b9
Compare
ed1d4b9
to
28d1242
Compare
} | ||
|
||
/// This function requires that certificate was verified against given committee | ||
pub fn signed_authorities(&self, committee: &Committee) -> Vec<PublicKey> { | ||
match self { | ||
Certificate::V1(certificate) => certificate.signed_authorities(committee), | ||
Certificate::V2(certificate) => certificate.signed_authorities(committee), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think method implementation like
match self {
Certificate::V1(certificate) => certificate.signed_authorities(committee),
Certificate::V2(certificate) => certificate.signed_authorities(committee),
}
Can be avoided by having signed_authorities()
in CertificateAPI
, and implementing the method separately for both V1 and V2. Is this feasible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be misunderstanding your suggestion, but I do have signed_authorities() in CertificateAPI. I need the match arm within the methods of Certificate to dispatch calls to the correct variants methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature of signed_authorities()
look different then the signature here, so I'm not sure how it is working right now. But I believe you don't need to implement signed_authorities()
in impl Certificate
with a match statement, as long as the method is implement properly for V1 and V2. The match statement will be generated by the #[enum_dispatch(CertificateAPI)]
. We can follow up in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the confusion now, there are two methods with the same name but slightly different signatures. One is just a getter on the signed_authorities bitmap field this is part that the enum_dispatch will take care of. The other is the method that is part of the Certificate impl which is going from the bitmap to the public keys using the committee that is passed in.
narwhal/types/src/primary.rs
Outdated
} | ||
|
||
/// Verifies the validity of the certificate. | ||
/// TODO: Output a different type, similar to Sui VerifiedCertificate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this TODO. I was hoping using an enum to represent signature and verification state would make migration easier. But there seem to be other issues. In particular, I think we still should not make verify()
take a &mut self
. How about:
- From
verify()
and alsosanitize()
, return aCertificate
where the verification state isVerifiedDirectly
forCertificateV2
- Add a
verify_indirectly()
function, that takes in a child certificate, checks the child is in a verified state and this certificate is one of its parent
?
async fn process_certificate_internal( | ||
&self, | ||
certificate: Certificate, | ||
mut certificate: Certificate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is mut
necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just rebind the value in certificate if sanitize is true but it seemed clearer to indicate to the caller that this method may modify the passed in certificate and the comment above the method explains in which case this will happen.
@@ -745,12 +752,13 @@ impl Synchronizer { | |||
let (sender, receiver) = oneshot::channel(); | |||
self.inner | |||
.tx_certificate_acceptor | |||
.send((certificate, sender, early_suspend)) | |||
.send((certificate.clone(), sender, early_suspend)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be useful to assert that the certificate is in a verified state at the receiving end of rx_certificate_acceptor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good suggestion, I will make sure to do this on the followup PR where we use CertificateV2
} | ||
|
||
/// This function requires that certificate was verified against given committee | ||
pub fn signed_authorities(&self, committee: &Committee) -> Vec<PublicKey> { | ||
match self { | ||
Certificate::V1(certificate) => certificate.signed_authorities(committee), | ||
Certificate::V2(certificate) => certificate.signed_authorities(committee), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature of signed_authorities()
look different then the signature here, so I'm not sure how it is working right now. But I believe you don't need to implement signed_authorities()
in impl Certificate
with a match statement, as long as the method is implement properly for V1 and V2. The match statement will be generated by the #[enum_dispatch(CertificateAPI)]
. We can follow up in another PR.
narwhal/types/src/primary.rs
Outdated
} | ||
|
||
/// Verifies the validity of the certificate. | ||
/// TODO: Output a different type, similar to Sui VerifiedCertificate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it is very hard to migrate to returning a different Certificate
type from sanitize()
and verify()
, but we can use a similar semantics for these functions. Usually functions named sanitize()
and verify()
should not mutate the underlying data.
... Includes new AggregateSignatureState
3800a58
to
d43802f
Compare
## Description [PR#13777](#13777) introduces new `SignatureVerificationState` that can now be used in certificate fetching to only verify the tip of certificate chains and ensure that the verification state is reflected in storage. Doing so should save us time in signature verification for a node that is trying to catchup. ## Test Plan Added unit tests. Catchup tests in private-testnet. ## Results #### NW Catchup Rate @ [200 TPS w/ 2 hours of downtime](https://metrics.sui.io/d/ORCQSHfVk/nw-catchup-dashboard?var-Environment=8Xt1pVoVk&var-network=private-testnet&var-validator=ams-ptn-val-00&var-validator=ams-ptn-val-09&orgId=1&from=1696385905867&to=1696388154869) ![Screenshot 2023-10-10 at 4 25 53 PM](https://github.com/MystenLabs/sui/assets/97870774/ae71c43d-ec24-4d56-85f5-0703c9794ef8) #### NW Catchup Rate @ [5K TPS w/ 1 hour of downtime](https://metrics.sui.io/d/ORCQSHfVk/nw-catchup-dashboard?var-Environment=8Xt1pVoVk&var-network=private-testnet&var-validator=ams-ptn-val-00&var-validator=ams-ptn-val-02&var-validator=ams-ptn-val-09&var-validator=del-ptn-val-08&var-validator=sjc-ptn2-val-00&var-validator=ams-ptn-val-03&orgId=1&from=1696643114770&to=1696651260983) ![Screenshot 2023-10-10 at 4 26 24 PM](https://github.com/MystenLabs/sui/assets/97870774/1e5531eb-9152-4334-a715-a1d6136dd804) ## Known Issues to be investigated/fixed in follow up PRs #### Narwhal catchup only hits full potential after state sync completes ![Screenshot 2023-10-10 at 4 31 21 PM](https://github.com/MystenLabs/sui/assets/97870774/23f25ffe-0998-40c1-af14-ab6e864b8b8d) #### Execution bottleneck at high TPS preventing higher NW catchup rate ![Screenshot 2023-10-10 at 4 33 39 PM](https://github.com/MystenLabs/sui/assets/97870774/aa222598-4570-42a0-85ad-c2f46c77aba2) --- If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process. ### Type of Change (Check all that apply) - [X] protocol change - [ ] user-visible impact - [ ] breaking change for a client SDKs - [ ] breaking change for FNs (FN binary must upgrade) - [ ] breaking change for validators or node operators (must upgrade binaries) - [ ] breaking change for on-chain data layout - [ ] necessitate either a data wipe or data migration ### Release notes Protocol upgrade to version 28 which will enable the use of CertificateV2 in narwhal that will be used to speed up processing of certificates during certificate fetching/catchup.
Added new version for NW
Certificate
which Includes newAggregateSignatureState
. This holdsAggregateSignatureBytes
but with the added layer to specify if the signature was verified via a leader, verified directly, unverified or unsigned. This will be used to take advantage of the certificate chain that is formed via the DAG by only verifying the leaders of the certificate chain when they are fetched from validators during catchup.This is gated by a protocol feature flag so this PR should have no effect yet. Still testing the followup PR which will include the usage of
CertificateV2
, but sharing here to provide more context for these changes.