Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check signature verif performance #3071

Closed
damip opened this issue Sep 23, 2022 · 8 comments
Closed

Check signature verif performance #3071

damip opened this issue Sep 23, 2022 · 8 comments
Milestone

Comments

@damip
Copy link
Member

damip commented Sep 23, 2022

Bench with ed25519_dalek: 10 000 000 verifs in 377359 milliseconds. That's 190 milliseconds to verify a full block (5000 + 1 + 9 signatures)

TODO: are there faster crates implementing the same algo ?

TODO: is there a way to parallelize verif ?

TODO: use ed25519_dalek::verify_batch ?

(eg. is https://docs.rs/ring-compat/latest/ring_compat/ a good alternative ? or https://github.com/jedisct1/rust-ed25519-compact ?)

Also, we shouldn't be using the crypto implem directly, but through https://docs.rs/ed25519/latest/ed25519/ which allows supporting HSM and other things

@damip damip added this to the TEST.15.X milestone Sep 23, 2022
@gterzian
Copy link
Contributor

gterzian commented Sep 26, 2022

Operation verification occurs there:

operation.verify_signature()?;

I propose we simply create a vector of operations to verify(those not found in checked_operations), and do the verification using a parallel iterator from rayon.

ed25519_dalek::verify_batch seems to come with all kinds of warnings on how to use it, so I propose we look into that only later if necessary.

@damip
Copy link
Member Author

damip commented Sep 26, 2022

Operation verification occurs there:

operation.verify_signature()?;

I propose we simply create a vector of operations to verify(those not found in checked_operations), and do the verification using a parallel iterator from rayon.

That's cool

ed25519_dalek::verify_batch seems to come with all kinds of warnings on how to use it, so I propose we look into that only later if necessary.

Those warnings are the same as the ones on the nornal "verify" vs "verify_strict" so no extra danger here, and using it is probably a good idea for batches in rayon

@damip
Copy link
Member Author

damip commented Sep 26, 2022

The underlying problem we have is that we don't batch operation announcement / asking properly (we do it only when asking ops for blocks). And using rayon on just 1 verif will actually decrease performance

@gterzian
Copy link
Contributor

gterzian commented Sep 27, 2022

we don't batch operation announcement / asking properly

How about we switch to periodically sending announcements(as opposed to immediately), and the current call to propagate_ops would only add them to a local structure, that would be drained when the next time comes to announce ops?

@damip

@damip
Copy link
Member Author

damip commented Sep 27, 2022

we don't batch operation announcement / asking properly

How about we switch to periodically sending announcements(as opposed to immediately), and the current call to propagate_ops would only add them to a local structure, that would be drained when the next time comes to announce ops?

@damip

It's a path to explore, but It should be studied a bit because it would have an impact on propagation latency

@sebastien-forestier
Copy link

sebastien-forestier commented Sep 27, 2022

Is sig verification actually slower compared to say testnet 11 ? Or similar or faster ?

Parallelization will not reduce cpu usage, maybe even increase it a bit, but it will reduce latency, which I don't think is a problem here.

@gterzian
Copy link
Contributor

It's a path to explore, but It should be studied a bit because it would have an impact on propagation latency

@damip trying it out in #3086

@damip
Copy link
Member Author

damip commented Sep 30, 2022

Some benchmarks of ed25519_dalek signature verification:

  • X = batch size
  • Y = nanoseconds per verification
  • blue = single-core, non-batched verif
  • red = single-core, batch_verif
  • yellow = rayon (20 logical / 10 physical cores), non-batched verif

Screenshot from 2022-09-30 13-51-00

Limit at batch_size = 5000:

  • single_core, non-batched => 38503 ns/verif
  • single_core, batch_verif => 12182 ns/verif
  • rayon, non-batched verif => 4804 ns/verif

It therefore boils down to batching and using batch_verif, and doing so in rayon to process each batch in parallel.
Choosing batch_size is done according to what is more important:

  • A - total processing time of operations (eg. the speed at which the Protocol loop iterates)
  • B - the total CPU consumption unit of time dedicated to operation processing

The more emphasis we put on A, the smaller the batch size should be. The more emphasis we put on B, the larger the batch size should be.

=> Optimized implementation done here: #3095

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants