-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: SNARK pool batching #4882
RFC: SNARK pool batching #4882
Conversation
## Detailed design | ||
[detailed-design]: #detailed-design | ||
|
||
The proposed mechanism is to have two "batch modes": Batch All and Batch by Prover, and to split provers into two categories: Trusted and Untrusted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think batching snarks that are required immediately (if the block producer is scheduled to create the next block) would also be nice? if we wait too long to batch verify/ or some snarks are in a batch that won't be verified until after the diff is created then we might not be able include as many transactions if they were serialized. But this might not be as beneficial from a performance standpoint.
Also, we have this delay
factor in scan state so, if there are enough provers and the coordinator picks the work that are needed first then the nodes producing blocks should have the required work at least a block before they need it (if the delay is at least 1). So maybe it is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deepthiskumar how can we detect which of these proofs are required in the next block? I think it makes sense to eagerly start a batch before the timeout/max batch size if there's a pending block production deadline, but I'm not sure how to know which proofs to give that treatment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can query the scan state at the best tip to get all the required work for the next block. work_statements_for_new_diff
in transaction_snark_scan_state.ml
will fetch you that
|
||
The timer ensures that, even when SNARK work is coming in slowly, we aren't adding too much latency into the gossip verification. | ||
|
||
In order to identify provers into those two categories, we need an unforgeable notion of prover identity. I propose we add a new signature of the SNARK work, using a key that is present in the best tip ledger. This adds a new failure mode to SNARK proofs: signing with a key that's not present in the ledger. Users should be discouraged from storing value in that account. By necessity, the private key will need to be relatively unprotected as it needs to be available on the SNARK coordinator to sign proofs before broadcasting them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much time does it takes to check a signature? Is that the 10 ms in 600ms + N*10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, good point, this doesn't account for the extra signature verification time. I can add a benchmark for this, and I suspect it'll be slower than it could be (using a different signature scheme) but not as slow as not batching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imeckler gave me new observed numbers: 800ms + N*56ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The identity can be included as part of the signature-of-knowledge message and thus implicitly included in SNARK verification
|
||
An early idea was an extra "not yet known" state before nodes become Untrusted. In the extra state, we don't do any batching, so as to avoid commiting potentially a lot of time to verifying a bogus batch before any reputation is established. This was deemed unnecessary. | ||
|
||
One idea was for identifying provers was to use the fee recipient account, since it's already encoded into the SoK. Without adding an additional signature this allows forgery as only public information is needed, and also if the proof fails to verify then nothing can be inferred from it. Using the fee recipient account itself seems unwise as it will be storing actual value, and having the private key just lying around with on the SNARK coordinator is risky (and a regression from today's world where SNARK coordinators and workers need no sensitive information). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I understand the forgery part. The Sok is baked in the snark and so if someone sent it with a different key, the proof would fail to verify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SoK digest is baked into the snark (and I'm not sure there's a way to extract it? maybe it's a public input), and if the proof fails to verify because it was sent with a different key we still don't know who to blame for the bad proof.
- Upon receipt of the first SNARK work into the queue, start a `MAX_SNARK_DELAY` second timer. | ||
- When the queue hits `MAX_SNARK_BATCH_LENGTH`, or after the timer expires, clear the queue and start the batch verification process. | ||
|
||
The timer ensures that, even when SNARK work is coming in slowly, we aren't adding too much latency into the gossip verification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emberian is there a need for a MAX_SNARK_BATCH_LENGTH
? Or even any kind of timer? It occurs to me that we could maybe do the following.
Have a state
type verifier_state =
{ mutable out_for_verification : Snark_work.t list option
; queue : Snark_work.t Queue.t }
out_for_verification : Snark_work.t list option
represents the proofs which are currently being processed by the verifier. It isNone
if the verifier is not verifying any proofs at the moment.- When a new proof comes in put it in the queue. If
out_for_verification = None
, then verify that proof immediately. - Whenever the verifier returns, flush
queue
into the verifier and move the queue's contents intoout_for_verification
. If the verifier returned successfully, then insert the oldout_for_verification
proofs into the table and regossip them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess basically you are suggesting waiting MAX_SNARK_DELAY
(which should be understood as max delay beyond mere verification time) in the beginning to get more proofs into that first verification right?
Signing works produced (ie, proving possession of the private key) prevents attacks where malicious provers can pay computation to broadcast 0-fee proofs for arbitrary pubkeys. |
from izzy: zcash foundation have a similar problem, and end up with a similar solution? find it, compare it, link it. |
Our concern is mostly DOS. One concern is it easy to circumvent batching by prover and accomplish the same DOS by never using the same prover public key twice right? |
@imeckler Because this adds a new requirement that the snark worker public key has to be in the ledger, doing such an attack would involve paying the account creation fee for each attack. This should deter this kind of attack so long as we are able to reduce the cost of handling a bad proof from a prover, which we should be able to do correctly by tuning the amount of "good work" required to move over into the "trusted prover" state. |
Preview: Built with commit 94b42f2 |
There are some grammatical tense issues in this, but I have decided not to fix them. This one still has some unresolved questions and it introduces a new operational requirement that I find distasteful. Feedback wanted!