-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(prover): Parallelize circuit metadata uploading for BWG #2520
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Maybe we should add a limit on the number of simultaneous uploads? And throttle the harness until there is space in that "sending queue". |
popzxc
reviewed
Jul 29, 2024
Basic Witness Generation (BWG) creates circuits for base layer to be proven. BWG runs in-circuit VM and provides circuits for the instructions in the execution (note that those are ordered, messing with the order would break proving). Additionally, harness will do heavy computation -> provide circuits -> heavy computation -> provide circuits. As such, providing circuits is rather fast. Once a circuit is produced, it's sent to BWG, which in turn sends it to GCS and saves it to DB. The queue between BWG runner and harness is of size 1. It was picked in this way to ensure that harness won't overwhelm bwg runner with circuits (if the queue was big, the amount of RAM used would be size of queue * size of circuit). The problem is that the harness will be throttled on circuit submission. It needs to blockingly wait for bwg runner to upload data (GCS + PG). There are 2 alternatives here: - make the buffer bigger -- this would work, but it would start consuming more memory. In theory, if you can match the size of the buffer to fit in between CPU cycles, harness wouldn't need to wait for upload and bwg runner would catch up by the time harness has a new set of circuits. The problem with this approach is that it relies on deterministic upload time (nothing can be deterministic in distributed systems) and fixed circuit size (this is not true, because batches can have varying number of circuits, of varying types). - upload everything async -- this work wonders, but there will be multiple circuits in flight being uploaded at any time. This will also increase RAM. In my testing, I've seen increases smaller than 100MB (acceptable tradeoff). The edge cases are when GCS is down or not working as intended (for instance, refusing connection). In such scenarios the async tasks will grow, adding to a lot more RAM usage (up to 50GB). Whilst this is a real problem, I consider it more of an edge case. Furthermore, there are a few ways to make GCS more reliable. The second option was picked here. MPSC channel (the buffer) is still of size 1 to have a single point of failure when RAM goes up (we don't want to figure out -- do we have too many things in the buffer, or do we have problems with GCS? -- in the current implementation, it will always be GCS). Furthermore, MPSC(1) doesn't cause much overhead.
EmilLuta
force-pushed
the
evl-async-bwg-uploading
branch
from
July 29, 2024 14:40
ba30bd7
to
3023447
Compare
0xVolosnikov
previously approved these changes
Jul 29, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
popzxc
reviewed
Jul 29, 2024
popzxc
previously approved these changes
Jul 30, 2024
popzxc
previously approved these changes
Jul 30, 2024
popzxc
previously approved these changes
Jul 30, 2024
popzxc
approved these changes
Jul 30, 2024
This was referenced Jul 30, 2024
github-merge-queue bot
pushed a commit
that referenced
this pull request
Aug 1, 2024
🤖 I have created a release *beep* *boop* --- ## [24.13.0](core-v24.12.0...core-v24.13.0) (2024-07-31) ### Features * Add recovery tests to zk_supervisor ([#2444](#2444)) ([0c0d10a](0c0d10a)) * Added a JSON RPC to simulating L1 for consensus attestation ([#2480](#2480)) ([c6b3adf](c6b3adf)) * added dropping all attester certificates when doing hard fork ([#2529](#2529)) ([5acd686](5acd686)) * **configs:** Do not panic if config is only partially filled ([#2545](#2545)) ([db13fe3](db13fe3)) * **eth-sender:** Make eth-sender tests use blob txs + refactor of eth-sender tests ([#2316](#2316)) ([c8c8334](c8c8334)) * Introduce more tracing instrumentation ([#2523](#2523)) ([79d407a](79d407a)) * Remove unused VKs, add docs for BWG ([#2468](#2468)) ([2fa6bf0](2fa6bf0)) * Revisit base config values ([#2532](#2532)) ([3fac8ac](3fac8ac)) * Server 10k gwei limit on gas price and 1M limit on pubdata price ([#2460](#2460)) ([be238cc](be238cc)) * **vlog:** Implement otlp guard with force flush on drop ([#2536](#2536)) ([c9f76e5](c9f76e5)) * **vlog:** New vlog interface + opentelemtry improvements ([#2472](#2472)) ([c0815cd](c0815cd)) * **zk_toolbox:** add test upgrade subcommand to zk_toolbox ([#2515](#2515)) ([1a12f5f](1a12f5f)) * **zk_toolbox:** use configs from the main repo ([#2470](#2470)) ([4222d13](4222d13)) ### Bug Fixes * **contract verifier:** Fix config values ([#2510](#2510)) ([3729468](3729468)) * fixed panic propagation ([#2525](#2525)) ([e0fc58b](e0fc58b)) * **proof_data_handler:** Unlock jobs on transient errors ([#2486](#2486)) ([7c336b1](7c336b1)) * **prover:** Parallelize circuit metadata uploading for BWG ([#2520](#2520)) ([f49720f](f49720f)) * VM performance diff: don't show 0 as N/A ([#2276](#2276)) ([2fa2249](2fa2249)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: zksync-era-bot <[email protected]>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Aug 2, 2024
🤖 I have created a release *beep* *boop* --- ## [16.2.0](prover-v16.1.0...prover-v16.2.0) (2024-08-02) ### Features * **configs:** Do not panic if config is only partially filled ([#2545](#2545)) ([db13fe3](db13fe3)) * Introduce more tracing instrumentation ([#2523](#2523)) ([79d407a](79d407a)) * New prover documentation ([#2466](#2466)) ([1b61d07](1b61d07)) * optimize LWG and NWG ([#2512](#2512)) ([0d00650](0d00650)) * Poll the main node for the next batch to sign (BFT-496) ([#2544](#2544)) ([22cf820](22cf820)) * Remove CI and docker images for CPU prover & compressor ([#2540](#2540)) ([0dda805](0dda805)) * Remove unused VKs, add docs for BWG ([#2468](#2468)) ([2fa6bf0](2fa6bf0)) * Support sending logs via OTLP ([#2556](#2556)) ([1d206c0](1d206c0)) * Update to consensus 0.1.0-rc.4 (BFT-486) ([#2475](#2475)) ([ff6b10c](ff6b10c)) * **vlog:** New vlog interface + opentelemtry improvements ([#2472](#2472)) ([c0815cd](c0815cd)) * **witness_vector_generator:** Make it possible to run multiple wvg instances in one binary ([#2493](#2493)) ([572ad40](572ad40)) * **zk_toolbox:** use configs from the main repo ([#2470](#2470)) ([4222d13](4222d13)) ### Bug Fixes * **prover:** Parallelize circuit metadata uploading for BWG ([#2520](#2520)) ([f49720f](f49720f)) * **prover:** Reduce database connection overhead for BWG ([#2543](#2543)) ([c2439e9](c2439e9)) * **witness_generator:** Only spawn 1 prometheus exporter per witness generator ([#2492](#2492)) ([b9cb222](b9cb222)) ### Reverts * "feat: Poll the main node for the next batch to sign (BFT-496)" ([#2574](#2574)) ([72d3be8](72d3be8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: Danil <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Basic Witness Generation (BWG) creates circuits for base layer to be proven. BWG runs out-of-circuit VM and provides circuits for the instructions in the execution (note that those are ordered, messing with the order would break proving).
Additionally, harness will do heavy computation -> provide circuits -> heavy computation -> provide circuits. As such, providing circuits is rather fast. Once a circuit is produced, it's sent to BWG, which in turn sends it to GCS and saves it to DB. The queue between BWG runner and harness is of size 1. It was picked in this way to ensure that harness won't overwhelm bwg runner with circuits (if the queue was big, the amount of RAM used would be size of queue * size of circuit). The problem is that the harness will be throttled on circuit submission. It needs to blockingly wait for bwg runner to upload data (GCS + PG).
There are 2 alternatives here:
The second option was picked here. MPSC channel (the buffer) is still of size 1 to have a single point of failure when RAM goes up (we don't want to figure out -- do we have too many things in the buffer, or do we have problems with GCS? -- in the current implementation, it will always be GCS). Furthermore, MPSC(1) doesn't cause much overhead.
The implementation has been tested locally, using mainnet and testnet batches. Whilst the number are super encouraging (00:03:58 time, with ~40GB RAM), the CPUs (and their speed) is slightly lower in GCP.