
Distributed prover #1

Open · wants to merge 1 commit into base: taiko from the sp1_distributed branch
Conversation


@Champii Champii commented Jul 30, 2024

Introduces the Distributed Prover

How to run

The Orchestrator

Create a file distributed.json containing a JSON array of the workers' IP:PORT entries, like this:

[
    "1.2.3.4:8081",
    "5.6.7.8:8081"
]

NOTE: The orchestrator can also be a worker in its own pool

Then run sp1 with these ENV variables:

# This is used to minimise the RAM usage. Increase this to reduce the proving time
# Note: you don't need to set it on the workers, it is propagated along the request
export SHARD_BATCH_SIZE=1
export SP1_PROVER=distributed

The Workers

Run the worker TCP server

let listen_addr = "0.0.0.0:8081";
let orchestrator_addr = "10.200.0.15";
sdk::serve_worker(listen_addr, orchestrator_addr).await?;

The orchestrator_addr points to the orchestrator's address; it is used as a filter so that only incoming connections from this host are accepted.
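
For reference, a minimal worker binary could look like the sketch below. The tokio runtime, the crate alias, and the error type are assumptions; serve_worker is the SDK call shown above.

use sp1_sdk as sdk; // assumed crate name/alias for the SDK

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Listen on all interfaces and only accept connections from the orchestrator host.
    let listen_addr = "0.0.0.0:8081";
    let orchestrator_addr = "10.200.0.15";
    sdk::serve_worker(listen_addr, orchestrator_addr).await?;
    Ok(())
}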

@Champii Champii marked this pull request as ready for review July 30, 2024 14:48
@Champii Champii force-pushed the sp1_distributed branch 2 times, most recently from ebc9793 to 64e3081 on July 31, 2024 14:38

@mratsim mratsim left a comment


Continuing review later today.

p3-challenger = { git = "https://github.com/Champii/Plonky3.git", branch = "serde_patch" }
p3-poseidon2 = { git = "https://github.com/Champii/Plonky3.git", branch = "serde_patch" }
p3-baby-bear = { git = "https://github.com/Champii/Plonky3.git", branch = "serde_patch" }
p3-symmetric = { git = "https://github.com/Champii/Plonky3.git", branch = "serde_patch" }

Same remark as taikoxyz/raiko#302 (comment)

This should be added to the taikoxyz org, or maybe to a dedicated taikoxyz-patches org to avoid pollution, if we expect lots of long-term patches across Raiko and Gwyneth. cc @Brechtpd

Collaborator


A related question: what are the updates in Plonky3?

runtime.subproof_verifier = Arc::new(NoOpSubproofVerifier);

let (events, _) =
    tracing::debug_span!("runtime.trace").in_scope(|| runtime.execute_record().unwrap());

I'm unfamiliar with Rust logging idioms; will this report the file/line it is in? (core/src/utils/prove_distributed/checkpoint.rs:23)

Author


No, it won't; it is really just a classic log::debug!(...) macro, but with scoped names.
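
A minimal sketch using the tracing crate's debug_span!, as in the code above; whether file/line shows up in the output is a subscriber/formatter setting rather than something the span itself adds:

use tracing::{debug, debug_span};

fn trace_checkpoint() {
    // The span only attaches the scoped name "runtime.trace" to events emitted inside it.
    debug_span!("runtime.trace").in_scope(|| {
        debug!("executing record"); // reported under the "runtime.trace" span
    });
}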

opts: SP1CoreOpts,
records_tx: SyncSender<Vec<ExecutionRecord>>,
deferred: &mut ExecutionRecord,
) {

This probably needs a comment or a link to a markdown document that explains the shared mutable state.

From the code, you have shards that are logical and mapped to execution_shards.
It seems like the distribution happens ahead of time, but then after the records are processed this execution-shard value is incremented.

Given the potential complexity, I would cleanly separate execution functions from orchestrating functions, with the execution functions only returning a status code. I fear that mixing processing and state updates will lead to a maintenance burden down the line.

This can be refactored for a follow-up PR.
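
As a purely hypothetical sketch of that separation (all names below are illustrative, not from this PR): execution does the processing and returns a status, and the orchestrator owns every state update.

// Hypothetical sketch of the suggested split between execution and orchestration.
enum ExecStatus {
    Completed { has_cpu_events: bool },
    Failed,
}

struct OrchestratorState {
    shard: u32,
    execution_shard: u32,
}

// Execution: pure processing, no shared-state mutation, only a status back.
fn execute_checkpoint(checkpoint: &[u8]) -> ExecStatus {
    ExecStatus::Completed { has_cpu_events: !checkpoint.is_empty() }
}

// Orchestration: the only place where shard bookkeeping happens.
fn orchestrate(state: &mut OrchestratorState, statuses: &[ExecStatus]) {
    for status in statuses {
        if let ExecStatus::Completed { has_cpu_events } = status {
            state.shard += 1;
            if *has_cpu_events {
                state.execution_shard += 1;
            }
        }
    }
}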

Author

@Champii Champii Aug 1, 2024


I agree about needing some more documentation.

This code is an adaptation of sp1's classical proving process, taken from https://github.com/taikoxyz/sp1/blob/main/core/src/utils/prove.rs#L234

While I changed the code organization a bit to better fit our needs, I think we shouldn't tamper too much with the substance of the original code, or we risk a complete re-implementation of the solution whenever they make breaking changes upstream.

This whole deferred mechanism is a new system that avoids putting too many heavy public values on each shard. It does improve performance a bit, but it implies a new layer of synchronization and value-sharing between the workers and the orchestrator. If we decide to go down the road of implementing a custom solution, we could get rid of that mechanism and simplify the execution path drastically, but that has a greater maintenance cost.

// Update the public values & prover state for the shards which contain "cpu events".
for record in records.iter_mut() {
    state.shard += 1;
    state.execution_shard = record.public_values.execution_shard;

See here: seems like the caller already orchestrated work distribution.


// Update the public values & prover state for the shards which do not contain "cpu events"
// before committing to them.
state.execution_shard += 1;

but then the worker manages some execution variable itself?

state.last_finalize_addr_bits = record.public_values.last_finalize_addr_bits;
state.start_pc = state.next_pc;
record.public_values = state;
}

Is the state a local worker state? If so, it should probably be renamed for clarity.

);

let nb_checkpoints_per_workers =
(checkpoints_states.len() as f64 / nb_workers as f64).ceil() as usize;

@mratsim mratsim Aug 1, 2024


This can lead to work imbalance.

For example, dividing 40 items over 12 workers will lead to
a base_chunk_size of 40/12 = 3, so work on the first 11 workers
will be 3 * 11 = 33, with the remaining 7 on the last worker.

Instead of dividing 40 work items on 12 cores into:
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 7 = 3*11 + 7 = 40
the best scheme will divide into
4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3 = 4*4 + 3*8 = 40

See https://github.com/mratsim/constantine/blob/master/constantine/threadpool/partitioners.nim#L26-L77
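
A minimal sketch of the balanced split described above (function name and signature are illustrative):

// Give the first `remainder` workers one extra item so chunk sizes differ by at most one,
// e.g. 40 items over 12 workers -> [4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3].
fn balanced_chunk_sizes(nb_items: usize, nb_workers: usize) -> Vec<usize> {
    let base = nb_items / nb_workers;
    let remainder = nb_items % nb_workers;
    (0..nb_workers)
        .map(|i| if i < remainder { base + 1 } else { base })
        .collect()
}

fn main() {
    assert_eq!(
        balanced_chunk_sizes(40, 12),
        vec![4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3]
    );
}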


Author

@Champii Champii Aug 1, 2024


Good catch! I will study those formulas, but is it worth implementing such a mechanism?

Here I put a ceil() call that maximizes the number of shards on each worker. While it is clearly not ideal, only the last worker(s) will have a reduced load, which is not a problem: since equal load means equal processing time, all the workers will finish at roughly the same time, so it won't add any significant time to the proving process.

And as a bonus, if a worker is left without work, it only reduces the network traffic needed to communicate with it, and it gives us a free quick-recovery worker if any other should fail.

Collaborator


Ceil leads to a different imbalance issue from the one Mamy mentioned, but yes, it does not impact the processing time, as the busiest worker node determines the whole time cost.
I just thought of another case: say 15 tasks go to 7 machines, the distribution between perfect balance and ceiling workload is (2 2 2 2 2 2 3) vs (3 3 3 3 3 0 0). It seems to match an autoscaling model: the 1st and 2nd have the same processing time, but obviously the 2nd one has less cost (as it uses only 5 machines).
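
For comparison with the balanced split sketched earlier, a ceil-based assignment (again with illustrative names) reproduces the (3 3 3 3 3 0 0) pattern mentioned here:

// Every worker takes ceil(nb_items / nb_workers) items until nothing is left,
// e.g. 15 items over 7 workers -> [3, 3, 3, 3, 3, 0, 0] (two workers stay idle).
fn ceil_chunk_sizes(nb_items: usize, nb_workers: usize) -> Vec<usize> {
    let chunk = (nb_items + nb_workers - 1) / nb_workers; // ceil division
    let mut remaining = nb_items;
    (0..nb_workers)
        .map(|_| {
            let take = chunk.min(remaining);
            remaining -= take;
            take
        })
        .collect()
}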


@mratsim mratsim left a comment


Overall LGTM, some nits about potential load imbalance and future refactoring.

One thing I'm not too sure about is metrics. They're missing, but given that we want to stay close to upstream to ease the maintenance burden, it might be OK.


std::thread::scope(move |s| {
    let (records_tx, shard_proofs_handle) =
        threads::spawn_prove(prover, s, opts, scope_span.clone(), challenger.clone(), pk);

Does that spawn multiple threads for the local worker or a single one?

Author


It only spawns one. I agree that the module name could be renamed to something more explicit.

(i * nb_checkpoints_per_worker * opts.shard_batch_size) as u32;

WorkerRequest::Commit(RequestData {
elf: elf.to_vec(),

Do we really need to own the buffer here?

Author


This is just a matter of simplicity; we may not want to deal with lifetimes in network-de/serialized data structures. Considering the kind of RAM usage we are dealing with later in the process, I deemed this small temporary overhead acceptable. What do you think?
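
To illustrate the trade-off (the types below are simplified stand-ins, not the PR's actual definitions): the owned form keeps the request type lifetime-free at the cost of one copy, while a borrowed form threads a lifetime through everything that wraps it.

use serde::{Deserialize, Serialize};

// Owned: copies the ELF bytes, but the type stays simple to send over the network.
#[derive(Serialize, Deserialize)]
struct RequestDataOwned {
    elf: Vec<u8>,
}

// Borrowed: avoids the copy, but every wrapper (and the deserializer) now carries 'a.
#[derive(Serialize, Deserialize)]
struct RequestDataBorrowed<'a> {
    #[serde(borrow)]
    elf: &'a [u8],
}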

}

async fn spawn_workers() -> Result<BTreeMap<usize, Arc<RwLock<WorkerSocket>>>, WorkerError> {
let ip_list_string = std::fs::read_to_string("distributed.json")

Should this be configurable, with a default of distributed.json?

We probably want to let people have:
distributed-mainnet.json
distributed-devnet.json
distributed-myboosterrollup1.json
to manage multiple infrastructures from the same repo.

This can be a future refactor.

Author

@Champii Champii Aug 1, 2024


This is indeed a good idea. We should also offer a way to change the path of such files via some config.
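
A possible shape for that, as a sketch (the SP1_WORKER_LIST variable name is invented here, not part of the PR):

// Pick the worker-list file from an env var, falling back to `distributed.json`.
let path = std::env::var("SP1_WORKER_LIST").unwrap_or_else(|_| "distributed.json".to_string());
let ip_list_string = std::fs::read_to_string(&path)
    .unwrap_or_else(|_| panic!("Sp1 Distributed: need `{path}` with a list of IP:PORT"));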


async fn spawn_workers() -> Result<BTreeMap<usize, Arc<RwLock<WorkerSocket>>>, WorkerError> {
let ip_list_string = std::fs::read_to_string("distributed.json")
.expect("Sp1 Distributed: Need a `distributed.json` file with a list of IP:PORT");

Does this handle IPv6 btw?

Author


I didn't try, but I don't see anything that would prevent it. Will double-check, though.
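
A quick check, assuming the entries end up parsed as std::net::SocketAddr: both address families parse, but IPv6 entries in distributed.json would need the bracketed form.

use std::net::SocketAddr;

fn main() {
    let v4: SocketAddr = "1.2.3.4:8081".parse().unwrap();
    let v6: SocketAddr = "[2001:db8::1]:8081".parse().unwrap();
    println!("{v4} {v6}");
}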

Collaborator

@smtmfft smtmfft left a comment


Just had a quick look; will refer to sp1's implementation for further understanding.


records_tx: SyncSender<Vec<ExecutionRecord>>,
deferred: &mut ExecutionRecord,
) {
tracing::debug_span!("phase 1 record generator").in_scope(|| {
Collaborator


I saw that phase 2 is gone. Do you know why they have 2 phases?

}

// See if any deferred shards are ready to be commited to.
let mut _deferred = deferred.split(false, opts.split_opts);
Collaborator


If all deferred records are processed after the last checkpoint, do we still need the split call here?

