Distributed prover #1
base: taiko
core/src/utils/prove_distributed/checkpoint.rs (new file, @@ -0,0 +1,154 @@):

```rust
use std::sync::{mpsc::SyncSender, Arc};

pub use crate::{air::PublicValues, runtime::Program, stark::RiscvAir};

use crate::{
    runtime::{ExecutionRecord, NoOpSubproofVerifier, Runtime},
    stark::{MachineProver, MachineRecord},
    utils::{baby_bear_poseidon2::Val, BabyBearPoseidon2, SP1CoreOpts},
};

use super::Checkpoint;

fn trace_checkpoint(
    program: Program,
    checkpoint: Checkpoint,
    opts: SP1CoreOpts,
) -> (Vec<ExecutionRecord>, Checkpoint) {
    let mut runtime = Runtime::recover(program, checkpoint, opts);

    runtime.subproof_verifier = Arc::new(NoOpSubproofVerifier);

    let (events, _) =
        tracing::debug_span!("runtime.trace").in_scope(|| runtime.execute_record().unwrap());
```
Review comment (core/src/utils/prove_distributed/checkpoint.rs:23): I'm unfamiliar with Rust idioms regarding logging; will this report the file/line it is in?

Reply: No it won't, it is really just a classic …
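For readers unfamiliar with `tracing`: a minimal sketch, assuming the `tracing` and `tracing-subscriber` crates (none of this is code from the PR). By default the span carries only its static name, but the `fmt` subscriber can be asked to print the file and line of each event emitted inside it; the span itself is still shown by name.

```rust
use tracing::debug_span;

fn main() {
    // Assumption: tracing-subscriber's fmt layer. with_file/with_line_number
    // make it print the callsite recorded in the event metadata.
    tracing_subscriber::fmt()
        .with_file(true)
        .with_line_number(true)
        .with_max_level(tracing::Level::DEBUG)
        .init();

    // Without the configuration above, only the name "runtime.trace" shows up.
    debug_span!("runtime.trace").in_scope(|| {
        tracing::debug!("tracing the checkpoint");
    });
}
```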
```rust
    let state = runtime.state.clone();

    (events, state)
}

pub fn process<P: MachineProver<BabyBearPoseidon2, RiscvAir<Val>>>(
    prover: &P,
    program: &Program,
    checkpoint: Checkpoint,
    nb_checkpoints: usize,
    state: PublicValues<u32, u32>,
    opts: SP1CoreOpts,
    records_tx: SyncSender<Vec<ExecutionRecord>>,
    deferred: &mut ExecutionRecord,
    is_deferred: bool,
) {
    if is_deferred {
        process_deferred(program, checkpoint, state, opts, records_tx, deferred);
    } else {
        process_regular(
            prover,
            program,
            checkpoint,
            nb_checkpoints,
            state,
            opts,
            records_tx,
            deferred,
        );
    }
}

fn process_regular<P: MachineProver<BabyBearPoseidon2, RiscvAir<Val>>>(
    prover: &P,
    program: &Program,
    mut checkpoint: Checkpoint,
    nb_checkpoints: usize,
    mut state: PublicValues<u32, u32>,
    opts: SP1CoreOpts,
    records_tx: SyncSender<Vec<ExecutionRecord>>,
    deferred: &mut ExecutionRecord,
) {
```
Review comment: This probably needs a comment or a link to a markdown document that explains the shared mutable state. From the code, you have shards that are logical and mapped to execution_shards. Given the potential complexity, I would cleanly have only execution functions and only orchestrating functions, and execution functions should only return a status code. I fear that mixing processing and state updates will lead to a maintenance burden down the line. This can be refactored in a follow-up PR.

Reply: I agree about needing some more documentation. This code is an adaptation of classic sp1's proving process, taken from https://github.com/taikoxyz/sp1/blob/main/core/src/utils/prove.rs#L234. While I changed the code organization a bit to better fit our needs, I think we shouldn't tamper too much with the substance of the original code, or we face a possible complete reimplementation of the solution whenever they make breaking changes upstream. This whole …
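As a sketch of the separation suggested above (every name here is hypothetical, not code from this PR): the execution function stays pure and reports a status, while a single orchestrating function owns the state mutation and the channel.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

// Hypothetical status code returned by a pure execution step.
enum StepStatus {
    Done,
    MoreCheckpoints,
}

// Hypothetical record type standing in for ExecutionRecord.
struct Record {
    shard: u32,
}

// Execution: consumes inputs, produces records, mutates no shared state.
fn execute_step(checkpoint_idx: usize, total: usize) -> (Vec<Record>, StepStatus) {
    let records = vec![Record { shard: 0 }];
    let status = if checkpoint_idx + 1 == total {
        StepStatus::Done
    } else {
        StepStatus::MoreCheckpoints
    };
    (records, status)
}

// Orchestration: the only place that mutates prover-visible state.
fn orchestrate(total: usize, records_tx: SyncSender<Vec<Record>>) {
    let mut next_shard = 0u32;
    for i in 0..total {
        let (mut records, status) = execute_step(i, total);
        for r in &mut records {
            r.shard = next_shard; // state updates live here, not in execute_step
            next_shard += 1;
        }
        records_tx.send(records).unwrap();
        if matches!(status, StepStatus::Done) {
            break;
        }
    }
}

fn main() {
    let (tx, rx) = sync_channel(1);
    thread::spawn(move || orchestrate(3, tx));
    for batch in rx {
        println!("received {} record(s)", batch.len());
    }
}
```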
```rust
    tracing::debug_span!("phase 1 record generator").in_scope(|| {
```
Review comment: I saw phase 2 is gone. Do you know the reason why they have two phases?
```rust
        let mut processed_checkpoints = 0;

        while processed_checkpoints < nb_checkpoints {
            log::info!(
                "Processing checkpoint {}/{}",
                processed_checkpoints + 1,
                nb_checkpoints
            );
            // Trace the checkpoint and reconstruct the execution records.
            let (mut records, new_checkpoint) = tracing::debug_span!("trace checkpoint")
                .in_scope(|| trace_checkpoint(program.clone(), checkpoint, opts));

            checkpoint = new_checkpoint;

            // Update the public values & prover state for the shards which contain "cpu events".
            for record in records.iter_mut() {
                state.shard += 1;
                state.execution_shard = record.public_values.execution_shard;
```
Review comment: See here: it seems like the caller already orchestrated work distribution.
```rust
                state.start_pc = record.public_values.start_pc;
                state.next_pc = record.public_values.next_pc;
                record.public_values = state;
            }

            // Generate the dependencies.
            tracing::debug_span!("generate dependencies")
                .in_scope(|| prover.machine().generate_dependencies(&mut records, &opts));

            // Defer events that are too expensive to include in every shard.
            for record in records.iter_mut() {
                deferred.append(&mut record.defer());
            }

            // See if any deferred shards are ready to be committed to.
            let mut _deferred = deferred.split(false, opts.split_opts);
```
Review comment: If all deferred events are processed after the last checkpoint, do we still need the split call here?
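The answer likely hinges on the semantics of `ExecutionRecord::split`, which this diff does not show. Below is a self-contained analogue of one plausible behavior (an assumption, not confirmed code): `split(false, ..)` emits only complete batches and keeps the remainder buffered, while `split(true, ..)` also flushes whatever is left. Under that reading, the mid-loop call is what bounds memory across checkpoints.

```rust
// Hypothetical analogue of deferred-event splitting; DeferredBuf and BATCH
// stand in for ExecutionRecord and opts.split_opts.
const BATCH: usize = 4;

struct DeferredBuf {
    events: Vec<u32>,
}

impl DeferredBuf {
    // last == false: emit only full batches, keep the remainder buffered.
    // last == true: also flush whatever is left over.
    fn split(&mut self, last: bool) -> Vec<Vec<u32>> {
        let mut out = Vec::new();
        while self.events.len() >= BATCH {
            out.push(self.events.drain(..BATCH).collect());
        }
        if last && !self.events.is_empty() {
            out.push(self.events.drain(..).collect());
        }
        out
    }
}

fn main() {
    let mut buf = DeferredBuf { events: (0..10).collect() };
    assert_eq!(buf.split(false).len(), 2); // two full batches of four
    assert_eq!(buf.split(true).len(), 1); // final flush of the remaining two
}
```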
```rust
            // Update the public values & prover state for the shards which do not
            // contain "cpu events" before committing to them.
            state.execution_shard += 1;
```
Review comment: But then the worker manages some execution variable itself?
```rust
            records_tx.send(records).unwrap();

            processed_checkpoints += 1;
        }
    });
}

fn process_deferred(
    program: &Program,
    checkpoint: Checkpoint,
    mut state: PublicValues<u32, u32>,
    opts: SP1CoreOpts,
    records_tx: SyncSender<Vec<ExecutionRecord>>,
    deferred: &mut ExecutionRecord,
) {
    tracing::debug_span!("phase 1 record generator").in_scope(|| {
        // Trace the checkpoint and reconstruct the execution records.
        let (mut records, _) = tracing::debug_span!("trace checkpoint")
            .in_scope(|| trace_checkpoint(program.clone(), checkpoint, opts));

        // Update the public values & prover state for the shards which contain "cpu events".
        for record in records.iter_mut() {
            // state.shard += 1;
            state.execution_shard = record.public_values.execution_shard;
            state.start_pc = record.public_values.start_pc;
            state.next_pc = record.public_values.next_pc;
            record.public_values = state;
        }

        // See if any deferred shards are ready to be committed to.
        let mut deferred = deferred.split(true, opts.split_opts);

        // Update the public values & prover state for the shards which do not
        // contain "cpu events" before committing to them.
        for record in deferred.iter_mut() {
            state.shard += 1;
            state.previous_init_addr_bits = record.public_values.previous_init_addr_bits;
            state.last_init_addr_bits = record.public_values.last_init_addr_bits;
            state.previous_finalize_addr_bits = record.public_values.previous_finalize_addr_bits;
            state.last_finalize_addr_bits = record.public_values.last_finalize_addr_bits;
            state.start_pc = state.next_pc;
            record.public_values = state;
        }
```
Review comment: Is the state a local worker state? If yes, it probably should be renamed for clarity.
```rust
        records_tx.send(deferred).unwrap();
    });
}
```
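For context on the `records_tx` side: a hypothetical caller sketch (the channel wiring and the consumer loop are assumptions, not part of this diff) showing how a bounded channel pairs a `process`-style producer with a proving consumer. The bounded capacity applies backpressure: the producer blocks once the configured number of unconsumed batches is in flight.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Stand-in for Vec<ExecutionRecord>; the real caller sends record batches.
type RecordBatch = Vec<u64>;

fn main() {
    // At most two unconsumed batches may be in flight at once.
    let (records_tx, records_rx) = sync_channel::<RecordBatch>(2);

    // Producer thread: plays the role of `process`, sending one batch
    // per checkpoint.
    let producer = thread::spawn(move || {
        for checkpoint in 0..3u64 {
            records_tx.send(vec![checkpoint]).unwrap();
        }
        // Dropping the sender closes the channel, ending the consumer loop.
    });

    // Consumer: plays the role of the prover committing/proving each batch.
    for batch in records_rx {
        println!("proving batch {:?}", batch);
    }

    producer.join().unwrap();
}
```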
Review comment: Same remark as taikoxyz/raiko#302 (comment).
Review comment: A related question: what are the updates in plonky3?