fix: Use throw instead of reject in broker facade (#10735)
We were seeing proving failures for the initial epoch in rc1. In Loki, it looked as if the proof was ongoing and then the node suddenly restarted. See the "Logger initialized" line at 11:08:54.


![image](https://github.com/user-attachments/assets/88fdcc8b-b406-49de-80da-7837664c4423)

Looking at the same time range in Google Cloud, which fetches logs from stderr instead of the OTEL collector, shows an error that was not in Loki and that was not JSON-formatted, meaning it did not come from pino:


![image](https://github.com/user-attachments/assets/64725f02-22e7-4be3-91f0-e451e7c59564)

This is an error emitted by the Node process directly. I was able to reproduce it in a local setup by spawning the prover node, agent, and broker as separate processes (I could not reproduce it when the node and broker ran in the _same_ process) and causing the bb-prover to fail on AVM proofs. The errors in all caps are temporary log lines I added for testing:

```
[18:44:22.950] VERBOSE: prover-client:orchestrator Updating archive tree with block 5 header 0x1e2aefd32ac7a3f31826e996e637ca6c8a053bb69c9087318f63c65311fd804a
[18:44:22.954] VERBOSE: prover-client:orchestrator Orchestrator finalised block 5
[18:44:23.194] ERROR: prover-client:caching-prover-broker PROVING JOB FAILED id=PUBLIC_VM:7e3848db3c57254c1d8875839cdaa9d330c517d0bba84a3c805bc3d03962d3f9: FAILING AVM
file:///home/santiago/Projects/aztec3-packages-3/yarn-project/prover-client/dest/proving_broker/caching_broker_facade.js:104
                return Promise.reject(new Error(result.reason));
                                      ^

Error: FAILING AVM
    at CachingBrokerFacade.enqueueAndWaitForJob (file:///home/santiago/Projects/aztec3-packages-3/yarn-project/prover-client/dest/proving_broker/caching_broker_facade.js:104:39)
    at async file:///home/santiago/Projects/aztec3-packages-3/yarn-project/prover-client/dest/orchestrator/orchestrator.js:742:37
    at async file:///home/santiago/Projects/aztec3-packages-3/yarn-project/telemetry-client/dest/telemetry.js:72:25
    at async Immediate.safeJob (file:///home/santiago/Projects/aztec3-packages-3/yarn-project/prover-client/dest/orchestrator/orchestrator.js:407:40)

Node.js v18.19.1
~/Projects/aztec3-packages-3/yarn-project/aztec (master)$ 
```

This killed the prover node process itself from within `CachingBrokerFacade.enqueueAndWaitForJob`.

Changing the `return Promise.reject(new Error(result.reason));` line in
the facade to a `throw new Error(result.reason);` fixed the issue:

```
[18:52:04.239] VERBOSE: simulator:public-processor Processed tx 0x22f9872451dcd3b5ea4157d3d5b210b48fc19122d97204803bbf74b9e365398b with 1 public calls {"txHash":"0x22f9872451dcd3b5ea4157d3d5b210b48fc19122d97204803bbf74b9e365398b","txFee":25842656343661200,"revertCode":0,"gasUsed":{"totalGas":{"daGas":15360,"l2Gas":476820},"teardownGas":{"daGas":0,"l2Gas":0}},"publicDataWriteCount":8,"nullifierCount":3,"noteHashCount":0,"contractClassLogCount":0,"unencryptedLogCount":0,"privateLogCount":1,"l2ToL1MessageCount":0}
[18:52:04.239] INFO: prover-client:orchestrator Received transaction: 0x22f9872451dcd3b5ea4157d3d5b210b48fc19122d97204803bbf74b9e365398b
[18:52:04.261] ERROR: prover-client:orchestrator GETTING AVM PROOF
[18:52:04.287] VERBOSE: prover-node:epoch-proving-job Processed all 1 txs for block 2 {"blockNumber":2,"blockHash":"0x11c03a9cf30308de9e0c30f5c08db6029db2f9196e86f2b92cc7141d9de33254","uuid":"2aa2de51-ce77-4d0f-a4e6-58e593ec0eca"}
[18:52:04.297] VERBOSE: prover-client:orchestrator Block 2 completed. Assembling header.
[18:52:04.301] VERBOSE: prover-client:orchestrator Updating archive tree with block 2 header 0x11c03a9cf30308de9e0c30f5c08db6029db2f9196e86f2b92cc7141d9de33254
[18:52:04.304] VERBOSE: prover-client:orchestrator Orchestrator finalised block 2
[18:52:05.186] ERROR: prover-client:caching-prover-broker PROVING JOB FAILED id=PUBLIC_VM:9fed284ba66a64c2237419640ed5af6e4f466b17a42b36ffd0039542d7cb8443: FAILING AVM
[18:52:05.188] ERROR: prover-client:orchestrator ERROR IN AVM PROOF
[18:52:05.188] WARN: prover-client:orchestrator Error thrown when proving AVM circuit, but AVM_PROVING_STRICT is off, so faking AVM proof and carrying on. Error: Error: FAILING AVM.
[18:52:06.925] VERBOSE: prover-client:orchestrator Orchestrator completed root rollup for epoch 2
[18:52:06.925] INFO: prover-node:epoch-proving-job Finalised proof for epoch 2 {"epochNumber":2,"uuid":"2aa2de51-ce77-4d0f-a4e6-58e593ec0eca","duration":3014.239808999002}
[18:52:06.932] INFO: sequencer:publisher SubmitEpochProof proofSize=38 bytes
[18:52:06.958] VERBOSE: sequencer:publisher Sent L1 transaction 0x4ee81acf36558c1b0945b6bf75201853a00c0d22123e278c1b3cad303d064c32 {"gasLimit":296230,"maxFeePerGas":"1.500241486","maxPriorityFeePerGas":"1.44"}
[18:52:07.949] INFO: archiver Updated proven chain to block 2 (epoch 2) {"provenBlockNumber":2,"provenEpochNumber":2}
[18:52:07.998] VERBOSE: world_state Finalized chain is now at block 2
[18:52:11.967] INFO: archiver Updated proven chain to block 2 (epoch 2) {"provenBlockNumber":2,"provenEpochNumber":2}
[18:52:15.994] INFO: archiver Updated proven chain to block 2 (epoch 2) {"provenBlockNumber":2,"provenEpochNumber":2}
```
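For reference, here is a minimal sketch (not the actual broker code; the function names are made up) of the behavioural difference between the two forms inside an async function. `return Promise.reject(...)` produces an already-rejected promise as an ordinary return value, so it sails past any `catch` in the same function and only surfaces once some caller handles the returned promise; `throw` rejects the async function's own result promise through the normal control-flow path.

```ts
// Hypothetical reduction, not the CachingBrokerFacade itself.
async function withReject(): Promise<string> {
  try {
    // No exception is raised here: we just return an already-rejected promise,
    // so the local catch below never runs.
    return Promise.reject(new Error('job failed'));
  } catch {
    return 'recovered locally';
  }
}

async function withThrow(): Promise<string> {
  try {
    // The throw unwinds through the try, so the local catch handles it.
    throw new Error('job failed');
  } catch {
    return 'recovered locally';
  }
}

void (async () => {
  console.log(await withThrow()); // "recovered locally"
  console.log(await withReject().catch(err => `caller saw: ${(err as Error).message}`)); // "caller saw: job failed"
})();
```

Since Node 15, an unhandled rejection terminates the process by default, which appears consistent with the non-pino stack trace and abrupt exit in the first log excerpt; with `throw`, the error stays attached to the promise the orchestrator is already awaiting, matching the graceful recovery seen after the fix.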

This PR also tweaks a few logs along the way.
spalladino authored Dec 13, 2024
1 parent 934107f commit cc6a72b
Showing 3 changed files with 31 additions and 17 deletions.
yarn-project/prover-client/src/orchestrator/orchestrator.ts (7 changes: 5 additions & 2 deletions)

```diff
@@ -326,7 +326,7 @@ export class ProvingOrchestrator implements EpochProver {
     }
 
     // And build the block header
-    logger.verbose(`Block ${provingState.globalVariables.blockNumber} completed. Assembling header.`);
+    logger.verbose(`Block ${blockNumber} completed. Assembling header.`);
     await this.buildBlock(provingState, expectedHeader);
 
     // If the proofs were faster than the block building, then we need to try the block root rollup again here
@@ -1196,10 +1196,13 @@ export class ProvingOrchestrator implements EpochProver {
       return await this.prover.getAvmProof(inputs, signal, provingState.epochNumber);
     } catch (err) {
       if (process.env.AVM_PROVING_STRICT) {
+        logger.error(`Error thrown when proving AVM circuit with AVM_PROVING_STRICT on`, err);
         throw err;
       } else {
         logger.warn(
-          `Error thrown when proving AVM circuit, but AVM_PROVING_STRICT is off, so faking AVM proof and carrying on. Error: ${err}.`,
+          `Error thrown when proving AVM circuit but AVM_PROVING_STRICT is off. Faking AVM proof and carrying on. ${inspect(
+            err,
+          )}.`,
         );
         return {
           proof: makeEmptyRecursiveProof(AVM_PROOF_LENGTH_IN_FIELDS),
```
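Two things change here: strict mode now logs the error before rethrowing, and the non-strict warning formats the error with `inspect` instead of template-literal interpolation. A small sketch of why that matters, assuming the `inspect` in the diff is Node's `util.inspect` (the names below are only for illustration):

```ts
import { inspect } from 'util';

const err = new Error('AVM proof failed');

// Template-literal interpolation calls Error#toString(): "Error: AVM proof failed",
// with no stack trace.
console.log(`carrying on. Error: ${err}.`);

// util.inspect keeps the stack (and nested causes in newer Node versions),
// which is what you want when the error is being swallowed rather than rethrown.
console.log(`carrying on. ${inspect(err)}.`);
```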
yarn-project/prover-client/src/proving_broker/caching_broker_facade.ts (12 changes: 8 additions & 4 deletions)

```diff
@@ -64,6 +64,7 @@ export class CachingBrokerFacade implements ServerCircuitProver {
   ): Promise<ProvingJobResultsMap[T]> {
     // first try the cache
     let jobEnqueued = false;
+    let jobRejected = undefined;
     try {
       const cachedResult = await this.cache.getProvingJobStatus(id);
       if (cachedResult.status !== 'not-found') {
@@ -78,8 +79,7 @@ export class CachingBrokerFacade implements ServerCircuitProver {
           this.log.warn(`Cached result type mismatch for job=${id}. Expected=${type} but got=${output.type}`);
         }
       } else if (cachedResult.status === 'rejected') {
-        // prefer returning a rejected promises so that we don't trigger the catch block below
-        return Promise.reject(new Error(cachedResult.reason));
+        jobRejected = cachedResult.reason ?? 'Job rejected for unknown reason';
       } else if (cachedResult.status === 'in-progress' || cachedResult.status === 'in-queue') {
         jobEnqueued = true;
       } else {
@@ -89,6 +89,10 @@ export class CachingBrokerFacade implements ServerCircuitProver {
       this.log.warn(`Failed to get cached proving job id=${id}: ${err}. Re-running job`);
     }
 
+    if (jobRejected) {
+      throw new Error(jobRejected);
+    }
+
    if (!jobEnqueued) {
       try {
         const inputsUri = await this.proofStore.saveProofInput(id, type, inputs);
@@ -142,10 +146,10 @@ export class CachingBrokerFacade implements ServerCircuitProver {
         if (output.type === type) {
           return output.result as ProvingJobResultsMap[T];
         } else {
-          return Promise.reject(new Error(`Unexpected proof type: ${output.type}. Expected: ${type}`));
+          throw new Error(`Unexpected proof type: ${output.type}. Expected: ${type}`);
         }
       } else {
-        return Promise.reject(new Error(result.reason));
+        throw new Error(result.reason);
       }
     } finally {
       signal?.removeEventListener('abort', abortFn);
```
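The cache branch uses a "record now, throw later" pattern: the `try` block exists to tolerate cache-read failures, so a cached rejection found inside it is only recorded there and rethrown after the block, where no local `catch` can swallow it. A reduced sketch under simplified, hypothetical types (not the real broker interfaces):

```ts
// Simplified stand-ins for the broker's cached job status.
type CachedStatus =
  | { status: 'fulfilled'; value: string }
  | { status: 'rejected'; reason?: string }
  | { status: 'not-found' };

async function getCachedResult(
  cache: { get(id: string): Promise<CachedStatus> },
  id: string,
): Promise<string | undefined> {
  let rejectionReason: string | undefined;

  try {
    const cached = await cache.get(id);
    if (cached.status === 'fulfilled') {
      return cached.value;
    } else if (cached.status === 'rejected') {
      // Don't throw here: the catch below is only meant for cache-read errors.
      rejectionReason = cached.reason ?? 'Job rejected for unknown reason';
    }
  } catch (err) {
    // A broken cache should not fail the job; fall through and re-run it.
    console.warn(`Failed to read cache for ${id}: ${err}`);
  }

  if (rejectionReason) {
    // Thrown outside the try, so it propagates to the caller as a normal rejection.
    throw new Error(rejectionReason);
  }

  return undefined; // cache miss: caller re-enqueues the job
}
```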
yarn-project/prover-node/src/job/epoch-proving-job.ts (29 changes: 18 additions & 11 deletions)

```diff
@@ -65,26 +65,32 @@ export class EpochProvingJob {
   public async run() {
     const epochNumber = Number(this.epochNumber);
     const epochSize = this.blocks.length;
-    const firstBlockNumber = this.blocks[0].number;
-    this.log.info(`Starting epoch proving job`, { firstBlockNumber, epochSize, epochNumber, uuid: this.uuid });
+    const [fromBlock, toBlock] = [this.blocks[0].number, this.blocks.at(-1)!.number];
+    this.log.info(`Starting epoch ${epochNumber} proving job with blocks ${fromBlock} to ${toBlock}`, {
+      fromBlock,
+      toBlock,
+      epochSize,
+      epochNumber,
+      uuid: this.uuid,
+    });
     this.state = 'processing';
     const timer = new Timer();
 
     const { promise, resolve } = promiseWithResolvers<void>();
     this.runPromise = promise;
 
     try {
-      this.prover.startNewEpoch(epochNumber, firstBlockNumber, epochSize);
+      this.prover.startNewEpoch(epochNumber, fromBlock, epochSize);
 
       await asyncPool(this.config.parallelBlockLimit, this.blocks, async block => {
         const globalVariables = block.header.globalVariables;
         const txHashes = block.body.txEffects.map(tx => tx.txHash);
         const txCount = block.body.numberOfTxsIncludingPadded;
         const l1ToL2Messages = await this.getL1ToL2Messages(block);
-        const txs = await this.getTxs(txHashes);
+        const txs = await this.getTxs(txHashes, block.number);
         const previousHeader = await this.getBlockHeader(block.number - 1);
 
-        this.log.verbose(`Starting block processing`, {
+        this.log.verbose(`Starting processing block ${block.number}`, {
           number: block.number,
           blockHash: block.hash().toString(),
           lastArchive: block.header.lastArchive.root,
@@ -104,7 +110,7 @@ export class EpochProvingJob {
         const publicProcessor = this.publicProcessorFactory.create(db, previousHeader, globalVariables);
         await this.processTxs(publicProcessor, txs, txCount);
         await db.close();
-        this.log.verbose(`Processed all txs for block`, {
+        this.log.verbose(`Processed all ${txs.length} txs for block ${block.number}`, {
           blockNumber: block.number,
           blockHash: block.hash().toString(),
           uuid: this.uuid,
@@ -116,17 +122,16 @@ export class EpochProvingJob {
 
       this.state = 'awaiting-prover';
       const { publicInputs, proof } = await this.prover.finaliseEpoch();
-      this.log.info(`Finalised proof for epoch`, { epochNumber, uuid: this.uuid, duration: timer.ms() });
+      this.log.info(`Finalised proof for epoch ${epochNumber}`, { epochNumber, uuid: this.uuid, duration: timer.ms() });
 
       this.state = 'publishing-proof';
-      const [fromBlock, toBlock] = [this.blocks[0].number, this.blocks.at(-1)!.number];
       await this.publisher.submitEpochProof({ fromBlock, toBlock, epochNumber, publicInputs, proof });
       this.log.info(`Submitted proof for epoch`, { epochNumber, uuid: this.uuid });
 
       this.state = 'completed';
       this.metrics.recordProvingJob(timer);
     } catch (err) {
-      this.log.error(`Error running epoch prover job`, err, { uuid: this.uuid });
+      this.log.error(`Error running epoch ${epochNumber} prover job`, err, { uuid: this.uuid, epochNumber });
       this.state = 'failed';
     } finally {
       await this.cleanUp(this);
@@ -149,13 +154,15 @@ export class EpochProvingJob {
     return this.l2BlockSource.getBlockHeader(blockNumber);
   }
 
-  private async getTxs(txHashes: TxHash[]): Promise<Tx[]> {
+  private async getTxs(txHashes: TxHash[], blockNumber: number): Promise<Tx[]> {
     const txs = await Promise.all(
      txHashes.map(txHash => this.coordination.getTxByHash(txHash).then(tx => [txHash, tx] as const)),
     );
     const notFound = txs.filter(([_, tx]) => !tx);
     if (notFound.length) {
-      throw new Error(`Txs not found: ${notFound.map(([txHash]) => txHash.toString()).join(', ')}`);
+      throw new Error(
+        `Txs not found for block ${blockNumber}: ${notFound.map(([txHash]) => txHash.toString()).join(', ')}`,
+      );
     }
     return txs.map(([_, tx]) => tx!);
   }
```
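The epoch-proving-job tweaks all follow one convention: put the identifier (epoch or block number) in the human-readable message and keep it in the structured fields, so both plain-text grepping and label-based queries in Loki work. A hedged sketch of that convention with a generic pino-based wrapper (the `logger` shape here is illustrative, not the actual Aztec logger interface):

```ts
import { pino } from 'pino';

const base = pino();

// Illustrative wrapper mirroring the `(message, data)` call order used in the diff above.
const logger = {
  info: (msg: string, data?: Record<string, unknown>) => base.info(data ?? {}, msg),
  verbose: (msg: string, data?: Record<string, unknown>) => base.debug(data ?? {}, msg),
};

const epochNumber = 2;
const uuid = '2aa2de51-ce77-4d0f-a4e6-58e593ec0eca';

// Identifier in the text for humans, and in the fields for queries and dashboards.
logger.info(`Finalised proof for epoch ${epochNumber}`, { epochNumber, uuid, duration: 3014.2 });
```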
