Skip to content

Commit

Permalink
fix(tee): correct previous fix for race condition in batch locking (#…
Browse files Browse the repository at this point in the history
…3358)

## What ❔

Commit a7dc0ed (PR #3342) was supposed
to fix a race condition in batch locking by introducing SQL row-locking,
but it [didn't work][2] as expected.
![Screenshot From 2024-12-04
11-32-32](https://github.com/user-attachments/assets/959ffc3c-593f-409a-87ab-68ec197040a0)
Now we are switching back to coarser-grained table-level locking as
[originally suggested][1] by Harald. The original fix was hard to test
unless deployed to `stage` due to the undeterministic nature of the
problem, so we needed to merge it to the `main` branch to properly test
it.

[1]:
#3342 (comment)
[2]: https://grafana.matterlabs.dev/goto/AhEd5FVNg?orgId=1

## Why ❔

To fix the bug that only activates after running `zksync-tee-prover` on
multiple instances.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [ ] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [x] Code has been formatted via `zkstack dev fmt` and `zkstack dev
lint`.
  • Loading branch information
pbeza authored Dec 4, 2024
1 parent dc2c476 commit b12da8d
Show file tree
Hide file tree
Showing 2 changed files with 12 additions and 8 deletions.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 10 additions & 6 deletions core/lib/dal/src/tee_proof_generation_dal.rs
Original file line number Diff line number Diff line change
Expand Up @@ -66,10 +66,16 @@ impl TeeProofGenerationDal<'_, '_> {
let min_batch_number = i64::from(min_batch_number.0);
let mut transaction = self.storage.start_transaction().await?;

// Lock rows in the proof_generation_details table to prevent race conditions. The
// tee_proof_generation_details table does not have corresponding entries yet if this is the
// first time the query is invoked for a batch. Locking rows in proof_generation_details
// ensures that two different TEE prover instances will not try to prove the same batch.
// Lock the entire tee_proof_generation_details table in EXCLUSIVE mode to prevent race
// conditions. Locking the table ensures that two different TEE prover instances will not
// try to prove the same batch.
sqlx::query("LOCK TABLE tee_proof_generation_details IN EXCLUSIVE MODE")
.instrument("lock_batch_for_proving#lock_table")
.execute(&mut transaction)
.await?;

// The tee_proof_generation_details table does not have corresponding entries yet if this is
// the first time the query is invoked for a batch.
let batch_number = sqlx::query!(
r#"
SELECT
Expand All @@ -95,8 +101,6 @@ impl TeeProofGenerationDal<'_, '_> {
)
)
LIMIT 1
FOR UPDATE OF p
SKIP LOCKED
"#,
tee_type.to_string(),
TeeProofGenerationJobStatus::PickedByProver.to_string(),
Expand Down

0 comments on commit b12da8d

Please sign in to comment.