[Quorum Store] improvements to prevent some batches not getting quorum #11629
Codecov Report
Coverage diff (main vs. #11629):
Coverage: 69.8% to 69.7% (-0.2%)
Files: 2200 to 2111 (-89)
Lines: 417626 to 401250 (-16376)
Hits: 291876 to 279734 (-12142)
Misses: 125750 to 121516 (-4234)
4e407e1 to a703ed6
a703ed6 to bc1baa9
for batch in batches.clone().into_iter() {
    persist_requests.push(batch.into());
}
self.batch_writer.persist(persist_requests);
Did you observe any bottlenecks here? This might need to be parallelized or sent to another task, or this can block the loop for a while.
Running max_load_large, I didn't see any issues.
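If persisting ever did block the loop, one way to follow the "send it to another task" suggestion is to hand the requests to a worker over a channel. The sketch below uses only the standard library; `PersistRequest`, `spawn_persist_worker`, and the completion channel are hypothetical names for illustration and are not taken from the PR.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the PR's persist request type.
#[derive(Debug, Clone, PartialEq)]
pub struct PersistRequest {
    pub digest: u64,
}

// Spawns a worker thread that owns the slow write path. The returned sender
// lets the main loop hand off batches without waiting for the write itself.
pub fn spawn_persist_worker(
    persisted: mpsc::Sender<usize>,
) -> (mpsc::Sender<Vec<PersistRequest>>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Vec<PersistRequest>>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for requests in rx {
            // A real worker would write to storage here; this sketch only
            // reports how many requests it handled.
            let _ = persisted.send(requests.len());
        }
    });
    (tx, handle)
}
```

With this shape the loop only pays for a channel send, at the cost of persistence becoming asynchronous with respect to the rest of the handler.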
    max_batch_expiry_gap_usecs: u64,
    validator: &ValidatorVerifier,
) -> anyhow::Result<()> {
    if sender != self.signer {
nit: using ensure! is simpler
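A minimal sketch of what that nit could look like. To keep it free of external crates, it defines a small local `ensure!` macro that mimics the early-return behaviour of `anyhow::ensure!` but returns a `String` error. The function `check_sender`, its `u64` arguments, and the error message are assumptions for illustration, not the PR's code.

```rust
// Local stand-in for `anyhow::ensure!`: returns early with an error when the
// condition is false. In the real code one would import `anyhow::ensure`.
macro_rules! ensure {
    ($cond:expr, $($arg:tt)+) => {
        if !$cond {
            return Err(format!($($arg)+));
        }
    };
}

// Hypothetical version of the sender check, with u64 ids standing in for
// the real identity types.
pub fn check_sender(sender: u64, signer: u64) -> Result<(), String> {
    // Replaces `if sender != signer { return Err(...) }` with one line.
    ensure!(
        sender == signer,
        "sender {} does not match signer {}",
        sender,
        signer
    );
    Ok(())
}
```

The condition is stated positively (what must hold), which tends to read more clearly than an inverted `if` with an explicit early return.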
if !state.completed {
    counters::TIMEOUT_BATCHES_COUNT.inc();
}
Self::update_counters(&state);
if !state.completed {
This looks like a duplicate of the if condition above?
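One way to address this, sketched under the assumption that `update_counters` does not depend on the timeout counter having been incremented first, is to call it once and fold both `!state.completed` branches into a single block. `State`, `Counters`, and `on_expire` below are hypothetical stand-ins; plain integer fields replace the real metrics and the `info!` log.

```rust
// Hypothetical stand-in for the per-batch proof state.
pub struct State {
    pub completed: bool,
}

// Hypothetical stand-in for the metrics and log output touched on expiry.
#[derive(Debug, Default, PartialEq)]
pub struct Counters {
    pub updates: u64,  // stands in for Self::update_counters(&state)
    pub timeouts: u64, // stands in for TIMEOUT_BATCHES_COUNT.inc()
    pub logged: u64,   // stands in for the info! log call
}

pub fn on_expire(state: &State, counters: &mut Counters) {
    // Runs for every expired batch, completed or not.
    counters.updates += 1;
    // Single check covering both the timeout counter and the log line.
    if !state.completed {
        counters.timeouts += 1;
        counters.logged += 1;
    }
}
```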
Self::update_counters(&state);
if !state.completed {
    info!(
        LogSchema::new(LogEvent::ProofOfStoreInit),
Why is the event ProofOfStoreInit?
existing_proof.remove();
let incremental_proof = existing_proof.get();
if !incremental_proof.completed {
    warn!("QS: received commit notification for batch that did not complete: {}, self_voted: {}", digest, incremental_proof.self_voted);
Hmm, how can this happen?
ba18903 to 0be0dc7
✅ Forge suite
Description
Occasionally, in forge and testnet, the metrics show batches not getting quorum.
In addition, the counters for proof votes are cleaned up so that votes and stake are counted at expiry time (as originally intended).
Test Plan
Run realistic_env_max_load_large and observe that no proofs fail to get quorum and that the metrics look better.