fix(state_sync_massive): Epoch length was too short #3208
Waiting for nightly test to pass:
@mfornet looks like it failed
@mfornet I also think you do not trigger state sync right now. As things stand today, I believe the chain needs to be at least two epochs in for the observer to actually trigger state sync (so you probably need to change 101 and 201 to 201 and 301, respectively).
Codecov Report
@@            Coverage Diff            @@
##             master    #3208   +/-   ##
=========================================
  Coverage          ?   87.71%
=========================================
  Files             ?      217
  Lines             ?    44113
  Branches          ?        0
=========================================
  Hits              ?    38695
  Misses            ?     5418
  Partials          ?        0
Continue to review full report at Codecov.
Since everything is green and this could be merged as is, I'm requesting changes until it actually passes in nayduck, to prevent an accidental merge.
Epoch length was too short (20): when the observer tried to sync, it used an old epoch, but trying to sync from any epoch other than the current or previous epoch is now considered malicious behavior.
d295c80 to 1646112
41f31ca to 19b5a79
This test is performing really poorly on nayduck while passing locally. I've had to adjust the epoch length a few times, since the protocol relies on the assumption that a node can sync in less than one epoch. Most notably, on nayduck the nodes produced 115 blocks in 20 minutes, which is a sign that something is going really wrong. I see in the logs from node0 that a lot of messages are being dropped (which is very weird in this setup with only 2 nodes). However, this should no longer be an issue after #3088 lands. What puzzles me is that it works every single time locally while failing on nayduck. The reason is probably a combination of a low-performance machine, a small epoch length, and a large state. Notice that in the run above node2 didn't even start, so the issue is not with the new node syncing. @SkidanovAlex @bowenwang1996 are you aware of any protocol mechanism that depends heavily on the epoch length being large enough and otherwise falls back to a significant slowdown?
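For a sense of scale, here is a rough back-of-the-envelope calculation using only the numbers quoted in this thread (115 blocks in 20 minutes, and the original epoch length of 20). The helper names are made up for illustration and are not part of the test.

```python
def seconds_per_block(blocks: int, minutes: float) -> float:
    """Average block time given how many blocks were produced in `minutes`."""
    return minutes * 60 / blocks


def epoch_duration_seconds(epoch_length: int, blocks: int, minutes: float) -> float:
    """Approximate wall-clock length of one epoch at the observed block rate."""
    return epoch_length * seconds_per_block(blocks, minutes)


# Numbers from the nayduck run described above.
print(round(seconds_per_block(115, 20), 2))           # 10.43 seconds per block
# Epoch length 20 is the original value from the PR description.
print(round(epoch_duration_seconds(20, 115, 20), 1))  # 208.7 seconds per epoch
```

So even at that slow block rate, an epoch of 20 blocks lasts only about three and a half minutes, which is the window a syncing node would have to fit into.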
@mfornet can you disable the storage check in the test? Looks like it is causing quite a bit of slowdown while the node is running.
It passed nayduck after removing the storage check. @SkidanovAlex we can unblock and merge this PR already.
Epic 🚀
Can we add an assert that the node actually state synced (not block synced)? Like at the end of
Epoch length was too short (20): when the observer tried
to sync, it used an old epoch, but trying to sync from
any epoch other than the current or previous epoch is now
considered malicious behavior.
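The rule described above can be sketched roughly as follows. This is a simplified illustration, not the node's actual implementation: it assumes epochs are fixed-size and can be numbered by dividing the block height by the epoch length, and both function names are hypothetical.

```python
def epoch_index(height: int, epoch_length: int) -> int:
    """Hypothetical helper: index of the epoch that contains `height`.

    Simplifying assumption: fixed-size epochs, the first one starting at height 0.
    """
    if epoch_length <= 0:
        raise ValueError("epoch_length must be positive")
    return height // epoch_length


def is_acceptable_sync_epoch(requested_epoch: int, current_epoch: int) -> bool:
    """Only the current or the previous epoch is an acceptable sync target."""
    return requested_epoch in (current_epoch, current_epoch - 1)


EPOCH_LENGTH = 20   # the original (too short) value from this PR
head_height = 115   # illustrative chain head
current = epoch_index(head_height, EPOCH_LENGTH)

for sync_height in (60, 85, 110):
    requested = epoch_index(sync_height, EPOCH_LENGTH)
    print(sync_height, requested, is_acceptable_sync_epoch(requested, current))
```

With a short epoch the chain head moves past two epoch boundaries quickly, so a sync target picked when the observer started can already be too old by the time the request is made.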
Fixes #3130
Test plan
state_sync_massive.py passes in nightly