
fix(state_sync_massive): Epoch length was too short #3208

Merged
merged 11 commits into master on Sep 25, 2020

Conversation

mfornet
Member

@mfornet mfornet commented Aug 18, 2020

The epoch length was too short (20), so when the observer tried to sync
it used an old epoch; but currently, trying to sync from any epoch other
than the current or the previous one is considered malicious behavior.
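The restriction described above can be sketched as a standalone check. This is an illustrative helper, not nearcore's actual code; `is_allowed_sync_epoch` and the epoch-height arguments are hypothetical names:

```python
def is_allowed_sync_epoch(requested_epoch_height: int, current_epoch_height: int) -> bool:
    """A peer may request state sync only for the current or the previous
    epoch; requests for anything older are treated as malicious behavior."""
    return requested_epoch_height in (current_epoch_height, current_epoch_height - 1)

# Previous epoch: allowed.
assert is_allowed_sync_epoch(9, 10)
# Two or more epochs behind: rejected.
assert not is_allowed_sync_epoch(7, 10)
```

With epoch_length = 20, enough blocks are produced before the observer starts that its chosen epoch is already older than the previous one, so its sync requests get rejected.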

Fixes #3130

Test plan

state_sync_massive.py passes in nightly


@mfornet
Member Author

mfornet commented Aug 18, 2020

Waiting for nightly test to pass:
http://nayduck.eastus.cloudapp.azure.com:3000/#/test/33015

@bowenwang1996
Collaborator

@mfornet looks like it failed

@SkidanovAlex
Collaborator

@mfornet I think you also do not trigger state sync right now; I believe the run needs to be at least two epochs in for the observer to actually trigger state sync (so you probably need to change 101 and 201 to 201 and 301 respectively).
I would also assert, using the log tracker, that state sync was indeed triggered, the same way we do in state_sync.py.
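The suggested log-tracker assertion can be sketched in isolation. The real tests use a `LogTracker` helper from the pytest utilities; this standalone stand-in only mimics its substring check against a node's stderr log:

```python
import os
import tempfile

class LogTracker:
    """Tiny stand-in for the pytest LogTracker helper: remembers a node's
    log file and checks whether a given pattern has appeared in it."""
    def __init__(self, path: str):
        self.path = path

    def check(self, pattern: str) -> bool:
        with open(self.path) as f:
            return pattern in f.read()

# Simulate a node log that recorded a transition into state sync.
fd, log_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("INFO sync: transition to State Sync\n")

tracker = LogTracker(log_path)
assert tracker.check("transition to State Sync")       # state sync was triggered
assert not tracker.check("transition to Block Sync")   # pattern absent
os.remove(log_path)
```

The point of the assertion is that the test fails if the observer quietly block-synced instead of exercising the state-sync path under test.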

@codecov

codecov bot commented Aug 19, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@f85e142).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master    #3208   +/-   ##
=========================================
  Coverage          ?   87.71%           
=========================================
  Files             ?      217           
  Lines             ?    44113           
  Branches          ?        0           
=========================================
  Hits              ?    38695           
  Misses            ?     5418           
  Partials          ?        0           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Collaborator

@SkidanovAlex SkidanovAlex left a comment


Given it's all green to be pushed, requesting changes until it actually passes in nayduck to prevent accidental merge-in.

@mfornet force-pushed the massive_state_sync_increase_epoch branch from d295c80 to 1646112 on September 2, 2020
@mfornet force-pushed the massive_state_sync_increase_epoch branch from 41f31ca to 19b5a79 on September 22, 2020
@mfornet
Member Author

mfornet commented Sep 22, 2020

This test performs really poorly on nayduck while passing locally. I've had to adjust the epoch size a few times, since the protocol relies on the assumption that a node will be able to sync in less than one epoch.

node0 logs
node1 logs

Most notably, on nayduck the nodes produced only 115 blocks in 20 minutes, which is a sign that something is going really wrong. I see in the logs from node0 that a lot of messages are being dropped (which is very weird in this setup with only 2 nodes). However, this should not be an issue anymore after #3088 lands.

What puzzles me is that it works every single time locally while failing on nayduck. The reason is probably a combination of a low-performance machine, a small epoch length, and a large state. Notice that in the run above node2 hadn't even started, so the issue is not with the new node syncing.

@SkidanovAlex @bowenwang1996 are you aware of any protocol mechanism that heavily depends on the epoch length being large enough, and otherwise defaults to a significant slowdown?
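The "sync within one epoch" assumption gives a rough lower bound on the epoch length in blocks: the epoch's wall-clock duration (epoch_length / block rate) must exceed the observer's sync time. As an illustration only, using the block rate observed on nayduck above and an assumed 300-second sync time (both the function name and the sync time are hypothetical):

```python
def min_epoch_length(blocks_per_second: float, sync_seconds: float) -> float:
    """Blocks an epoch must span so that its wall-clock duration
    (epoch_length / blocks_per_second) exceeds the observer's sync time."""
    return blocks_per_second * sync_seconds

# At a healthy ~1 block/s, a 300 s sync needs an epoch of at least 300 blocks,
# so the original epoch_length of 20 is far too short.
assert min_epoch_length(1.0, 300) == 300.0

# At the rate observed on nayduck (115 blocks in 20 minutes, ~10 s per block),
# even a short epoch lasts long in wall time -- but on such a slow machine the
# sync itself is slower too, so the bound has to be read with that in mind.
assert round(min_epoch_length(115 / 1200, 300), 2) == 28.75
```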

@bowenwang1996
Collaborator

@mfornet can you disable the storage check in the test? It looks like it is causing quite a bit of slowdown while the node is running.

@mfornet
Member Author

mfornet commented Sep 22, 2020

@mfornet can you disable storage check in the test? Looks like it is causing quite a bit of slowdown while the node is running

It passed nayduck after removing storage_check 💯
http://nayduck.eastus.cloudapp.azure.com:3000/#/test/33017

@SkidanovAlex we can unblock and merge this PR already

@bowenwang1996
Collaborator

It passed nayduck after removing storage_check 💯

Epic 🚀

@SkidanovAlex
Collaborator

Can we add the assert that the node actually state_synced (not block synced)? Like at the end of state_sync.py:

assert tracker.check("transition to State Sync")

@mfornet
Member Author

mfornet commented Sep 25, 2020

Can we add the assert that the node actually state_synced (not block synced)? Like at the end of state_sync.py:

assert tracker.check("transition to State Sync")

http://nayduck.eastus.cloudapp.azure.com:3000/#/run/401

@nearprotocol-bulldozer nearprotocol-bulldozer bot merged commit 48cc4f6 into master Sep 25, 2020
@nearprotocol-bulldozer nearprotocol-bulldozer bot deleted the massive_state_sync_increase_epoch branch September 25, 2020 22:36
Development

Successfully merging this pull request may close these issues.

state_sync_massive.py fails with timeout