
Properly handle garbage collection when we do state sync #3044

Closed · bowenwang1996 opened this issue Jul 26, 2020 · 3 comments

bowenwang1996 (Collaborator) commented Jul 26, 2020

Currently we use reset_data_pre_state_sync to delete data before state sync, to prevent stale data from never being garbage collected. However, this is very aggressive and has caused issues when we need to access old data during state sync (#3042). It is desirable to remove such aggressive garbage collection to make the code more maintainable. Therefore we should:

  • change the GC invariant to allow multiple non-overlapping segments of the chain to coexist in storage (a sketch of such an invariant follows this list)
  • bonus: have a background thread to clean up old segments, if any exist.
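A minimal sketch of what the relaxed invariant could look like. All names here (ChainSegment, segments_are_valid, start_new_segment) are hypothetical illustrations, not nearcore's actual types:

```rust
/// Hypothetical: a contiguous run of stored blocks, identified by its
/// height range (both ends inclusive).
#[derive(Debug, Clone)]
struct ChainSegment {
    start_height: u64,
    end_height: u64,
}

/// The relaxed GC invariant: segments are sorted by height and strictly
/// non-overlapping, but there may be more than one of them.
fn segments_are_valid(segments: &[ChainSegment]) -> bool {
    segments.iter().all(|s| s.start_height <= s.end_height)
        && segments.windows(2).all(|w| w[0].end_height < w[1].start_height)
}

/// After a state sync at `sync_height`, a new segment begins; older
/// segments stay behind until a background thread cleans them up.
fn start_new_segment(segments: &mut Vec<ChainSegment>, sync_height: u64) {
    segments.push(ChainSegment { start_height: sync_height, end_height: sync_height });
    debug_assert!(segments_are_valid(segments));
}

fn main() {
    let mut segments = vec![ChainSegment { start_height: 0, end_height: 1_000 }];
    // Skip ~1 mln blocks, then state-sync: a second, disjoint segment appears.
    start_new_segment(&mut segments, 1_000_000);
    assert!(segments_are_valid(&segments));
}
```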
bowenwang1996 added the C-enhancement label on Jul 26, 2020
Kouprin (Member) commented Jul 27, 2020

Here are my thoughts:

  1. Let's say the node skips 1 mln blocks, then comes online without clearing old data via reset_data_pre_state_sync, downloads some blocks, goes offline long enough to skip 1 mln blocks more, and then comes online again. In this case we will have [some data][1 mln skipped][some data][1 mln skipped][some data]. So we should handle multiple non-overlapping segments, not only two.
  2. Another problem is how we handle skips. Currently we clear two blocks per GC step where possible. With skips, there may be a huge gap between blocks, and GC would run for a long time before it finds a block in the next segment. To avoid that, we can cap the number of heights processed per step with a constant (see the sketch after this list). However, that means GC will be stalled from actually deleting data while it passes through the skip. The best solution would be to jump over skips immediately, but then we need to store and track all skips. Handling skips seems hard to test and requires a lot of coding.
  3. We need to update the GC invariant, which is dangerous in itself and makes the system more vulnerable. Today we don't allow skips at all, so if a skip appears it means something is broken. After the change, it may be hard to distinguish valid skips from invalid ones.
  4. The GC height will be calculated properly only after 5 epochs are downloaded. That means after a skip we will download 5 epochs before GC starts removing very old data, so we will have 5 old epochs plus 5 new epochs in storage at the same time, which doubles the data stored. If we keep turning the node off and on, the amount of stored data will multiply.
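A minimal sketch of the capped-GC idea from point 2, with hypothetical names (GC_STEP_LIMIT, gc_one_height) standing in for the real GC code; it shows why GC makes no progress on deletion while the tail is inside a skip:

```rust
/// Hypothetical cap on how many heights one GC step may advance.
const GC_STEP_LIMIT: u64 = 100;

/// Stand-in for per-height cleanup; returns true if data was deleted.
/// Inside a skip there is nothing stored, so nothing gets deleted.
fn gc_one_height(_height: u64) -> bool {
    false
}

/// Advance the GC tail by at most GC_STEP_LIMIT heights per call. While
/// the tail is crossing a skip, every call deletes nothing and only moves
/// the tail forward, which is the stall described in point 2.
fn gc_step(tail: &mut u64, head: u64) -> u64 {
    let mut deleted = 0;
    let stop = (*tail + GC_STEP_LIMIT).min(head);
    while *tail < stop {
        if gc_one_height(*tail) {
            deleted += 1;
        }
        *tail += 1;
    }
    deleted
}

fn main() {
    let (mut tail, head) = (0u64, 1_000_000);
    let mut steps = 0;
    while tail < head {
        gc_step(&mut tail, head);
        steps += 1;
    }
    // Crossing a 1 mln block skip takes head / GC_STEP_LIMIT = 10_000 steps.
    assert_eq!(steps, head / GC_STEP_LIMIT);
}
```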

It seems that removing reset_data_pre_state_sync is far from easy. However, the whole reset_data_pre_state_sync optimization is based on two assumptions:

  1. We can start the node with empty storage properly.
  2. All data preceding sync_hash is useless.

I suggest relying on assumption 2 and replacing reset_data_pre_state_sync with reset_data_pre_sync, which drops everything from the database before the whole syncing process, thereby restoring assumption 1. The plan is:

  1. Find the right place in the sync() method where we can drop the data safely and only once.
  2. Work out the conditions under which we can drop the data.
  3. Implementation: ideally, we traverse all stored Blocks and call GC on each of them, as in the sketch after this list. (We need to make sure that GC for Blocks can be called in any order.)
  4. Update the Tails properly. I guess we can simply reuse reset_data_post_state_sync.
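A minimal sketch of step 3, assuming a hypothetical Store with a per-block gc_block helper; the real reset_data_pre_sync would have to clear many more columns than just the blocks:

```rust
use std::collections::HashMap;

type BlockHash = [u8; 32];

/// Stand-in for the block column of the storage.
struct Store {
    blocks: HashMap<BlockHash, Vec<u8>>,
}

impl Store {
    /// Hypothetical reset_data_pre_sync: garbage-collect every stored
    /// Block in arbitrary order, then fix up the Tails afterwards (step 4,
    /// e.g. by reusing reset_data_post_state_sync).
    fn reset_data_pre_sync(&mut self) {
        let hashes: Vec<BlockHash> = self.blocks.keys().copied().collect();
        for hash in hashes {
            self.gc_block(&hash);
        }
    }

    /// Pretend per-block GC; the real version also deletes chunks, headers,
    /// trie changes, and the other per-block columns.
    fn gc_block(&mut self, hash: &BlockHash) {
        self.blocks.remove(hash);
    }
}

fn main() {
    let mut store = Store { blocks: HashMap::new() };
    store.blocks.insert([0u8; 32], vec![1, 2, 3]);
    store.reset_data_pre_sync();
    assert!(store.blocks.is_empty());
}
```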

@bowenwang1996 @SkidanovAlex WDYT?

bowenwang1996 (Collaborator, Author) commented:

> I suggest relying on assumption 2 and replacing reset_data_pre_state_sync with reset_data_pre_sync, which drops everything from the database before the whole syncing process, thereby restoring assumption 1.

That unfortunately doesn't always work: state sync isn't always invoked. When the node is close to the head of the network, it will do block sync only.

Kouprin (Member) commented Jul 29, 2020

> That unfortunately doesn't always work: state sync isn't always invoked. When the node is close to the head of the network, it will do block sync only.

That's reasonable.

Discussed with @SkidanovAlex. Once the new Sync is implemented, there may be an opportunity to refactor reset_data_pre_state_sync. For now it's hard to do anything useful and cheap here.
