Goal
We proposed a solution to avoid the frequent "out of disk space" errors in #4753. This issue specifies it further.
The goal is to enable users to specify chain and state retention policies in their $LOTUS_HOME/config.toml file.
It could look like this:
```toml
[Blockstore]
RetainChain = 10000  # prune chain objects that went out of scope 10000 tipsets ago
RetainState = 100    # prune state objects that went out of scope 100 tipsets ago
```
Design
Expunge process and QoS
We introduce an async, background expunge process that operates entirely on the chain and state cold stores.
It takes care of applying these policies in a best-effort fashion.
It is best-effort because it must be throttled: expunging is generally not critical, so it should not steal CPU or IO bandwidth from critical processes.
The throttling rate could vary depending on disk usage (the less disk space we have available, the more compute / IO we allocate to this process, to avoid collapse).
Requirements
To make this process efficient, we need to record metadata in the cold store, alongside the block. This can be done by native means (e.g. Badger has a Metadata field which is stored next to the key in the indices), or by wrapping the block in a metadata container.
The expunge process would then iterate over the cold store and apply the retention policy by deleting objects that exceed the threshold.
The first run would be a special one.
If this is released alongside the splitstore, none of the objects would carry metadata (since the splitstore turns the current store into the cold store).
But as objects archived by the splitstore start making it into the cold store, we would start populating that metadata.
If the splitstore "pulls" objects into the hot store, we can assume that unmarked objects went out of scope before the lowest epoch recorded in metadata; there is no risk of unmarked active objects.
We can achieve this by making the splitstore stop the world on first initialisation and run a single state-tree walk to copy the objects that are actually hot into the hot store.
If this process is automatic, it would be incredible not just for space considerations, but also memory usage. I do fear the additional metadata storage might create more I/O and/or memory usage, which would make the memory issue worse.
I also wonder about the QoS notion -- I could see a situation where the daemon is overwhelmed by a large blockchain, and because it's trying to handle the stress, it can't run the pruning process that would alleviate it.
For expunging the coldstore, I’m planning to use index metadata to record the “last reachable” epoch for each block in the state tree.
This is available on badger (WithMeta()) and I think we can make this feature available on gonudb (cc @iand).
This would allow the expunge process to iterate over all keys in the cold store (rate limited, so as not to affect performance) and delete the keys that have fallen outside the retention policy.
Of course, a challenge is performing the actual physical deletion:
- In Badger we need to call Flatten(), so that tombstoned keys are actually removed.
- nudb doesn't support deletions AFAIK, and I'm not sure how we could implement them in gonudb. Alternatively, we could rotate the coldstore every time, but that would increase the disk space requirements.