Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MASP storage optimization #2343

Closed
grarco opened this issue Dec 28, 2023 · 5 comments
Closed

MASP storage optimization #2343

grarco opened this issue Dec 28, 2023 · 5 comments

Comments

@grarco
Copy link
Collaborator

grarco commented Dec 28, 2023

Currently the MASP storage is managed in a suboptimal way. More specifically, the following can be observed:

  1. All the keys are written under the subspace, meaning they are subject to diffs and merklization which is unneeded because we only need the most updated value of the masp keys and we already manage the masp merkle trees in a separate way
  2. We are currently writing the Transfer and Transaction objects (plus some other metadata) into storage to allow clients to scan the history of masp transactions and construct their internal states to correctly produce future transactions
  3. The conversion_state
    pub struct ConversionState {
    /// The last amount of the native token distributed
    pub normed_inflation: Option<u128>,
    /// The tree currently containing all the conversions
    pub tree: FrozenCommitmentTree<Node>,
    /// A map from token alias to actual address.
    pub tokens: BTreeMap<String, Address>,
    /// Map assets to their latest conversion and position in Merkle tree
    #[allow(clippy::type_complexity)]
    pub assets: BTreeMap<
    AssetType,
    ((Address, MaspDenom), Epoch, AllowedConversion, usize),
    >,
    }
    is kept in memory under the Storage object

Issue 1 should be solved by #2265.

Issue 2 could be solved by not writing this data to storage since we already have it in the block storage. Clients could query the blocks instead (possibly with the help of an indexer), to get the same results. We should also make sure to keep the blocks around in case of a chain upgrade (see discussion at #2248 (comment)).

Issue 3 is probably not as major as the previous two. The ConversionState is effectively only queried by clients and updated in protocol once per epoch, so we could probably move it to storage to reduce the memory consumption (hopefully without increasing the query time too much because of the need to retrieve data from storage first)

@grarco
Copy link
Collaborator Author

grarco commented Dec 28, 2023

@cwgoes for reference

@cwgoes
Copy link
Collaborator

cwgoes commented Jan 2, 2024

Thanks for the writeup. I think we should definitely fix 1 & 2. Regarding 3, I'm not too worried about memory consumption, as long as we have a reasonable understanding of the bounds and total Namada memory usage does not exceed single-digit GBs. Do we have a sense of how large (concretely) this struct would be?

@grarco
Copy link
Collaborator Author

grarco commented Jan 2, 2024

I've tried to run some approximate calculations. So the ConversionState struct is composed of four fields:

  • normed_inflation, which is an Option<u128>, is constant at ~ 17 bytes
  • tree, which is a FrozenCommtimentTree that we update once per epoch
  • tokens, which is a map from a token alias to its address should remain low in size (BTreeMap<String, Address>)
  • assets, which is a map between AssetType and some data about it, will actually grow once per epoch, more specifically we'll need two inserts per epoch per every unique (token, denom) couple, one for the old asset and one for the new asset, but the one for the old asset is actually an overwrite of the previous value, so we can count just one insert

Here are some estimations on the size of the types involved (these are based only on the types, I'm not taking into account optimizations/paddings that the compiler might perform, e.g. to align types in memory):

Alias ~ 3 UTF8 chars, (suppose ASCII chars, so one byte per char), 3 bytes, ignore the size of the String smart pointer
AssetType ~ 33 bytes
Address ~ 20 bytes
MaspDenom = 1 byte (we have 4 denominations)
Epoch = 8 bytes
AllowedConversions = I28Sum + jujub::ExtendedPoint
	I128Sum = BTreeMap<AssetType, u128> => but actually this map is always empty for us
	ExtendPoint = 5 * 4 * 8 bytes = 160 bytes
usize = 8 bytes on 64 bits machines

Node = 32 bytes
FrozenCommitmentTree = (Vec<Node>, usize)
conv_notes = number of conversion notes, one for every value (so asset type) in the Map of assets in ConversionState
Capacity of Vec<Node> = 31 + 2 * conv_notes

So I believe the cost in memory of the conversion state could be divided into two parts: a constant (or semi-constant) part and a variable part that gets updated with every epoch change.

The constant part would be the cost of the normed_inflation plus the cost of the tokens map, as a function of the number of token addresses $t$:

$M_c(t) = 17 + t * (3 + 20) = 37 + 3t = C + 3t$

The epoch-variable part (at a given epoch $e$) is composed of the merkle tree and the assets fields. Assuming no new tokens are added in the current epoch, the number of new asset types (where every AssetType is the concatenation of a token address, a masp denomination and the epoch) that will be inserted, based on the number of token addresses, is:

$n_a(t) = (t * 4) * (33 + 20 + 1 + 8 + 160 + 8)$

And the delta increase of the frozen tree compared to its value at the previous epoch will be:

$d_{tree}(n_a) = 32 * (31 + 2 * n_a)$

The variable part can then be expressed as:

$M_{v, e}(t)= d_{tree} + n_a = 32 * (31 + 2 * n_a) + n_a = 992 + 64n_a + n_a = 992 + 65n_a = 992 + 65 * (4t * 230) = 992 + 59800t$

Ignoring the constant parts, the increase in memory consumption given by the variable part would be approximately 59800 bytes per token address per epoch, so ~ 58kB

@cwgoes
Copy link
Collaborator

cwgoes commented Jan 2, 2024

Thanks! Excellent write-up. I think that those memory costs are fine for the foreseeable future.

@grarco
Copy link
Collaborator Author

grarco commented Apr 11, 2024

Closing as completed

@grarco grarco closed this as completed Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants