ADR-040 performance issue leads #21

i-norden · 2022-03-02T15:29:55Z (edited by roysc)

IAVL vs SMT

Optimizations relative to IAVL:

Hashed keys => shorter keys => the database’s underlying LSM tree is not as deep and doesn’t take as long to traverse
Storage bucket mapping key directly to value (no need to traverse the commitment object to read values)
Leveraging a versioned database: everything is indexed/denormalized by block number, making historical access (reads and writes) much faster. This also means the database backend can handle pruning of old state itself.

The other line of optimization work is the SMT implementation itself, bringing it up to speed with IAVL:

Instead of flushing updated nodes to disk after every update, it should wait until the end of a block cycle (commit) and flush only the final state.
Instead of materializing the intermediate nodes after every update, it should wait until the end of a block cycle (commit) and materialize intermediate nodes only for the final state that is flushed to disk.
Remove unused internal hashing of keys
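The deferred flush/materialization idea can be sketched in Go as a write cache that coalesces repeated writes to the same key and does all hashing and persistence once, at commit. This is an illustration under assumptions: `CachedTree` and its fields are hypothetical, the "disk" is a map, and the root is a flat hash over sorted (hash(key), hash(value)) pairs, a placeholder rather than real SMT root computation.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// CachedTree buffers writes for the current block; nothing is hashed or
// flushed until Commit. Repeated writes to a key overwrite the cached entry.
type CachedTree struct {
	pending map[string][]byte // uncommitted writes for the current block
	disk    map[string][]byte // stand-in for the persistent node store
	flushes int               // number of times Commit wrote to "disk"
}

func NewCachedTree() *CachedTree {
	return &CachedTree{pending: map[string][]byte{}, disk: map[string][]byte{}}
}

func (t *CachedTree) Set(key, value []byte) {
	t.pending[string(key)] = value
}

// Commit flushes the final state once and computes a root over it once.
func (t *CachedTree) Commit() [32]byte {
	for k, v := range t.pending {
		t.disk[k] = v
	}
	t.pending = map[string][]byte{}
	t.flushes++

	// Placeholder root: hash of sorted (hash(key) || hash(value)) leaves.
	leaves := make([][]byte, 0, len(t.disk))
	for k, v := range t.disk {
		kh := sha256.Sum256([]byte(k))
		vh := sha256.Sum256(v)
		leaves = append(leaves, append(kh[:], vh[:]...))
	}
	sort.Slice(leaves, func(i, j int) bool { return bytes.Compare(leaves[i], leaves[j]) < 0 })
	h := sha256.New()
	for _, l := range leaves {
		h.Write(l)
	}
	var root [32]byte
	copy(root[:], h.Sum(nil))
	return root
}

func main() {
	tree := NewCachedTree()
	for i := 0; i < 1000; i++ {
		tree.Set([]byte("counter"), []byte(fmt.Sprint(i))) // same key rewritten
	}
	tree.Set([]byte("other"), []byte("x"))
	root := tree.Commit()
	fmt.Printf("flushes=%d root=%x\n", tree.flushes, root[:4])
}
```

Because the root depends only on the state at commit time, the 1000 intermediate values of `counter` are never hashed or written.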

Concrete tasks:

Remove redundant hash(key) => value mapping in SMT (redundant to B2 bucket at SDK state storage level)
Cache/Commit cycles at the SMT layer (look into how repeated ops on the same key are resolved by BadgerDB and RocksDB transactions)
Calculate the root and parent nodes at the end of the commit cycle, not after every insert
Investigate concept of extension nodes in SMT
Investigate more performant hashing function
Investigate using a non-binary version of SMT
Investigate prefix optimization: map long prefixes to short byte identifiers

At the implementation level, look into:

Update the SMT implementation to make key hashing optional (and handle it at the SDK layer)
Update the SMT implementation to remove its internal hash(key) => value mapping and rely on the B1 bucket at the SDK layer

Related hackmd: https://hackmd.io/pESkHH3aQhugMLpGH2pBzw


i-norden · 2022-04-19T14:23:04Z

Roy has made these changes and more here: vulcanize/smt#5

As seen in the new flame charts he has produced, most of the time is now spent hashing, so a new line of investigation should be faster hash functions.

i-norden · 2022-04-20T13:46:15Z

Another thing to consider is introducing the concept of an intermediate node to the SMT, analogous to how the MMPT (Modified Merkle Patricia Trie) modified the standard Patricia trie.
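How much an intermediate (extension) node could save depends on how many leading path bits neighbouring keys share. The Go sketch below measures that. It assumes a binary tree in which a subtree containing a single leaf is stored as just that leaf. Under that assumption, two leaves sharing `k` bits need `k` single-child inner nodes plus one branching node, and an extension node would collapse the single-child chain into one node. `commonPrefixBits` and `innerNodes` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// commonPrefixBits returns how many leading bits two equal-length paths share.
func commonPrefixBits(a, b []byte) int {
	n := 0
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return n + bits.LeadingZeros8(x)
		}
		n += 8
	}
	return n
}

// innerNodes compares inner-node counts for a tree holding exactly two leaves
// whose paths share k leading bits, with and without an extension node.
func innerNodes(k int) (plain, withExtension int) {
	plain = k + 1 // k single-child nodes plus the branching node
	if k == 0 {
		return plain, 1 // the root already branches
	}
	return plain, 2 // one extension node for the shared bits, then the branch
}

func main() {
	a := sha256.Sum256([]byte("key-a"))
	b := sha256.Sum256([]byte("key-b"))
	k := commonPrefixBits(a[:], b[:])
	plain, ext := innerNodes(k)
	fmt.Printf("shared bits=%d, inner nodes: plain=%d, with extension=%d\n", k, plain, ext)
}
```

Because hashed paths are close to uniformly distributed, long shared prefixes should be uncommon, so running `commonPrefixBits` over a realistic key set would indicate whether extension nodes pay off.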

i-norden · 2023-02-01T15:14:39Z

Upstreamed SMT updates: celestiaorg/smt#73

i-norden · 2023-02-01T15:16:57Z

Benchmarks: cosmos#11444 (comment)

i-norden · 2023-02-01T15:17:29Z

The main remaining task was to test in a meaningful environment, e.g. #26

i-norden closed this as completed Aug 26, 2022
