Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-040 performance issue leads #21

Closed
7 of 9 tasks
i-norden opened this issue Mar 2, 2022 · 5 comments
Closed
7 of 9 tasks

ADR-040 performance issue leads #21

i-norden opened this issue Mar 2, 2022 · 5 comments

Comments

@i-norden
Copy link
Collaborator

i-norden commented Mar 2, 2022

IAVL vs SMT

Optimizations relative to IAVL:

  1. Hashed keys => shorter keys => database’s underlying LSM is not as deep, doesn’t take as long to traverse
  2. Storage bucket mapping key directly to value (no need to traverse the commitment object to read values)
  3. Leveraging versioned database- everything is indexed/denormalized by block number, making historical access (read/write) much faster. Also means the database backend can handle pruning old state itself.

The other optimization work is to optimize the SMT implementation, brining it up to speed with IAVL:

  1. Instead of it flushing updated nodes to disc after every update, it should wait until the end of a block cycle (commit) and flush the final state only.
  2. Instead of materializing the intermediate nodes after every update, it should wait until the end of a block cycle (commit) and materialize the intermediate nodes only for the final state that is flushed to disc.
  3. Remove unused internal hashing of keys

Concrete tasks:

  • Remove redundant hash(key) => value mapping in SMT (redundant to B2 bucket at SDK state storage level)
  • Cache/Commit cycles at SMT layer (look into how repeat ops on same key are resolved by badgerDB and rocksDB txs)
  • Calculate root and parents at end of commit cycle not after every insert
  • Investigate concept of extension nodes in SMT
  • Investigate more performant hashing function
  • Investigate using a non-binary version of SMT
  • Investigate prefix optimization- map long prefixes to short byte identifiers

At the implementation level, look into:

  • Update SMT implementation to make key hashing optional (and handle it at SDK layer)
  • Update SMT implementation to remove its internal hash(key) => value mapping and rely on B1 bucket at SDK layer

Related hackmd: https://hackmd.io/pESkHH3aQhugMLpGH2pBzw

@i-norden
Copy link
Collaborator Author

Roy has made these changes and more here: vulcanize/smt#5

As seen the new flame charts he has produced, most the time is now being spent hashing so a new line of consideration should be to investigate using faster hash functions.

@i-norden
Copy link
Collaborator Author

Another thing to consider is to introduce the concept of an intermediate node to the SMT, analogous to how the MMPT modified the normal patricia trie.

@i-norden
Copy link
Collaborator Author

i-norden commented Feb 1, 2023

Upstreamed SMT updates: celestiaorg/smt#73

@i-norden
Copy link
Collaborator Author

i-norden commented Feb 1, 2023

Benchmarks: cosmos#11444 (comment)

@i-norden
Copy link
Collaborator Author

i-norden commented Feb 1, 2023

The main task that was remaining was to test in a meaningful environment, e.g. #26

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant