Optimize instruction dispatch #314

Robbepop · 2022-01-08T09:58:36Z

Most of the overhead of an interpreter, and of the wasmi interpreter in particular, comes from instruction dispatch.
There are therefore 3 main ways to make an interpreter more efficient:

Improve the performance of the dispatch routines, i.e. reduce their overhead.
Reduce the number of executed instructions, e.g. by combining instructions into super instructions.
Help the CPU branch predictor to correctly predict the next branch. Instruction dispatch usually consists of at least one indirect branch, and the CPU can predict it better when it is given more information. For example, dispatching through a single match statement produces a single branch, which is less efficient than having one branch per instruction (match arm), since the branch predictor can take the position of the branch into account for its prediction. Some benchmarks indicate 50%-100% performance gains.
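
To make the first two points concrete, here is a minimal sketch of a match-based dispatch loop with one super instruction. The `Instr` enum, the `LocalGetAddConst` fusion and the `run` function are hypothetical and only serve as illustration; they are not wasmi's actual bytecode or executor.

```rust
/// Hypothetical toy bytecode for illustration; not wasmi's instruction set.
#[derive(Debug, Clone, Copy)]
pub enum Instr {
    /// Push a constant onto the value stack.
    Const(i64),
    /// Push the value of a local variable.
    LocalGet(usize),
    /// Pop two values and push their sum.
    Add,
    /// Super instruction fusing `LocalGet(n)`, `Const(c)` and `Add`.
    LocalGetAddConst(usize, i64),
    /// Stop and return the top of the value stack.
    Return,
}

/// Runs `code` and returns the result plus the number of dispatched instructions.
pub fn run(code: &[Instr], locals: &[i64]) -> (i64, usize) {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    let mut dispatched = 0;
    loop {
        dispatched += 1;
        // A single `match` for all instructions: the compiler may lower this
        // to one shared indirect branch for the whole loop.
        match code[pc] {
            Instr::Const(c) => stack.push(c),
            Instr::LocalGet(n) => stack.push(locals[n]),
            Instr::Add => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                stack.push(lhs.wrapping_add(rhs));
            }
            // One dispatch does the work of three plain instructions.
            Instr::LocalGetAddConst(n, c) => stack.push(locals[n].wrapping_add(c)),
            Instr::Return => {
                return (stack.pop().expect("stack underflow"), dispatched);
            }
        }
        pc += 1;
    }
}
```

Running `[LocalGet(0), Const(2), Add, Return]` and `[LocalGetAddConst(0, 2), Return]` against the same locals gives the same result, but the fused version goes through the dispatch loop 2 times instead of 4.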

Work Items

~~Fuse common instruction sequences into super instructions for wasmi bytecode during Wasm module compilation.~~
- Implement fused instructions #325
- We decided to not follow this route anymore. Instead we concentrate on getting the register machine approach working.
LLVM is able to optimize switch-based dispatch into a form that benefits branch predictors more, at the cost of increased binary size. To our despair, LLVM usually opts out of this. It might be possible to find ways to make LLVM optimize into that form from within Rust.
- wasmi_v1: improve instruction scheduler #376 implements this.
~~LLVM already supports guaranteed tail calls. As soon as Rust provides them too we should definitely experiment with dispatch based on tail calls similar to the Wasm3 interpreter.~~
- As stated, neither Rust nor WebAssembly currently has tail call support. We can reopen this issue or create a new issue once this has changed.
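
For reference, a rough sketch of what handler-based dispatch can look like in stable Rust today. All names here (`Vm`, `Op`, `Control`, `op_const`, ...) are hypothetical. Because there are no guaranteed tail calls, every handler returns to a central loop, so this version still has a single indirect call site and does not get the branch prediction benefit that real tail-call dispatch is expected to give.

```rust
/// Hypothetical interpreter state of a toy stack machine.
pub struct Vm {
    pub stack: Vec<i64>,
    pub pc: usize,
}

/// Tells the driver loop what to do after a handler has run.
pub enum Control {
    Continue,
    Return(i64),
}

/// A handler executes one instruction, given its immediate operand.
pub type Handler = fn(&mut Vm, i64) -> Control;

/// One pre-decoded instruction: the handler plus its immediate operand.
#[derive(Clone, Copy)]
pub struct Op {
    pub handler: Handler,
    pub imm: i64,
}

pub fn op_const(vm: &mut Vm, imm: i64) -> Control {
    vm.stack.push(imm);
    vm.pc += 1;
    Control::Continue
}

pub fn op_add(vm: &mut Vm, _imm: i64) -> Control {
    let rhs = vm.stack.pop().expect("stack underflow");
    let lhs = vm.stack.pop().expect("stack underflow");
    vm.stack.push(lhs.wrapping_add(rhs));
    vm.pc += 1;
    Control::Continue
}

pub fn op_return(vm: &mut Vm, _imm: i64) -> Control {
    Control::Return(vm.stack.pop().expect("stack underflow"))
}

/// Driver loop: fetch the next handler and call it through its function pointer.
pub fn run(code: &[Op]) -> i64 {
    let mut vm = Vm { stack: Vec::new(), pc: 0 };
    loop {
        let op = code[vm.pc];
        // With guaranteed tail calls, this indirect call would instead sit at
        // the end of every handler, giving one call site per instruction.
        if let Control::Return(value) = (op.handler)(&mut vm, op.imm) {
            return value;
        }
    }
}
```

A program is then a slice of `Op` values, e.g. `Op { handler: op_const, imm: 40 }`, followed by another constant, `op_add` and `op_return`.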


Robbepop · 2022-02-12T10:48:01Z

This architecture could be used to speed up instruction dispatch in wasmi with safe Rust code:
https://github.com/Neopallium/s1vm
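
A minimal sketch of closure-based dispatch in safe Rust, assuming the linked project works roughly along these lines. The `Expr` tree, the `Compiled` type and `compile` are hypothetical and not taken from s1vm; the point is only that the code is translated once into nested closures, so execution calls closures directly instead of going through a central `match`.

```rust
/// Hypothetical expression tree standing in for decoded Wasm code.
pub enum Expr {
    Const(i64),
    Local(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// Compiled code: a closure that evaluates itself against the locals.
pub type Compiled = Box<dyn Fn(&[i64]) -> i64>;

/// Translates the tree into nested closures once, ahead of execution.
pub fn compile(expr: &Expr) -> Compiled {
    match expr {
        Expr::Const(c) => {
            let c = *c;
            Box::new(move |_locals: &[i64]| c)
        }
        Expr::Local(n) => {
            let n = *n;
            Box::new(move |locals: &[i64]| locals[n])
        }
        Expr::Add(lhs, rhs) => {
            // Operands are compiled first and captured by the parent closure.
            let lhs = compile(lhs);
            let rhs = compile(rhs);
            Box::new(move |locals: &[i64]| lhs(locals).wrapping_add(rhs(locals)))
        }
    }
}
```

The `match` runs only during `compile`. Calling the returned closure afterwards executes the whole expression without any further instruction decoding.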

Robbepop · 2022-02-12T10:49:23Z

This article gives a good description of different instruction dispatch techniques and their expected performance:
https://www.complang.tuwien.ac.at/forth/threaded-code.html

Robbepop · 2022-02-17T16:48:09Z

Research into different instruction dispatch techniques implementable in Rust:
https://github.com/Robbepop/interpreter-dispatch-research

Robbepop · 2022-07-13T16:28:58Z

PR merged to refactor the instruction dispatch for great wins: #376

Robbepop · 2022-08-22T18:42:58Z

Closed since all TODO items have been answered or resolved.

Robbepop added the wasmi-v1 and enhancement labels Jan 8, 2022

Robbepop self-assigned this Jan 8, 2022

Robbepop closed this as completed Aug 22, 2022

Lohann mentioned this issue Apr 10, 2023

contracts: Support RISC-V bytecode paritytech/polkadot-sdk#115

