Another take on my Vimes project.
Vimes is a collection of virtual machines (VMs) and related resources for studying their performance, ease of implementation, and ease of use. This sandbox includes a variety of VMs with different architectures, dispatch techniques and implementations as well as benchmarks and utilities to help measure and compare their characteristics.
Warning: this is experimental / pre-alpha code.
Development:
-
VM instruction sets are easy to implement, as each instruction is relatively simple.
-
The biggest investments (so far) when creating a new VM are:
- Transforming assembly into machine code.
- Loading machine code into the VM.
- Creating a buffered reader and an int parser for stdin.
- Creating CLI.
- Creating tests and benchmarks.
-
The ability to define aliases for values (like
n=1
) and mnemonics (likepeek=lpa
) is a huge step forward in assembly UX with very little changes in the assembler (1 token is still 1 token). -
The ability to define multi-token aliases facilitates keeping the instruction set orthogonal (ie you can quickly add stacks with
peek
,poke
,inc
,dec
and they can either grow up or down). -
Having all 3 conditional jumps related to comparison (<0, >0, ==0) makes the code much easier to write and read.
-
Having array like access (ptr+offset) vs just pointers is a huge step forward in the assembly UX.
Performance:
-
π₯ benchmarking results π₯
-
Wirth's machine is 2-3 times slower than register machines without stack frames.
-
C enables more performant dispatch techniques, such as indirect and direct threading.
-
Indirect and direct threading are twice as fast as switch-based dispatch.
-
Indirect threading seems to be the best approach when writing a VM in C (10% slower but no code remapping).
-
The performance of C programs compiled with Zig is similar to that of programs compiled with gcc (Β±20%).
-
Nim's {.computedGoto.} pragma resulted in 10% slower code.
-
Translation of machine code into C results in extremely fast execution but the return from the procedure call is a bit tricky.
-
Zig acting as a C compiler outshines gcc in optimization of the translated machine code.
-
mk1 - machine from Wirth's 1976 book Algorithms + Data Structures = Programs
- mk2 -
mk1
with variable number of arguments, swapped a and l-
mk3 - internal bytecode version of
mk2
- abandoned as bytecode requires more work than fixed-width words - instructions variants and assembler changes (π)
-
mk4 - switch call threading version of
mk2
-
mk5 - indirect call threading version of
mk2
-
- mk2 -
-
mk6 - register based vm inspired by smol
- mk11 - similar to
mk6
but conditional jumps are based onacc
register- mk13 - similar to
mk11
but closer to the smol language
- mk13 - similar to
- mk12 - one operand version ok
mk6
- mkXX-
mk6
with stack frame similar tomk1
(π±)
- mk11 - similar to
-
mk7 - register based vm inspired by Human Resource Machine
- mk7c -
mk7
implemented in C - mk7ci -
mk7
implementation in C, indirect threading - mk7cd -
mk7
implementation in C, direct threading - mk7cc -
mk7
asm compiled to C code (π§) - mk8 -
mk7
extended with pointer operations, subroutine call/return and one more conditional jump (π)- mk8c -
mk8
implemented in C - mk8ci -
mk8
implemented in C, indirect threading - mk8cd -
mk8
implemented in C, indirect threading - mk8cc -
mk8
asm compiled to C code (π§)
- mk8c -
- mk9 - two operands version of
mk7
- mk10 -
mk9
extended with pointer operations and subroutine call/return- mkXX -
mk10
with better UX (π±)
- mkXX -
- mk10 -
- mkXX -
mk7
extended with cooperative multitasking instructions (π±)
- mk7c -
These machines use the same variable names as Wirth's example p-code machine from 1976 (available here).
-
p - program register
-
b - base register
-
t - topstack register
-
s - stack
-
i - instruction register
-
a - argument register
-
l - level register
-
code - program memory
- pc - program counter
- sp - stack pointer
- acc - accumulator / main register of the VM
- code - program storage memory
- mem - main memory
- stack - return stack
instructions
- LIT a ; load constant (a)
- INT a ; increment t-register by (a)
- LOD a b ; load variable (a) from level (b)
- STO a b ; store variable (a) at level (b)
- CAL a b ; call procedute (a) at level (b)
- JMP a ; jump to (a)
- JPC a ; jump conditional to (a)
- OPR a ; execute operation (a) ie OPR ADD
- EX1 a ; execute operation (a) from VM extension 1
- EX2 a ; execute operation (a) from VM extension 2
- EX3 a ; execute operation (a) from VM extension 3
...
- HLT ; halt the program
operations
- ADD ; (ab--c) c = a + b
- SUB ; (ab--c) c = a - b
- MUL ; (ab--c) c = a * b
- DIV ; (ab--c) c =
- RET ; (ab--c) c =
- NEG ; (a--b) b = -a
- ODD ; (a--b) b = a % 2
- MOD ; (ab--c) c = a % b
- EQ ; (ab--c) c = 1 if a==b else 0
- NE ; (ab--c) c = 1 if a!=b else 0
- LT ; (ab--c) c = 1 if a<b else 0
- LE ; (ab--c) c = 1 if a<=b else 0
- GT ; (ab--c) c = 1 if a>b else 0
- GE ; (ab--c) c = 1 if a>=b else 0
The notation (ab--c)
describes the stack effect of an operation. It indicates that the operation expects two items to be on the stack before execution, referred to as a
and b
, and after the operation is executed, these items are replaced by a single item c
on the stack. The items before --
are consumed (popped) from the stack, and the items after --
are produced (pushed) onto the stack.
extensions
Extension 1 - stdtio:
- PUTI ; (a--)
- GETI ; (--a)
- EOF ; (--a)
- PUTC ; (a--)
- GETC ; (--a)
Extension 2 - ALU - bit ops
- AND ; (ab--c)
- OR ; (ab--c)
- XOR ; (ab--c)
- NOT ; (a--b)
- SHL ; (ab--c)
- SHR ; (ab--c)
- SAR ; (ab--c)
Extension 3 - ALU - common ops
- INC ; (a--)
- DEC ; (a--)
- EQZ ; (ab--c)
- NEZ ; (ab--c)
- LTZ ; (ab--c)
- LEZ ; (ab--c)
- GTZ ; (ab--c)
- GEZ ; (ab--c)
- LIT a b ; set memory location (a) to literal value (b)
- MOV a b ; set memory location (a) to value from memory location (b)
- PEEK a b ; set memory location (a) with value from memory location indicated by (b)
- POKE a b ; set memory location indicated by (a) to value from memory location (b)
- JMP a 0 ; jump to program location (a)
- JZ a b ; if memory location (a) is zero then jump to program location (b)
- JNZ a b ; if memory location (a) is not zero then jump to program location (b)
- CAL a 0 ; call subroutine at program location (a)
- RET 0 0 ; return from subroutine call
- HLT 0 0 ; halt the program
- ADD a b ; mem[a] = mem[a] + mem[b]
- SUB a b ; mem[a] = mem[a] - mem[b]
- MUL a b ; mem[a] = mem[a] * mem[b]
- DIV a b ; mem[a] = mem[a] / mem[b]
- MOD a b ; mem[a] = mem[a] % mem[b]
- NEG a 0 ; mem[a] = -mem[a]
- EQ a b ; mem[a] = 1 if mem[a] == mem[b] else 0
- NE a b ; mem[a] = 1 if mem[a] != mem[b] else 0
- LT a b ; mem[a] = 1 if mem[a] < mem[b] else 0
- LE a b ; mem[a] = 1 if mem[a] <= mem[b] else 0
- GT a b ; mem[a] = 1 if mem[a] > mem[b] else 0
- GE a b ; mem[a] = 1 if mem[a] >= mem[b] else 0
- PUTC a 0 ; write the value from memory location (a) to stdout (as character)
- PUTI a 0 ; write the value from memory location (a) to stdout (as integer)
- GETC a 0 ; read a character from stdin and store it in memory location (a)
- GETI a 0 ; read an integer from stdin and store it in memory location (a), skip initial whitespaces
- EOF a 0 ; set memory location (a) to 1 if stdin indicates end-of-file or to 0 otherwise
- IN 0 ; read input to ACC
- OUT 0 ; write ACC to output
- LDA a ; load memory location (a) to ACC
- STA a ; store ACC in memory location (a)
- ADD a ; add memory location (a) to ACC
- SUB a ; subtract memory location (a) from ACC
- INC a ; increase memory location (a) by 1
- DEC a ; decrease memory location (a) by 1
- JMP a ; jump to address (a)
- JZ a ; jump to address (a) if ACC is zero
- JN a ; jump to address (a) if ACC is negative
- LIT a ; load (a) to ACC
- HLT 0 ; halt the program
mk7 instructions extended with:
- CAL a ; call procedure at address (a)
- RET 0 ; return from procedure
- LPA a ; load memory location pointed by (a) to ACC
- SPA a ; store ACC in memory location pointed by (a)
misc instructions:
- ASR a ; arithmetic shift right ACC by (a)
- NOP a ; do nothing, (a) can be used to mark labels
- IN a 0 ; read input to mem[a]
- OUT a 0 ; write mem[a] to output
- LIT a b ; mem[a] = b
- MOV a b ; mem[a] = mem[b]
- ADD a b ; mem[a] = mem[a] + mem[b]
- SUB a b ; mem[a] = mem[a] - mem[b]
- JMP a 0 ; jump to address (a)
- JZ a b ; jump to address (a) if mem[b] is zero
- JN a b ; jump to address (a) if mem[b] is negative
- HLT 0 0 ; halt the program
mk9 instructions extended with
- CAL a 0 ; call procedure at address (a)
- RET 0 0 ; return from procedure
- PTM a b ; transfer from pointer (b) to memory location (a)
- MTP a b ; transfer from memory location (b) to pointer (a)
- ASR a b ; arithmetic shift right mem[a] by (b)
- NOP a b ; do nothing, (a) can be used to mark labels
- JP a b ; jump to address (a) if mem[b] is positve (>0)
- LIT a b ; mem[a] = b
- MOV a b ; mem[a] = mem[b]
- LDA a 0 ; acc = mem[a] + b
- LDAP a b ; acc = mem[mem[a]+mem[b]]
- STA a 0 ; mem[a] = acc + b
- STAP a b ; mem[mem[a]+mem[b]] = acc
- JMP a 0 ; jump to program location (a)
- CAL a 0 ; call subroutine at (a)
- RET 0 0 ; return from subroutine call
- HLT 0 0 ; halt the program
- ADD a b ; mem[a] = mem[a] + mem[b]
- SUB a b ; mem[a] = mem[a] - mem[b]
- MUL a b ; mem[a] = mem[a] * mem[b]
- DIV a b ; mem[a] = mem[a] / mem[b]
- MOD a b ; mem[a] = mem[a] % mem[b]
- NEG a 0 ; mem[a] = -mem[a]
- CMP a b ; acc = mem[a] - mem[b] (compare)
- JEQ a 0 ; jump to (a) if acc == 0
- JLT a 0 ; jump to (a) if acc < 0
- JGT a 0 ; jump to (a) if acc > 0
- JNE a 0 ; jump to (a) if acc != 0
- JLE a 0 ; jump to (a) if acc <= 0
- JGE a 0 ; jump to (a) if acc >= 0
- GET a 0 ; mem[a] = read integer from stdin (skip whitespace, block)
- PUT a 0 ; write mem[a] as integer to stdout
- EOF a 0 ; mem[a] = 1 if stdid.eof else 0
- GETC a 0 ; mem[a] = read character from stdin (block)
- PUTC a 0 ; write mem[a] as character to stdout
- LIT a ; acc = a
- LDA a ; acc = mem[a]
- STA a ; mem[a] = acc
- PEEK a ; acc = mem[mem[a]]
- POKE a ; mem[mem[a]] = acc
- JMP a ; jump to program location (a)
- CAL a ; call subroutine at (a)
- RET 0 ; return from subroutine call
- HLT 0 ; halt the program
- EQ a ; acc = 1 if acc == mem[a] else 0
- NE a ; acc = 1 if acc != mem[a] else 0
- LT a ; acc = 1 if acc < mem[a] else 0
- LE a ; acc = 1 if acc <= mem[a] else 0
- GT a ; acc = 1 if acc > mem[a] else 0
- GE a ; acc = 1 if acc >= mem[a] else 0
- JZ a ; jump to (a) if acc == 0
- JNZ a ; jump to (a) if acc != 0
- ADD a ; acc += mem[a]
- SUB a ; acc -= mem[a]
- MUL a ; acc *= mem[a]
- DIV a ; acc /= mem[a]
- MOD a ; acc %= mem[a]
- NEG 0 ; acc = -acc
- INC a ; mem[a] += 1
- DEC a ; mem[b] -= 1
- PUT 0 ; write acc as integer to stdout
- GET 0 ; acc = read integer from stdin (skip whitespace, block)
- EOF 0 ; acc = 1 if stdid.eof else 0
- PUTC 0 ; write acc as character to stdout
- GETC 0 ; acc = read character from stdin (block)
- LIT a b ; mem[a] = b
- MOV a b ; mem[a] = mem[b]
- LDA a b ; acc = mem[a] + b
- LDAP a b ; acc = mem[mem[a]+mem[b]]
- STA a b ; mem[a] = acc + b
- STAP a b ; mem[mem[a]+mem[b]] = acc
- JMP a 0 ; jump to program location (a)
- CAL a 0 ; call subroutine at (a)
- RET 0 0 ; return from subroutine call
- HLT 0 0 ; halt the program
- ADD a b ; mem[a] = mem[a] + mem[b]
- SUB a b ; mem[a] = mem[a] - mem[b]
- MUL a b ; mem[a] = mem[a] * mem[b]
- DIV a b ; mem[a] = mem[a] / mem[b]
- MOD a b ; mem[a] = mem[a] % mem[b]
- NEG a 0 ; mem[a] = -mem[a]
- EQ a b ; acc = 1 if mem[a] == mem[b] else 0
- NE a b ; acc = 1 if mem[a] != mem[b] else 0
- LT a b ; acc = 1 if mem[a] < mem[b] else 0
- LE a b ; acc = 1 if mem[a] <= mem[b] else 0
- GT a b ; acc = 1 if mem[a] > mem[b] else 0
- GE a b ; acc = 1 if mem[a] >= mem[b] else 0
- JZ a 0 ; jump to (a) if acc == 0
- JNZ a 0 ; jump to (a) if acc != 0
- GET a 0 ; mem[a] = read integer from stdin (skip whitespace, block)
- PUT a 0 ; write mem[a] as integer to stdout
- EOF a 0 ; mem[a] = 1 if stdid.eof else 0
- GETC a 0 ; mem[a] = read character from stdin (block)
- PUTC a 0 ; write mem[a] as character to stdout
-
https://github.com/mobarski/vimes/blob/main/references.md
-
PL/0 and p-code VM:
-
Another World VM: