Skip to content

brainfuck compiler that outputs an x86-64 elf executable for linux using no external dependencies

Notifications You must be signed in to change notification settings

vishen/go-brainfunk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Brainfunk

go-brainfunk is a brainfuck compiler that emits generated x64 elf for Linux. The x64 instructions are generated and encoded programatically and the raw bytes are used to output an elf executable. This is my first time generating x64 encodings and elf executables manually, so there is likely mistakes and better approaches.

The brainfuck compiler is currently unable to read from stdin, which is one of the expected commands, and it currently outputs to a buffer and then that buffer is written to stdout. Some brainfuck programs may not work at the moment, but I am looking at adding these features soon.

Installing

Install the usual Go way go get -u github.com/vishen/go-brainfunk

Running

# Usage
$ go-brainfunk
missing required flag -f <path/to/brainfuck program>
usage: go-brainfunk -f /path/to/brainfuck-file -o <output-binary>
$ cat ./examples/hello_world.bf
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
$ go-brainfunk -f ./examples/hello_world.bf
wrote executable to hello_world
s ls -hal ./hello_world
-rwxr-xr-x 1 vishen vishen 788 Jun 28 11:01 ./hello_world*
# Even though it says the section header size is corrupt, the binary
# is still executable.
$ file ./hello_world
./hello_world: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, corrupted section header size
$ ./hello_world
Hello World!

Notes

Only a few x64 instructions were required for a brainfuck program:

  • mov
  • inc
  • dec
  • add
  • sub
  • cmp
  • jne
  • int 0x80

so I only included the x64 encodings for these instructions, and only the 64-bit version of these instructions.

The elf executable is also generated programatically and will ouput an elf executable to disk. The executable is very minimal, it only includes the required elf header + 2 program headers, one for .text segment and one for .bss segment and then the encoded x64 instructions. The .text segment header contains information about the x64 code, and the .bss segment contains information about uninitialised data.

The resulting binary is quite small because it is missing all debug information usually produced by compilers and linkers.

The compiler isn't very smart, it doesn't attempt to do any constant folding or correct back-patching for uninitialized data. The elf binary will always have the .text section start from 0x400000 and the unitialised data section always starts from 0x600000. Since this is always the case we can "hardcode" the uninitialised data addresses.

x86-64 Instruction Encoding

REX (0-1 bytes) | Opcode (1-3 bytes) | MODR/M (0 -1 bytes) | SIB (0-1 bytes) | Displacement (0, 1, 2 or 4 bytes) | Immediate (0, 1, 2 or 4 bytes)

REX

The REX prefix is only available in long mode. An REX prefix must be encoded when:

  • using 64-bit operand size and the instruction does not default to 64-bit operand size
  • using one of the extended registers (R8 to R15, XMM8 to XMM15, YMM8 to YMM15, CR8 to CR15 and DR8 to DR15)
  • using one of the uniform byte registers SPL, BPL, SIL or DIL

A REX prefix must not be encoded when:

  • using one of the high byte registers AH, CH, BH or DH.

REX Encoding

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0   1   0   0 | W | R | X | B |
  • 0100 is a 4 bit fixed bit pattern.
  • W (1 bit): 1 when a 64-bit operand size is used. Otherwise, 0 will use the default 32-bit operand size.
  • R (1 bit): Extension of MODRM.reg field.
  • X (1 bit): Extension of SIB.index field.
  • B (1 bit): Entension of MODRM.rm or SIB.base field.

MOD R/M

Used to encode up to two operands of an instruction, each of which is a direct register or effective memory address.

MOD R/M Encoding

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  mod  |    reg    |    rm     |
  • MODRM.mod (2 bits):
    • 00 -> [rax]
    • 01 -> [rax + imm8], an immediate / constant 8 bit value
    • 10 -> [rax + imm32], an immediate / constant 32 bit value
    • 11 -> rax
  • MODRM.reg (3 bits):
    • Opcode extension: used by some instructions but has no further meaning other than distinguishing the instruction from other instructions.
    • Register reference: can be used as the source or destination of an instruction.
  • MODRM.rm (3 bits): Specifies a direct or indirect register operand, optionally with a displacement.

Elf Executable

The elf executable is consistent of 4 parts all layed out one after the other in the executable: elf header, text program header, bss program header and the raw x64 encodings.

The generated elf executable is currently missing debug information, so tools like gdb and objdump don't work on the resulting binaries. However, gdb can be told to work without the debug information present.

Currently, the elf generation always assumes the brainfuck program can fit into 0x200000 bytes of memory. If a proram goes over this I believe there will be segfaults and all other sorts of issues.

Program headers

These are used to indicate where abouts in the executable binary a certain sections starts, either the .text or .bss segment for this compiler. Most of the fields are pretty self explanatory, but the virtual address is the virtual address in memory you want linux to allocate you memory, this is usually started at 0x400000 by convention I think. The physical address apparently doesn't matter for my use case, but other elf executables I checked, like protoc and go have the physical address set to the virtual address. The number of bytes in the file image is how many bytes is in the segment, so for .text it will be the length of the encoded data, for the .bss segment it will be zero since nothing is added in the resulting binary executable for a .bss segment. And the number of bytes in the memory image is how much memory it will take up once the program is loaded; this will be the size requested by the program and will be filled out with zeroed out data when the program is loaded into memory. that segment takes up

Resources

About

brainfuck compiler that outputs an x86-64 elf executable for linux using no external dependencies

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published