From 51782dcff25fc497964c98951ea39a5517df1654 Mon Sep 17 00:00:00 2001
From: Zoltan Varga <vargaz@gmail.com>
Date: Fri, 22 Mar 2024 16:10:32 -0400
Subject: [PATCH] [mono] Update runtime docs. (#99699)

* [mono] Update wasm docs.

* Add AOT docs.

* Fix typos.

* Add documentation for wrappers.

* Fix typos.
---
 docs/design/mono/aot.md           | 144 ++++++++++++++++++++++++++++++
 docs/design/mono/runtime-ilgen.md | 110 +++++++++++++++++++++++
 docs/design/mono/wasm-aot.md      |  43 +++++++--
 3 files changed, 290 insertions(+), 7 deletions(-)
 create mode 100644 docs/design/mono/aot.md
 create mode 100644 docs/design/mono/runtime-ilgen.md
diff --git a/docs/design/mono/aot.md b/docs/design/mono/aot.md
new file mode 100644
index 0000000000000..07c5d416da702
--- /dev/null
+++ b/docs/design/mono/aot.md
@@ -0,0 +1,144 @@
+# Ahead of Time Compilation
+
+## Introduction
+
+The mono Ahead of Time (AOT) compiler enables the compilation of the IL code in a .NET assembly to
+a native object file. This file is called an AOT image. This AOT image can be used by the runtime to avoid
+having to JIT the IL code.
+
+## Usage
+
+The AOT compiler is integrated into the mono runtime executable, and can be run using the `--aot` command
+line argument, e.g.
+`<mono-executable> --aot HelloWorld.dll`
+
+## Source code structure
+
+- `aot-compiler.c`: The AOT compiler
+- `aot-runtime.c`: Code used at runtime to load AOT images
+- `image-writer.c`: Support code for emitting textual assembly
+- `dwarfwriter.c`: Support code for emitting DWARF debug info
+
+## Configurations
+
+### Desktop AOT
+
+In this mode, the AOT compiler creates a platform shared object file (.so/.dylib), e.g. `HelloWorld.dll.so`. During execution, when
+an assembly is loaded, the runtime loads the corresponding shared object and uses it to avoid having to JIT the methods in the
+assembly.
+
+Emission of the native code is done by first emitting an assembly (.s) file, then compiling and linking it with the system tools
+(`as`/`ld`, or `clang`).
+
+### Static AOT
+
+In this mode, the AOT compiler creates a platform object file (.o). This file needs to be linked into the application and registered
+with the runtime.
+
+Static compilation is enabled by using the `static` aot option, i.e. `--aot=static,...`. The resulting object file contains a linker
+symbol named `mono_aot_module_<assembly name>_info`. This symbol needs to be passed to a runtime function before the
+runtime is initialized, e.g.:
+`mono_aot_register_module (mono_aot_module_HelloWorld_info);`
+
+### Full AOT
+
+In this mode, which can be combined with the other modes, the compiler generates additional code which enables the runtime to
+function without any code being generated at runtime. This includes 2 types of code:
+- code for 'extra' methods, i.e. generic instances, runtime-generated wrapper methods, etc.
+- trampolines
+
+This is enabled by using the `full` aot option, i.e. `--aot=full,...`. At runtime, all assemblies need to have a full-aot-ed AOT image
+present in order for the app to work. This is used on platforms which don't allow runtime code generation, like iOS.
+
+### LLVM support
+
+LLVM support can be enabled using the `llvm` aot option, i.e. `--aot=llvm`. In this mode, instead of generating native code,
+the AOT compiler generates an LLVM bitcode (.bc) file, then compiles it to native code using the `opt`/`llc` LLVM tools. The
+various AOT data structures are also emitted into the .bc file instead of as assembly.
+Since the LLVM backend currently doesn't support all .NET methods, a smaller assembly file is still emitted, and linked together
+with the `opt`/`llc` compiled object file into the final shared object file.
+
+## Versioning
+
+The generated AOT images have a dependency on the exact version of the input assembly used to generate them and the versions of all the
+referenced assemblies. This means the GUIDs of the assemblies have to match. If there is a mismatch, the AOT image will fail to load.
+
+## File structure
+
+Each AOT image exports one symbol named `mono_aot_module_<assembly name>_info` which points to a `MonoAotFileInfo` structure,
+which contains pointers to the tables/structures. The AOT image contains:
+- the native code
+- data structures required to load the code
+- cached data intended to speed up runtime operation
+
+The AOT image contains serialized versions of many .NET objects like methods/types etc. This uses ad-hoc binary encodings.
+
+## Runtime support
+
+The `aot-runtime.c` file contains the runtime support for loading AOT images.
+
+### Loading AOT images
+
+When an assembly is loaded, the corresponding AOT image is either loaded using the system dynamic linker (`dlopen`), or
+found among the statically linked AOT images.
+
+### Loading methods
+
+Every method in the AOT image is assigned an index. The AOT methods corresponding to 'normal' .NET methods are assigned
+an index corresponding to their metadata token index, while the 'extra' methods are assigned subsequent indexes. There is
+a hash table inside the AOT image mapping extra methods to their AOT indexes. Loading a method consists of:
+- finding its method index
+- finding the method code/data corresponding to the method index
+
+The mapping from method index to the code is done in an architecture specific way, designed to minimize the amount of
+runtime relocations in the AOT image. In some cases, this involves generating an extra table with assembly call instructions to
+all the methods, then disassembling this table at runtime.
+
+
+
+### Runtime constants
+
+The generated code needs to access data which is only available at runtime. For example, for an `ldstr "Hello"` instruction, the
+`"Hello"` string is a runtime constant.
+
+These constants are stored in a global table called the GOT, which is modelled after the Global Offset Table in ELF images. The GOT
+contains pointers to runtime objects. The AOT image contains descriptions of these runtime objects so the AOT runtime can
+compute them. The entries in the GOT are initialized either when the AOT image is loaded (for frequently used entries), or before
+the method which uses them is first executed.
+
+### Initializing methods
+
+Before an AOTed method can be executed, it might need some initialization. This involves:
+- executing its class cctor
+- initializing the GOT slots used by the method
+
+For methods compiled by the mono JIT, initialization is done when the method is loaded. This means that it's not possible to
+have direct calls between methods. Instead, calls between methods go through small pieces of generated code called PLT
+(Program Linkage Table) entries, which transfer control to the runtime which loads the called method before executing it.
+For methods compiled by LLVM, the method entry contains a call to the runtime which initializes the method.
+
+## Trampolines
+
+In full-aot mode, the AOT compiler needs to emit all the trampolines which will be used at runtime. This is done in
+the following way:
+- For most trampolines, the AOT compiler calls the normal trampoline creation function with the `aot` argument set
+to TRUE, then saves the returned native code into the AOT image, along with some relocation information like the
+GOT slots used by the trampolines.
+- For some small trampolines, the AOT compiler directly emits platform specific assembly.
+
+The runtime might require an unbounded number of certain trampolines, but the AOT image can only contain a fixed
+number of them. To solve this problem, on some platforms (iOS), it's possible to have infinite trampolines. This is
+implemented by emitting a different version of these trampolines which reference their corresponding data using
+relative addressing. At runtime, a page of these trampolines is mapped using `mmap` next to a writable page
+which contains their corresponding data. The same page of trampolines is mapped multiple times at multiple
+addresses.
+
+## Cross compilation
+
+It's possible to use the AOT compiler to target a platform different from the host. This requires a separate cross compiler
+build of the runtime.
+The generated code depends on offsets inside runtime structures like `MonoClass`/`MonoVTable` etc. which could
+differ between the host and the target. This is handled by having a tool called the offsets-tool, which is a Python
+script which uses the clang Python interface to compute and emit a C header file containing these offsets. The header
+file is passed as a CMake argument during the runtime build. Inside the runtime code, the `MONO_STRUCT_OFFSET`
+C macro reads the data from the offsets file to produce the offset corresponding to the target platform.
diff --git a/docs/design/mono/runtime-ilgen.md b/docs/design/mono/runtime-ilgen.md
new file mode 100644
index 0000000000000..8c17bb697a2ac
--- /dev/null
+++ b/docs/design/mono/runtime-ilgen.md
@@ -0,0 +1,110 @@
+# IL generation at runtime
+
+## Introduction
+
+The mono runtime makes extensive use of generating IL methods at runtime. These
+methods are called 'wrappers' in the runtime code, because some of them 'wrap' other
+methods, like a managed-to-native wrapper would wrap the native function being called.
+Wrappers have the `MonoMethod.wrapper_type` field set to the type of the wrapper.
+
+## Source code structure
+
+- `wrapper-types.h`: Enumeration of wrapper types
+- `marshal*`: Functions for generating wrappers
+- `method-builder*`: Low level functions for creating new IL methods/code at runtime
+
+## WrapperInfo
+
+Every wrapper has an associated `WrapperInfo` structure which describes the wrapper.
+This can be retrieved using the `mono_marshal_get_wrapper_info ()` function.
+Some wrappers have subtypes; these are stored in `WrapperInfo.subtype`.
+
+## Caching wrappers
+
+Wrappers should be unique, i.e. there should be only one instance of every wrapper. This is
+achieved by caching wrappers in wrapper type specific hash tables, which are stored in
+`MonoMemoryManager.wrapper_caches`.
+
+## Generics and wrappers
+
+Wrappers for generic instances should be created by doing:
+instance method -> generic method definition -> generic wrapper -> inflated wrapper
+
+## AOT support
+
+In full-aot mode, the AOT compiler will collect and emit the wrappers needed by the
+application at runtime. This involves serializing/deserializing the `WrapperInfo` structure.
+
+## Wrapper types
+
+### Managed-to-native
+
+These wrappers are used to make calls to native code. They are responsible for marshalling
+arguments and result values, setting up EH structures etc.
+
+### Native-to-managed
+
+These wrappers are used to call managed methods from native code. When a delegate is passed to
+native code, the native code receives a native-to-managed wrapper.
+
+### Delegate-invoke
+
+Used to handle more complicated cases of delegate invocation that the fastpaths in the JIT can't handle.
+
+### Synchronized
+
+Used to wrap synchronized methods. The wrapper does the locking.
+
+### Runtime-invoke
+
+Used to implement `mono_runtime_invoke ()`.
+
+### Dynamic-method
+
+These are not really wrappers, but methods created by user code using the `DynamicMethod` class.
+
+Note that these have no associated `WrapperInfo` structure.
+
+### Alloc
+
+SGEN allocator methods.
+
+### Write-barrier
+
+SGEN write barrier methods.
+
+### Castclass
+
+Used to implement complex casts.
+
+### Stelemref
+
+Used to implement stelem.ref.
+
+### Unbox
+
+Used to unbox the receiver before calling a method.
+
+### Managed-to-managed/other
+
+The rest of the wrappers, distinguished by their subtype.
+
+#### String-ctor
+
+Used to implement string ctors. The first argument is ignored, and a new string is allocated.
+
+#### Element-addr
+
+Used to implement ldelema in multi-dimensional arrays.
+
+#### Generic-array-helper
+
+Used to implement the implicit interfaces on arrays like `IList<T>` etc. Delegates to helper methods on the Array class.
+
+#### Structure-to-ptr
+
+Used to implement Marshal.StructureToPtr.
+
+#### Ptr-to-structure
+
+Used to implement Marshal.PtrToStructure.
diff --git a/docs/design/mono/wasm-aot.md b/docs/design/mono/wasm-aot.md
index ef907bfe0abe7..20f900e35f47f 100644
--- a/docs/design/mono/wasm-aot.md
+++ b/docs/design/mono/wasm-aot.md
@@ -6,15 +6,29 @@ The LLVM backend of the Mono JIT is used to generate an llvm .bc file for each a
 compiled to webassembly using emscripten, then the resulting wasm files are linked into the final app. The 'bitcode'/'llvmonly'
 variant of the LLVM backend is used since webassembly doesn't support inline assembly etc.
 
+## Source code structure
+
+- `mini-llvm.c`: The LLVM backend.
+- `mini-wasm.h/c`: The wasm backend. This is a minimal version of a normal mono JIT backend which only supports llvm.
+- `llvm-runtime.cpp`: Code to throw/catch C++ exceptions.
+- `aot-runtime-wasm.c`: Code related to interpreter/native transitions on wasm.
+- `llvmonly-runtime.c`: Runtime support for the generated AOT code.
+
+WASM specific code is inside `HOST_WASM/TARGET_WASM` defines.
+
 ## GC Support
 
 On wasm, the execution stack is not stored in linear memory, so its not possible to scan it for GC references. However, there
-is an additional C stack which stores variables whose addresses are taken. Variables which hold GC references are marked as
-'volatile' in the llvm backend, forcing llvm to spill those to the C stack so they can be scanned.
+is an additional C stack in linear memory which is managed explicitly by the generated wasm code. This stack is already
+scanned by the mono GC as on other platforms.
+To make GC references in AOTed methods visible to the GC, every method allocates a gc_pin area in its prolog, and
+stores arguments/locals with a reference type into it. This will cause the GC to pin those references so the rest of
+the generated code can treat them normally as LLVM values.
 
 ## Interpreter support
 
-Its possible for AOTed and interpreted code to interop, this is called mixed mode.
+On wasm, the two supported execution modes are interpreter, or aot+interpreter. This means it's always
+possible to fall back to the interpreter if needed.
 For the AOT -> interpreter case, every call from AOTed code which might end up in the interpreter is
 emitted as an indirect call. When the callee is not found, a wrapper function is used which
 packages up the arguments into an array and passes control to the interpreter.
@@ -24,6 +38,22 @@ AOTed code. There is usually one aot->interp and interp->aot wrapper for each si
 some sharing. These wrappers are generated by the AOT compiler when the 'interp' aot option
 is used.
 
+## Exception handling
+
+On wasm, it's not possible to walk the stack so the normal mono exception handling/unwind code
+cannot be used as is. It's also hard to map the .NET exception handling concepts like filter clauses
+to the llvm concepts. Instead, C++/wasm exceptions are used to implement unwinding, and the
+interpreter is used to execute EH code.
+When an exception needs to be thrown, we store the exception info in TLS, and throw a dummy C++ exception instead.
+Internally, this is implemented by emscripten either by calling into JS, or by using the wasm exception handling
+spec.
+The C++ exception is caught in the generated AOT code using the relevant llvm catch instructions. Then execution is
+transferred to the interpreter. This is done by creating a data structure on the stack containing all the IL level state like
+the IL offset and the values of all the IL level variables. The generated code continuously updates this state during
+execution. When an exception is caught, this IL state is passed to the interpreter which continues execution from
+that point. This process is called `deopt` in the runtime code.
+Exceptions are also caught in various other places like the interpreter-aot boundary.
+
 ## Null checks
 
 Since wasm has no signal support, we generate explicit null checks.
@@ -59,8 +89,7 @@ if (vt_entry == null)
 	vt_entry = init_vt_entry ();
 ```
 
-### GC overhead
+### Exception handling
 
-Since GC variables are marked as volatile and stored on the C stack, they are loaded/stored on every access,
-even if there is no GC safe point between the accesses. Instead, they should only be loaded/stored around
-GC safe points.
+It might be possible to implement EH in the generated code without involving the interpreter. The
+current design adds a lot of overhead to methods which contain IL clauses.