From 51782dcff25fc497964c98951ea39a5517df1654 Mon Sep 17 00:00:00 2001
From: Zoltan Varga
Date: Fri, 22 Mar 2024 16:10:32 -0400
Subject: [PATCH] [mono] Update runtime docs. (#99699)

* [mono] Update wasm docs.

* Add AOT docs.

* Fix typos.

* Add documentation for wrappers.

* Fix typos.
---
 docs/design/mono/aot.md | 144 ++++++++++++++++++++++++++++++
 docs/design/mono/runtime-ilgen.md | 110 +++++++++++++++++++++++
 docs/design/mono/wasm-aot.md | 43 +++++++--
 3 files changed, 290 insertions(+), 7 deletions(-)
 create mode 100644 docs/design/mono/aot.md
 create mode 100644 docs/design/mono/runtime-ilgen.md

diff --git a/docs/design/mono/aot.md b/docs/design/mono/aot.md
new file mode 100644
index 0000000000000..07c5d416da702
--- /dev/null
+++ b/docs/design/mono/aot.md
@@ -0,0 +1,144 @@
+# Ahead of Time Compilation
+
+## Introduction
+
+The mono Ahead of Time (AOT) compiler enables the compilation of the IL code in a .NET assembly to
+a native object file. This file is called an AOT image. This AOT image can be used by the runtime to avoid
+having to JIT the IL code.
+
+## Usage
+
+The AOT compiler is integrated into the mono runtime executable, and can be run using the `--aot` command
+line argument, e.g.
+` --aot HelloWorld.dll`
+
+## Source code structure
+
+- `aot-compiler.c`: The AOT compiler
+- `aot-runtime.c`: Code used at runtime to load AOT images
+- `image-writer.c`: Support code for emitting textual assembly
+- `dwarfwriter.c`: Support code for emitting DWARF debug info
+
+## Configurations
+
+### Desktop AOT
+
+In this mode, the AOT compiler creates a platform shared object file (.so/.dylib), e.g. `HelloWorld.dll.so`. During execution, when
+an assembly is loaded, the runtime loads the corresponding shared object and uses it to avoid having to JIT the methods in the
+assembly.
+
+Emission of the native code is done by first emitting an assembly (.s) file, then compiling and linking it with the system tools
+(`as`/`ld`, or `clang`).
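The naming convention described for desktop AOT can be illustrated with a small sketch. It assumes the AOT image name is formed by appending the platform shared-library suffix to the assembly file name, as in the `HelloWorld.dll.so` example above. The helper name `aot_image_path` is hypothetical and is not a runtime API; the actual lookup logic lives in `aot-runtime.c` and may differ.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the mono runtime): writes
 * "<assembly_path><suffix>" into buf.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int
aot_image_path (const char *assembly_path, const char *suffix, char *buf, size_t buf_len)
{
	int n = snprintf (buf, buf_len, "%s%s", assembly_path, suffix);

	/* snprintf returns the length it wanted to write; treat truncation as failure */
	return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}
```

A caller would pass `".so"` or `".dylib"` depending on the platform and hand the resulting path to the dynamic linker.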
+
+### Static AOT
+
+In this mode, the AOT compiler creates a platform object file (.o). This file needs to be linked into the application and registered
+with the runtime.
+
+Static compilation is enabled by using the `static` aot option, i.e. `--aot=static,...`. The resulting object file contains a linking
+symbol named `mono_aot_module__info`. This symbol needs to be passed to a runtime function before the
+runtime is initialized, e.g.:
+`mono_aot_register_module (mono_aot_module_HelloWorld_info);`
+
+### Full AOT
+
+In this mode, which can be combined with the other modes, the compiler generates additional code which enables the runtime to
+function without any code being generated at runtime. This includes 2 types of code:
+- code for 'extra' methods, i.e. generic instances, runtime generated wrapper methods, etc.
+- trampolines
+
+This is enabled by using the `full` aot option, i.e. `--aot=full,...`. At runtime, all assemblies need to have a full-aot-ed AOT image
+present in order for the app to work. This is used on platforms which don't allow runtime code generation like iOS.
+
+### LLVM support
+
+LLVM support can be enabled using the `llvm` aot option, i.e. `--aot=llvm`. In this mode, instead of generating native code,
+the AOT compiler generates an LLVM bitcode (.bc) file, then compiles it to native code using the `opt`/`llc` LLVM tools. The
+various AOT data structures are also emitted into the .bc file instead of as assembly.
+Since the LLVM backend currently doesn't support all .NET methods, a smaller assembly file is still emitted, and linked together
+with the `opt`/`llc` compiled object file into the final shared object file.
+
+## Versioning
+
+The generated AOT images have a dependency on the exact version of the input assembly used to generate them and the versions of all the
+referenced assemblies. This means the GUIDs of the assemblies have to match. If there is a mismatch, the AOT image will fail to load.
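The GUID check described under Versioning can be sketched roughly as follows. This is a minimal illustration, assuming each AOT image records the name and 16-byte GUID of the input assembly and of every referenced assembly. The `AssemblyRef` type and the `aot_image_usable` function are hypothetical names made up for this example; the real data layout and comparison in the runtime may be different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GUID_LEN 16

/* Hypothetical, simplified record; not the real AOT image layout. */
typedef struct {
	const char *name;
	unsigned char guid [GUID_LEN];
} AssemblyRef;

/*
 * Returns true only if every assembly recorded at AOT time is matched
 * by a loaded assembly with the same name and an identical GUID.
 */
static bool
aot_image_usable (const AssemblyRef *recorded, const AssemblyRef *loaded, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp (recorded [i].name, loaded [i].name) != 0)
			return false;
		if (memcmp (recorded [i].guid, loaded [i].guid, GUID_LEN) != 0)
			return false;
	}
	return true;
}
```

Under this model, rebuilding any referenced assembly changes its GUID, so the check fails and the image is rejected even if the code is otherwise compatible.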
+
+## File structure
+
+The AOT image exports one symbol named `mono_aot_module__info` which points to a `MonoAotFileInfo` structure,
+which contains pointers to the tables/structures. The AOT image contains:
+- the native code
+- data structures required to load the code
+- cached data intended to speed up runtime operation
+
+The AOT image contains serialized versions of many .NET objects like methods/types etc. This uses ad-hoc binary encodings.
+
+## Runtime support
+
+The `aot-runtime.c` file contains the runtime support for loading AOT images.
+
+### Loading AOT images
+
+When an assembly is loaded, the corresponding AOT image is either loaded using the system dynamic linker (`dlopen`), or
+found among the statically linked AOT images.
+
+### Loading methods
+
+Every method in the AOT image is assigned an index. The AOT methods corresponding to 'normal' .NET methods are assigned
+an index corresponding to their metadata token index, while the 'extra' methods are assigned subsequent indexes. There is
+a hash table inside the AOT image mapping extra methods to their AOT indexes. Loading a method consists of:
+- finding its method index
+- finding the method code/data corresponding to the method index
+
+The mapping from method index to the code is done in an architecture specific way, designed to minimize the number of
+runtime relocations in the AOT image. In some cases, this involves generating an extra table with assembly call instructions to
+all the methods, then disassembling this table at runtime.
+
+
+
+### Runtime constants
+
+The generated code needs to access data which is only available at runtime. For example, for an `ldstr "Hello"` instruction, the
+`"Hello"` string is a runtime constant.
+
+These constants are stored in a global table called the GOT which is modelled after the Global Offset Table in ELF images. The GOT
+table contains pointers to runtime objects. The AOT image contains descriptions of these runtime objects so the AOT runtime can
+compute them. The entries in the GOT are initialized either when the AOT image is loaded (for frequently used entries), or before
+the method which uses them is first executed.
+
+### Initializing methods
+
+Before an AOTed method can be executed, it might need some initialization. This involves:
+- executing its class cctor
+- initializing the GOT slots used by the method
+
+For methods compiled by the mono JIT, initialization is done when the method is loaded. This means that it's not possible to
+have direct calls between methods. Instead, calls between methods go through small pieces of generated code called PLT
+(Program Linkage Table) entries, which transfer control to the runtime which loads the called method before executing it.
+For methods compiled by LLVM, the method entry contains a call to the runtime which initializes the method.
+
+## Trampolines
+
+In full-aot mode, the AOT compiler needs to emit all the trampolines which will be used at runtime. This is done in
+the following way:
+- For most trampolines, the AOT compiler calls the normal trampoline creation function with the `aot` argument set
+to TRUE, then saves the returned native code into the AOT image, along with some relocation information like the
+GOT slots used by the trampolines.
+- For some small trampolines, the AOT compiler directly emits platform specific assembly.
+
+The runtime might require an unbounded number of certain trampolines, but the AOT image can only contain a fixed
+number of them. To solve this problem, on some platforms (iOS), it's possible to have infinite trampolines. This is
+implemented by emitting a different version of these trampolines which reference their corresponding data using
+relative addressing. At runtime, a page of these trampolines is mapped using `mmap` next to a writable page
+which contains their corresponding data. The same page of trampolines is mapped multiple times at multiple
+addresses.
+
+## Cross compilation
+
+It's possible to use the AOT compiler to target a platform different than the host. This requires a separate cross compiler
+build of the runtime.
+The generated code depends on offsets inside runtime structures like `MonoClass`/`MonoVTable` etc. which could
+differ between the host and the target. This is handled by having a tool called the offsets-tool, which is a python
+script which uses the clang python interface to compute and emit a C header file containing these offsets. The header
+file is passed as a cmake argument during the runtime build. Inside the runtime code, the `MONO_STRUCT_OFFSET`
+C macro reads the data from the offsets file to produce the offset corresponding to the target platform.
diff --git a/docs/design/mono/runtime-ilgen.md b/docs/design/mono/runtime-ilgen.md
new file mode 100644
index 0000000000000..8c17bb697a2ac
--- /dev/null
+++ b/docs/design/mono/runtime-ilgen.md
@@ -0,0 +1,110 @@
+# IL generation at runtime
+
+## Introduction
+
+The mono runtime makes extensive use of generating IL methods at runtime. These
+methods are called 'wrappers' in the runtime code, because some of them 'wrap' other
+methods, like a managed-to-native wrapper would wrap the native function being called.
+Wrappers have the `MonoMethod.wrapper_type` field set to the type of the wrapper.
+
+## Source code structure
+
+- `wrapper-types.h`: Enumeration of wrapper types
+- `marshal*`: Functions for generating wrappers
+- `method-builder*`: Low level functions for creating new IL methods/code at runtime
+
+## WrapperInfo
+
+Every wrapper has an associated `WrapperInfo` structure which describes the wrapper.
+This can be retrieved using the `mono_marshal_get_wrapper_info ()` function.
+Some wrappers have subtypes; these are stored in `WrapperInfo.subtype`.
+
+## Caching wrappers
+
+Wrappers should be unique, i.e. there should be only one instance of every wrapper. This is
+achieved by caching wrappers in wrapper type specific hash tables, which are stored in
+`MonoMemoryManager.wrapper_caches`.
+
+## Generics and wrappers
+
+Wrappers for generic instances should be created by doing:
+instance method -> generic method definition -> generic wrapper -> inflated wrapper
+
+## AOT support
+
+In full-aot mode, the AOT compiler will collect and emit the wrappers needed by the
+application at runtime. This involves serializing/deserializing the `WrapperInfo` structure.
+
+## Wrapper types
+
+### Managed-to-native
+
+These wrappers are used to make calls to native code. They are responsible for marshalling
+arguments and result values, setting up EH structures etc.
+
+### Native-to-managed
+
+These wrappers are used to call managed methods from native code. When a delegate is passed to
+native code, the native code receives a native-to-managed wrapper.
+
+### Delegate-invoke
+
+Used to handle more complicated cases of delegate invocation that the fastpaths in the JIT can't handle.
+
+### Synchronized
+
+Used to wrap synchronized methods. The wrapper does the locking.
+
+### Runtime-invoke
+
+Used to implement `mono_runtime_invoke ()`.
+
+### Dynamic-method
+
+These are not really wrappers, but methods created by user code using the `DynamicMethod` class.
+
+Note that these have no associated `WrapperInfo` structure.
+
+### Alloc
+
+SGEN allocator methods.
+
+### Write-barrier
+
+SGEN write barrier methods.
+
+### Castclass
+
+Used to implement complex casts.
+
+### Stelemref
+
+Used to implement stelem.ref.
+
+### Unbox
+
+Used to unbox the receiver before calling a method.
+
+### Managed-to-managed/other
+
+The rest of the wrappers, distinguished by their subtype.
+
+#### String-ctor
+
+Used to implement string ctors. The first argument is ignored, and a new string is allocated.
+
+#### Element-addr
+
+Used to implement ldelema in multi-dimensional arrays.
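To make the job of an element-addr wrapper more concrete, here is a minimal sketch of the address computation that `ldelema` implies for a multi-dimensional array. It assumes row-major element storage and per-dimension lower bounds and lengths. The `DimBounds` type and `element_addr` function are hypothetical names for this example and do not reflect mono's actual array layout or the IL the wrapper contains. Where the real wrapper would throw an exception for an out-of-range index, the sketch returns `NULL`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-dimension descriptor; not mono's array layout. */
typedef struct {
	int32_t lower_bound;
	int32_t length;
} DimBounds;

/*
 * Returns the address of the element selected by `indexes`, or NULL if
 * any index is out of range. Elements are assumed to be row-major.
 */
static void *
element_addr (void *data, size_t elem_size, const DimBounds *dims, int rank, const int32_t *indexes)
{
	size_t flat = 0;

	for (int i = 0; i < rank; i++) {
		/* Index relative to the lower bound of this dimension */
		int64_t rel = (int64_t)indexes [i] - dims [i].lower_bound;

		if (rel < 0 || rel >= dims [i].length)
			return NULL;
		flat = flat * (size_t)dims [i].length + (size_t)rel;
	}
	return (char *)data + flat * elem_size;
}
```

For a 2x3 array with zero lower bounds, the index pair (1, 2) maps to flat position 1 * 3 + 2 = 5.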
+
+#### Generic-array-helper
+
+Used to implement the implicit interfaces on arrays like IList etc. Delegates to helper methods on the Array class.
+
+#### Structure-to-ptr
+
+Used to implement Marshal.StructureToPtr.
+
+#### Ptr-to-structure
+
+Used to implement Marshal.PtrToStructure.
diff --git a/docs/design/mono/wasm-aot.md b/docs/design/mono/wasm-aot.md
index ef907bfe0abe7..20f900e35f47f 100644
--- a/docs/design/mono/wasm-aot.md
+++ b/docs/design/mono/wasm-aot.md
@@ -6,15 +6,29 @@ The LLVM backend of the Mono JIT is used to generate an llvm .bc file for each a
 compiled to webassembly using emscripten, then the resulting wasm files are linked into the final app.
 The 'bitcode'/'llvmonly' variant of the LLVM backend is used since webassembly doesn't support inline assembly etc.
 
+## Source code structure
+
+`mini-llvm.c`: The LLVM backend.
+`mini-wasm.h/c`: The wasm backend. This is a minimal version of a normal mono JIT backend which only supports llvm.
+`llvm-runtime.cpp`: Code to throw/catch C++ exceptions.
+`aot-runtime-wasm.c`: Code related to interpreter/native transitions on wasm.
+`llvmonly-runtime.c`: Runtime support for the generated AOT code.
+
+WASM specific code is inside `HOST_WASM/TARGET_WASM` defines.
+
 ## GC Support
 
 On wasm, the execution stack is not stored in linear memory, so its not possible to scan it for GC references. However, there
-is an additional C stack which stores variables whose addresses are taken. Variables which hold GC references are marked as
-'volatile' in the llvm backend, forcing llvm to spill those to the C stack so they can be scanned.
+is an additional C stack in linear memory which is managed explicitly by the generated wasm code. This stack is already
+scanned by the mono GC as on other platforms.
+To make GC references in AOTed methods visible to the GC, every method allocates a gc_pin area in its prolog, and
+stores arguments/locals with a reference type into it. This will cause the GC to pin those references so the rest of
+the generated code can treat them normally as LLVM values.
 
 ## Interpreter support
 
-Its possible for AOTed and interpreted code to interop, this is called mixed mode.
+On wasm, the two supported execution modes are interpreter, or aot+interpreter. This means it's always
+possible to fall back to the interpreter if needed.
 
 For the AOT -> interpreter case, every call from AOTed code which might end up in the interpreter is emitted as an indirect call.
 When the callee is not found, a wrapper function is used which packages up the arguments into an array and passes control to the interpreter.
@@ -24,6 +38,22 @@ AOTed code. There is usually one aot->interp and interp->aot wrapper for each si
 some sharing.
 These wrappers are generated by the AOT compiler when the 'interp' aot option is used.
 
+## Exception handling
+
+On wasm, it's not possible to walk the stack so the normal mono exception handling/unwind code
+cannot be used as is. It's also hard to map the .NET exception handling concepts like filter clauses
+to the llvm concepts. Instead, C++/wasm exceptions are used to implement unwinding, and the
+interpreter is used to execute EH code.
+When an exception needs to be thrown, we store the exception info in TLS, and throw a dummy C++ exception instead.
+Internally, this is implemented by emscripten either by calling into JS, or by using the wasm exception handling
+spec.
+The C++ exception is caught in the generated AOT code using the relevant llvm catch instructions. Then execution is
+transferred to the interpreter. This is done by creating a data structure on the stack containing all the IL level state like
+the IL offset and the values of all the IL level variables. The generated code continuously updates this state during
+execution. When an exception is caught, this IL state is passed to the interpreter which continues execution from
+that point. This process is called `deopt` in the runtime code.
+Exceptions are also caught in various other places like the interpreter-aot boundary.
+
 ## Null checks
 
 Since wasm has no signal support, we generate explicit null checks.
@@ -59,8 +89,7 @@ if (vt_entry == null)
     vt_entry = init_vt_entry ();
 ```
 
-### GC overhead
+### Exception handling
 
-Since GC variables are marked as volatile and stored on the C stack, they are loaded/stored on every access,
-even if there is no GC safe point between the accesses. Instead, they should only be loaded/stored around
-GC safe points.
+It might be possible to implement EH in the generated code without involving the interpreter. The
+current design adds a lot of overhead to methods which contain IL clauses.