diff --git a/docs/design/mono/web/README.md b/docs/design/mono/web/README.md new file mode 100644 index 0000000000000..50992e0a68b3f --- /dev/null +++ b/docs/design/mono/web/README.md @@ -0,0 +1 @@ +This directory contains the original mono runtime documentation from the [mono website](https://github.com/mono/website/tree/gh-pages/docs/advanced/runtime/docs). diff --git a/docs/design/mono/web/aot.md b/docs/design/mono/web/aot.md new file mode 100644 index 0000000000000..ffa14737f3ee3 --- /dev/null +++ b/docs/design/mono/web/aot.md @@ -0,0 +1,179 @@ +# Ahead of Time Compilation (AOT) + +Mono Ahead Of Time Compiler +--------------------------- + +The Ahead of Time compilation feature in Mono allows Mono to precompile assemblies to minimize JIT time, reduce memory usage at runtime and increase code sharing across multiple running Mono applications. + +To precompile an assembly use the following command: + + mono --aot -O=all assembly.exe + +The `--aot` flag instructs Mono to ahead-of-time compile your assembly, while the `-O=all` flag instructs Mono to use all the available optimizations. + +Besides code, the AOT file also contains cached metadata information which allows the runtime to avoid certain computations at runtime, like the computation of generic vtables. This reduces both startup time and memory usage. It is possible to create an AOT image which contains only this cached information and no code by using the 'metadata-only' option during compilation: + + mono --aot=metadata-only assembly.exe + +This works even on platforms where AOT is not normally supported. + +The code generated by Ahead-of-Time compiled images is position-independent code. This allows the same precompiled image to be reused across multiple applications without having different copies. ELF shared libraries work the same way: the code produced can be relocated to any address. 
+ +The implementation of Position Independent Code has a performance impact on Ahead-of-Time compiled images but compiler bootstraps are still faster than JIT-compiled images, especially with all the new optimizations provided by the Mono engine. + +### The AOT File Format + +We use the native object format of the platform. That way it is possible to reuse existing tools like as/ld and the dynamic loader. On ELF platforms, the AOT compiler can generate an ELF .so file directly; on other platforms, it generates an assembly (.s) file which is then assembled and linked by as/ld into a shared library. + +The precompiled image is stored in a file next to the original assembly, named after it with the native extension for a shared library appended (on Linux, ".so" is appended to the file name). + +For example: basic.exe -\> basic.exe.so; corlib.dll -\> corlib.dll.so + +There is one global symbol in each AOT image named 'mono_aot_file_info'. This points to a MonoAotFileInfo structure which contains pointers to all the AOT data structures. In the latter parts of this document, fields of this structure are referenced using info-\>\. + +Binary data other than code is stored in one giant blob. Data items inside the blob can be found using several tables called 'XXX_offsets', like 'method_info_offsets'. These tables contain offsets into the blob, stored in a compact format using differential encoding plus an index. + +### Source file structure + +The AOT infrastructure is split into two files, aot-compiler.c and aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by --aot, while aot-runtime.c contains the runtime support needed for loading code and other things from the aot files. The file image-writer.c contains the ELF writer/ASM writer code. + +### Compilation process + +AOT compilation consists of the following stages: + +- collecting the methods to be compiled. +- compiling them using the JIT. 
+- emitting the JITted code and other information +- emitting the output file either directly, or by executing the system assembler/linker. + +### Handling methods + +There are two kinds of methods handled by AOT: + +- Normal methods are methods from the METHODDEF table. +- 'Extra' methods are either runtime generated methods (wrappers) or methods of inflated generic classes/inflated generic methods. + +Each method is identified by a method index. For normal methods, this is equivalent to its index in the METHOD metadata table. For extra methods, it is an arbitrary number. Compiled code is created by invoking the JIT, requesting it to create AOT code instead of normal code. This is done by the compile_method () function. The output of the JIT is compiled code and a set of patches (relocations). Each relocation specifies an offset inside the compiled code, and a runtime object whose address is accessed at that offset. Patches are described by a MonoJumpInfo structure. From the perspective of the AOT compiler, there are two kinds of patches: + +- calls, which require an entry in the PLT table. +- everything else, which requires an entry in the GOT table. + +How patches are handled is described in the next section. After all the methods are compiled, they are emitted into the output file into a byte array called 'methods'. Each piece of compiled code is identified by the local symbol .Lm_\. While compiled code is emitted, all the locations which have an associated patch are rewritten using a platform specific process so the final generated code will refer to the plt and got entries belonging to the patches. This is done by the emit_and_reloc_code () function. The compiled code array can be accessed using the 'methods' global symbol. + +### Handling patches + +Before a piece of AOTed code can be used, the GOT entries used by it must be filled out with the addresses of runtime objects. Those objects are identified by MonoJumpInfo structures. 
These structures are saved in a serialized form in the AOT file, so the AOT loader can deconstruct them. The serialization is done by the encode_patch () function, while the deserialization is done by the decode_patch_info () function. Every method has an associated method info blob stored inside the global blob. This contains all the information required to load the method at runtime: + +- the first got entry used by the method. +- the number of got entries used by the method. +- the indexes of the got entries used by the method + +Each GOT entry is described by a serialized description stored in the global blob. The 'got_info_offsets' table maps got offsets to the offsets of their description. + +### The Procedure Linkage Table (PLT) + +Our PLT is similar to the ELF PLT; it is used to handle calls between methods. If method A needs to call method B, then an entry is allocated in the PLT for method B, and A calls that entry instead of B directly. This is useful because in some cases the runtime needs to do some processing the first time B is called. The processing includes: + +- if B is in another assembly, then it needs to be looked up, then JITted or the corresponding AOT code needs to be found. +- if B is in the same assembly, but has got slots, then the got slots need to be initialized. + +If none of these cases is true, then the PLT is not used, and the call is made directly to the native code of the target method. A PLT entry is usually implemented by a jump through a GOT entry. These entries are initially filled with the address of a trampoline so the runtime can get control, and after the native code of the called method is created/found, the jump table entry is changed to point to the native code. All PLT entries also embed an integer offset after the jump which indexes into the 'plt_info' table, which stores the information required to find the called method. The PLT is emitted by the emit_plt () function. 
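
The lazy binding performed by a PLT entry can be illustrated in plain C. This is only a sketch under stated assumptions: a function pointer stands in for the GOT slot, an ordinary C function stands in for the trampoline, and every name (`got_slot_b`, `plt_trampoline_b`, `method_a`, `method_b`, `resolve_count`) is hypothetical and does not appear in the Mono sources. Real PLT entries are emitted as machine code by emit_plt ().

```c
#include <assert.h>

typedef int (*method_fn) (int);

int resolve_count = 0;

/* Stand-in for the native code of method B. */
int method_b (int x)
{
    return x * 2;
}

int plt_trampoline_b (int x);

/* Hypothetical GOT slot behind B's PLT entry; it starts out pointing at the trampoline. */
method_fn got_slot_b = plt_trampoline_b;

/* Stand-in for the trampoline: the "runtime" gets control on the first call,
 * finds B's code, patches the slot, then forwards the call. */
int plt_trampoline_b (int x)
{
    /* The trampoline can only be reached while the slot is still unpatched. */
    assert (got_slot_b == plt_trampoline_b);
    resolve_count++;
    got_slot_b = method_b;
    return got_slot_b (x);
}

/* Method A calls B indirectly through the slot instead of calling B directly. */
int method_a (int x)
{
    return got_slot_b (x) + 1;
}
```

The first call to `method_a` goes through the trampoline and patches the slot; later calls reach `method_b` directly, so the resolution work is paid only once.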
+ +### Exception/Debug info + +Each compiled method has some additional info generated by the JIT, usable for debugging (IL offset-native offset maps) and exception handling (saved registers, native offsets of try/catch clauses). These are stored in the blob, and the 'ex_info_offsets' table can be used to find them. + +### Cached metadata + +When the runtime loads a class, it needs to compute a variety of information which is not readily available in the metadata, like the instance size, vtable, whether the class has a finalizer/type initializer etc. Computing this information requires a lot of time, causes the loading of lots of metadata, and usually involves the creation of many runtime data structures (MonoMethod/MonoMethodSignature etc), which are long living, and usually persist for the lifetime of the app. To avoid this, we compute the required information at aot compilation time, and save it into the aot image, into an array called 'class_info'. The runtime can query this information using the mono_aot_get_cached_class_info () function, and if the information is available, it can avoid computing it. To speed up mono_class_from_name (), a hash table mapping class names to class indexes is constructed and saved in the AOT file pointed to by the symbol 'class_name_table'. + +### Other data + +Things saved into the AOT file which are not covered elsewhere: + +- info-\>assembly_guid A copy of the assembly GUID. When loading an AOT image, this GUID must match the GUID of the assembly for the AOT image to be usable. + +- info-\>version The version of the AOT file format. This is checked against the MONO_AOT_FILE_VERSION constant in mini.h before an AOT image is loaded. The version number must be incremented when an incompatible change is made to the AOT file format. + +- info-\>image_table A list of assemblies referenced by this AOT module. + +- info-\>plt The Procedure Linkage Table + +### LLVM Support + +It is possible to use LLVM in AOT mode. 
This is implemented by compiling methods using LLVM instead of the JIT, saving the resulting LLVM bitcode into an LLVM .bc file, compiling it using LLVM tools into a .s file, then appending our own AOT data structures to that file. + +### Full AOT mode + +Some platforms like the iPhone prohibit JITted code, using technical and/or legal means. This is a significant problem for the mono runtime, since it generates a lot of code dynamically, using either the JIT or more low-level code generation macros. To solve this, the AOT compiler is able to function in full-aot or aot-only mode, where it generates and saves all the necessary code in the aot image, so at runtime, no code needs to be generated. There are two kinds of code which need to be considered: + +- wrapper methods, that is methods whose IL is generated dynamically by the runtime. They are handled by generating them in the add_wrappers () function, then emitting them as 'extra' methods. +- trampolines and other small hand generated pieces of code. They are handled in an ad-hoc way in the emit_trampolines () function. + +### Emitting assembly/object code + +The output emission functionality is in the file image-writer.c. It can either emit assembly code (.s), or it can produce a shared image directly. The latter is only supported on x86/amd64 ELF. The emission of debug information is in the file dwarfwriter.c. + +### Performance considerations + +Using AOT code is a trade-off which might lead to higher or lower performance, depending on a lot of circumstances. Some of these are: + +- AOT code needs to be loaded from disk before being used, so cold startup of an application using AOT code MIGHT be slower than using JITed code. Warm startup (when the code is already in the machine's cache) should be faster. Also, JITing code takes time, and the JIT compiler also needs to load additional metadata for the method from the disk, so startup can be faster even in the cold startup case. 
+- AOT code is usually compiled with all optimizations turned on, while JITted code is usually compiled with default optimizations, so the generated code in the AOT case could be faster. +- JITted code can directly access runtime data structures and helper functions, while AOT code needs to go through an indirection (the GOT) to access them, so it will be slower and somewhat bigger as well. +- When JITting code, the JIT compiler needs to load a lot of metadata about methods and types into memory. +- JITted code has better locality, meaning that if method A calls B, then the native code for A and B is usually quite close in memory, leading to better cache behavior and thus improved performance. In contrast, the native code of methods inside the AOT file is in a somewhat random order. + +### Porting + +Generated native code needs to reference various runtime structures/functions whose address is only known at run time. JITted code can simply embed the address into the native code, but AOT code needs to do an indirection. This indirection is done through a table called the Global Offset Table (GOT), which is similar to the GOT table in the ELF spec. When the runtime saves the AOT image, it saves some information for each method describing the GOT table entries used by that method. When loading a method from an AOT image, the runtime will fill out the GOT entries needed by the method. + +#### Computing the address of the GOT + +Methods which need to access the GOT first need to compute its address. On the x86 it is done by code like this: + + call + pop ebx + add , ebx + + +The variable representing the got is stored in cfg-\>got_var. It is always allocated to a global register to prevent some problems with branches + basic blocks. + +#### Referencing GOT entries + +Any time the native code needs to access some other runtime structure/function (i.e. any time the backend calls mono_add_patch_info ()), the code pointed to by the patch needs to load the value from the got. 
For example, instead of: + + call + +it needs to do: + + call *() + +Here, the \ can be 0, it will be fixed up by the AOT compiler. + +For more examples on the changes required, see + +svn diff -r 37739:38213 mini-x86.c + +### Back end functionality + +#### OP_AOTCONST + +Loading information from the GOT tables is done by the OP_AOTCONST opcode. Since the opcode implementation needs to reference the GOT symbol, which is not available during JITting, the backend should emit some placeholder code in mono_arch_output_basic_block (), and emit the real implementation in arch_emit_got_access () in aot-compiler.c. + +#### Constants + +AOTed code cannot contain literal constants like addresses etc. All occurrences of those should be replaced by an OP_AOTCONST. + +#### PLT Entries + +PLT entries are emitted by arch_emit_plt_entry () in aot-compiler.c. Each PLT entry has a corresponding slot in the GOT. The PLT entry should load this GOT slot, and branch to it, without clobbering any argument registers or the return value. Since the return address is not updated, the AOT code obtains the address of the PLT entry by disassembling the call site which branched to the PLT entry. This is done by the mono_arch_get_call_target () function in tramp-\.c. The information needed to resolve the target of the PLT entry is in the AOT tables, and an offset into these tables should be emitted as a word after the PLT entry. The mono_arch_get_plt_info_offset () function in tramp-\.c is responsible for retrieving this offset. After the call is resolved, the GOT slot used by the PLT entry needs to be updated with the new address. This is done by the mono_arch_patch_plt_entry () function in tramp-\.c. + +### Future Work + +- Currently, when an AOT module is loaded, all of its dependent assemblies are also loaded eagerly, and these assemblies need to be exactly the same as the ones loaded when the AOT module was created ('hard binding'). Non-hard binding should be allowed. 
+- On x86, the generated code uses call 0, pop REG, add GOTOFFSET, REG to materialize the GOT address. Newer versions of gcc use a separate function to do this, maybe we need to do the same. +- Currently, we get vtable addresses from the GOT. Another solution would be to store the data from the vtables in the .bss section, so accessing them would involve less indirection. +- When saving information used to identify classes/methods, we use an ad-hoc encoding. An encoding similar to the metadata encoding should be used instead. + +[Original version of this document in git](https://github.com/mono/mono/blob/e6d522976e24e572f0e7bc344ae4b8f79f955c6f/docs/aot-compiler.txt) diff --git a/docs/design/mono/web/bitcode.md b/docs/design/mono/web/bitcode.md new file mode 100644 index 0000000000000..1d32c67cc7b6a --- /dev/null +++ b/docs/design/mono/web/bitcode.md @@ -0,0 +1,145 @@ +# Bitcode + +## Introduction + +Bitcode imposes the following major restrictions: + +- No inline assembly/machine code +- Compilation using stock clang + +To enable the runtime to operate in this environment, a new execution mode 'llvmonly' was implemented. In this mode: + +- everything is compiled to llvm bitcode, then compiled to native code using clang. +- no trampolines, etc. are used. + +In the rest of this document, 'normal mode' is used to refer to the JIT/full aot mode previously supported by the runtime. + +## Concepts + +### Passing extra arguments + +The runtime used trampolines to pass extra arguments to some generic shared methods. This is not possible in llvmonly mode. Instead, these arguments are passed normally as an additional argument, and the caller is responsible for passing them. The method address and the possible additional argument are encapsulated together into a function descriptor represented by a MonoFtnDesc structure. These function descriptors are used instead of method addresses anywhere a callee might require an extra argument. 
A call using an ftndesc looks like this: + +``` c +ftndesc->addr (, ftndesc->arg); +``` + +The 'arg' field might be null, in which case the caller will pass one more argument than the callee requires, but that is not a problem with most calling conventions. + +### Lazy initialization + +Trampolines were used in many places in the runtime to initialize/load methods/code on demand. Instead, either the caller or the callee needs to check whether initialization is required, and call into runtime code to do it. + +## Details + +### Method initialization + +AOT methods require the initialization of GOT slots they are using. In normal execution mode, this was accomplished by calling them through PLT entries. The PLT entry would look up the method code, initialize its GOT slots, then transfer control to it. In llvmonly mode, methods initialize themselves. Every AOT module has an 'inited' bit array with one bit for every method. The method code checks this bit in its prolog, and if it is 0, calls a runtime function to initialize the method. + +In llvmonly mode, no trampolines are created for methods. Instead, the method's code is looked up immediately. This doesn't create lazy initialization problems because the method is initialized lazily, so looking up its code doesn't change managed state, i.e. it doesn't run type cctors etc. + +### Looking up methods + +In normal mode, AOT images contained a table mapping method indexes to method addresses. This table was emitted using inline assembly. In llvmonly mode, there is a generated llvm function which does this mapping using a switch statement. + +### Unbox trampolines + +In normal mode, these were emitted using inline assembly. In llvmonly mode, these are emitted as llvm code. With optimizations enabled, llvm can emit the same or very similar code. + +### Null checks + +Since the target platform for bitcode doesn't support sigsegv signal handlers, explicit null checks are emitted. 
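
The self-initialization prolog described under "Method initialization" and the explicit null check can be pictured together in a small C sketch. It rests on loud assumptions: the real code is generated as LLVM IR, not written by hand, and all names here (`inited_bits`, `init_method`, `throw_null_reference`, `get_value`, the method index 5) are hypothetical stand-ins rather than runtime symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define METHOD_COUNT 64

/* Hypothetical 'inited' bit array: one bit per method in the AOT module. */
uint8_t inited_bits [METHOD_COUNT / 8];
int init_calls = 0;
int null_errors = 0;

/* Stand-in for the runtime function which fills in a method's GOT slots. */
void init_method (int method_index)
{
    assert (method_index >= 0 && method_index < METHOD_COUNT);
    init_calls++;
    inited_bits [method_index / 8] |= (uint8_t) (1u << (method_index % 8));
}

/* Stand-in for raising a NullReferenceException. */
void throw_null_reference (void)
{
    null_errors++;
}

typedef struct { int value; } Obj;

/* A "compiled" method with the hypothetical method index 5. */
int get_value (Obj *obj)
{
    const int method_index = 5;

    /* Prolog: the method initializes itself the first time it runs. */
    if (!(inited_bits [method_index / 8] & (1u << (method_index % 8))))
        init_method (method_index);

    /* Explicit null check, since a SIGSEGV handler cannot be relied upon. */
    if (obj == NULL) {
        throw_null_reference ();
        return 0;
    }
    return obj->value;
}
```

After the first call the bit is set, so later calls skip `init_method` and pay only for the bit test.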
+ +### Normal calls + +Calls are made through a GOT slot, or directly, if the callee is in the same assembly, and its corresponding llvm method can be looked up at compile time. + +### Virtual calls + +Vtable slots contain ftn descriptors. They are initialized to null when the vtable is created, so the calling code has to initialize them on demand. So a virtual call looks like this: + +``` c +if (vtable [slot] == null) + init_vtable_slot (vtable, slot); +ftndesc = vtable [slot]; + +``` + +### Interface calls + +Interface calls are implemented using IMT. The imt entries in the vtable contain an ftndesc. The ftndesc points to an imt thunk. IMT thunks are C functions implemented in the runtime. They receive the imt method, and a table of `` pairs, and return the ftndesc corresponding to the imt method. + +The generated code looks like this: + +``` c +imt_ftndesc = vtable [imt_slot]; +ftndesc = imt_ftndesc->addr (imt_method, imt_ftndesc->arg); + +``` + +The imt entries are initialized to point to an 'initial imt thunk', which computes the real imt thunk when first called, and replaces the imt entry to point to the real imt thunk. This means that the generated code doesn't need to check whether the imt entry is initialized. + +### Generic virtual calls + +These are handled similarly to interface calls. + +### Gsharedvt + +There are two kinds of gsharedvt methods: ones with a variable signature, and those without one. A variable signature is a signature which includes parameters/return values whose size is not known at compile time. Gsharedvt methods without variable signatures are handled similarly to normal mode. Methods with variable signatures are handled as follows: all parameters are passed, and the result is returned, by ref, even the fixed size ones. I.e., for `T foo (int i, T t)`, both 'i' and 't' are passed by ref, and the result is returned by ref using a hidden argument. 
So the real signature of the gsharedvt version of foo looks like this: + +``` c +void foo (ref T_GSHAREDVT vret, ref int i, ref T_GSHAREDVT t, ); +``` + +Calls between normal and gsharedvt methods with a variable signature go through gsharedvt in/out wrappers. These are normal runtime wrappers generated by the runtime as IL code. The AOT compiler collects every possible concrete signature from the program, and generates in/out wrappers for them. Wrappers for similar signatures are shared to decrease the number of required wrappers. + +A gsharedvt in wrapper for the method above looks like this (T==int): + +``` c +int gsharedvt_in_int_int (int i, int t, ftndesc callee) +{ + int res; + + callee->addr (&res, &i, &t, callee->arg); + return res; +} +``` + +While a gsharedvt out wrapper for the same instantiation looks like: + +``` c +void gsharedvt_out_int_int (ref int vret, ref int i, ref int t, ftndesc callee) +{ + *vret = callee->addr (*i, *t, callee->arg); +} +``` + +The last argument to the wrappers is an ftndesc for the method which needs to be called. + +### Delegates + +In normal mode, delegate trampolines and various small invoke trampolines are used to implement delegate creation/invocation efficiently. In llvmonly mode, we fall back to the normal delegate-invoke wrappers. The delegates need to invoke an ftndesc since the target method can require an extra argument. The 'addr' part of the ftndesc is stored in `MonoDelegate.method_ptr`, and the 'arg' part is stored in `MonoDelegate.extra_arg`. The delegate invoke wrapper uses a special IL opcode called `CEE_MONO_CALLI_EXTRA_ARG` to make the call which takes this into account. + +If the target method is gsharedvt, we cannot add a gsharedvt in wrapper around it, since the concrete signature required might not exist at compile time if the delegate is only invoked through a gsharedvt delegate-invoke wrapper. 
To work around this, we set the lowest bit of `MonoDelegate.extra_arg` to indicate this, and the `CALLI_EXTRA_ARG` opcode generates code which checks at runtime to see which calling convention needs to be used. + +### Runtime invoke + +Runtime invoke is used to dynamically invoke managed methods. It is implemented using runtime-invoke wrappers, which receive a C array of parameter values, and pass it to a method which is called. + +For example, the runtime-invoke wrapper for the `foo` method above looks like: + +``` c +void runtime_invoke_int_int (gpointer[] params, gpointer addr, gpointer *exc) +{ + try { + int ret = addr (params [0], params [1]); + return box(ret, typeof); + } catch (Exception ex) { + *exc = ex; + } +} +``` + +There is one runtime invoke wrapper for each possible signature, with some sharing. To cut down on the number of wrappers generated, in normal mode, we use a 'dyn-call' opcode which can support a large number of signatures. + +In llvmonly mode, we use the gsharedvt out wrappers which are already generated to support gsharedvt to implement runtime invokes. This is useful because the possible set of signatures for gsharedvt out wrappers is limited since all their arguments are pointers. Instead of invoking the method directly from the runtime-invoke wrapper, we invoke the gsharedvt out wrapper. So the call looks like this: runtime-invoke wrapper -> gsharedvt out wrapper -> target method. diff --git a/docs/design/mono/web/coop-suspend.md b/docs/design/mono/web/coop-suspend.md new file mode 100644 index 0000000000000..78bffd3fca074 --- /dev/null +++ b/docs/design/mono/web/coop-suspend.md @@ -0,0 +1,243 @@ +# Runtime Cooperative Suspend + +## Intro: Preemptive, Cooperative and Hybrid Suspend + +The runtime needs to be able to suspend threads to perform all sorts of tasks, the main one being garbage collection. +Those threads need to be suspended from another thread, and historically Mono used signals (or similar APIs) to do it. 
+ +The basic problem is that when the runtime needs to stop threads (for example at some steps during GC) there are two general approaches: +* Preemptive - the runtime sends a signal to the thread and the signal handler for the thread puts it to sleep until it gets a resume signal (or on Windows or Apple OSes, it uses kernel calls to stop the thread). + The problem of using signals is that threads are suspended at arbitrary points in time, which requires the suspender +thread to run in the equivalent of signal context - a very restrictive setup. Not only that, but the fact that +threads could be suspended while holding runtime and libc locks meant that not even basic things like printf were available. + Also on some platforms (watchOS, WebAssembly) we don't have enough OS facilities to examine the context when a thread is suspended - we can't see the contents of their registers, or their stack, and thus preemptive suspend on those systems wouldn't be useful for GC and other runtime operations that need to examine the state of suspended threads. +* Cooperative - The alternative is to use cooperative suspend, where threads suspend themselves when the runtime requests it. To make +this possible, frequent polling and checkpointing are required. This is a well understood model that is in line with what +the industry does. + With this, as long as the thread is running managed code, it will eventually reach a safepoint and suspend itself. The advantage is that it will always be in a "nice" place. + There is more to keep track of in cooperative mode when a thread calls native code - while it's in native code it won't have safepoints and it might block for arbitrary amounts of time. So the runtime marks all the places where a thread goes from managed code ("GC Unsafe" - because it can manipulate managed memory) to native code ("GC Safe" - because it's not supposed to access managed memory). 
When the thread is in GC Safe mode, instead of trying to suspend it, we just let it run until it tries to come back to GC Unsafe mode. + The problem with cooperative suspend is that it relies on nice (cooperating) behavior from embedders and from native code - if the native code calls back into Mono suddenly it might be running managed code again when the GC thinks that it is not. And that can cause problems. So to use cooperative mode, the native code has to be explicitly annotated with GC transitions - telling the runtime when the thread is switching between GC Safe and GC Unsafe modes. +* Hybrid suspend - a combination of the previous two approaches. While the thread is in managed code or in the Mono runtime itself, it is in GC Unsafe mode. In GC Unsafe mode we will try to suspend it cooperatively by expecting the thread to reach a safepoint and suspend itself. But when the thread calls out to native code we switch it to GC Safe mode and start preemptively suspending it. That way no matter what kind of native code it is running, we will stop it and it won't be able to invalidate our assumptions by touching managed memory or calling runtime functions. + Hybrid suspend requires even more bookkeeping (every embedding API function needs to switch from GC Safe mode to GC Unsafe on entry and back on exit), but all the bookkeeping is done by the runtime, not by the user code. + So hybrid suspend is a good approach because the embedder code doesn't need to be aware of it - it behaves just like preemptive. But at the same time it is less likely to suspend the thread in a state that is inconvenient for the runtime, unlike preemptive suspend. + +## How cooperative and hybrid suspend works + +Cooperative suspend limits what a suspender thread can do to simply request that the target thread suspends itself. 
+The target thread can serve a suspend in two ways: by frequently polling its state, or by checkpointing its state +at points where the runtime loses control of the thread (pinvoke, blocking syscall). + +We can split code into 3 categories: managed, runtime native code and foreign native code. This tells how coop suspend happens. + +### Managed code + +Managed code will check for suspend requests in function prologues, catch handlers and the back-edges of loops. This ensures that +a suspend will be served in a bounded amount of time. Those suspend checks are done at what are referred to as safepoints. + +This is implemented in mini.c:mono_insert_safepoints. It will add OP_GC_SAFE_POINT ops around the method. +Then each backend will emit machine code for those new ops. [1] + +### Foreign native code + +This includes pinvokes and arbitrary native code when the runtime is embedded. Foreign code doesn't touch managed objects +so it's safe for the GC to ignore both the stack and the code being executed by those. + +Before executing a pinvoke, we save the current thread registers and transition it to the equivalent of the suspended state. +It means the GC can take the saved state as is and ignore that the thread keeps running. + +### Runtime native code + +This encompasses all runtime code, metadata, utils and mini. Special care must be taken with icalls. +Runtime code is different as it operates on raw object pointers, meaning that the GC must be aware of them. +To do so we handle runtime code just as managed code, except we don't get safepoints automatically inserted for us. + +Manual insertion of polling code and checkpointing must be done in the runtime. In addition to that, we must be careful +of how we access managed memory once we save the thread state. + +## Current Implementation + +The current implementation is a state machine that tells what's the current status of a thread. 
These are the +states: + +* Starting: Initial state of a thread, nothing interesting should happen while in this state. +* Detached: Thread is shutting down, it won't touch managed memory or do any runtime work. +* Running: The thread is running managed or runtime code. There are no pending suspend requests. +* AsyncSuspendRequested: The thread is running managed or runtime code and another thread requested that the current thread be suspended. +* SelfSuspended: Thread suspended by itself. This happens if a thread tried to switch to blocking, but there was a pending suspend request and the thread suspended itself instead. It will go back to running and the switch to blocking will be retried. +* AsyncSuspended: Thread was async suspended, so it's in a signal handler or thread_suspend was called on it. (This state never happens when running threads are cooperatively suspended) +* Blocking: The current thread is executing code that won't touch managed memory. There are no pending suspend requests. +* BlockingSuspendRequested: The current thread is executing code that won't touch managed memory, and someone requested it to suspend. In full cooperative mode, the thread is assumed to still be suspended. +* BlockingSelfSuspended: The current thread finished executing blocking code but there was a pending suspend against it, it's waiting to be resumed. +* BlockingAsyncSuspended: The current thread was executing in blocking code, but it was preemptively suspended. This is done in "hybrid" suspend mode. When the thread resumes, it will go back to executing blocking code. + +![Coop state machine transition diagram](images/coop-state-machine.png) + +In addition to those states, there are a number of transitions that are used to move a thread from one state to another. + +## mono-threads.c, mono-threads-coop.c, mono-threads-state-machine.c + +Thread suspension is modeled with a state machine, which means there are a bunch of transitions. 
Those +are implemented in mono-threads-state-machine.c, one function per transition. All manipulation of the thread_state variable happens +here. New functions must follow the same template as the existing ones and must include every state either in the switch or in the comments. + +mono-threads.c is the portable implementation of the threading infrastructure. There are multiple backends that implement target +specific functionality. The amount of ifdefs here should be kept to a minimum. + +mono-threads-coop.c is the cooperative backend. It doesn't use any async APIs provided by the OS. + +## Adding coop to the runtime + +The runtime code must satisfy two properties to work with cooperative suspend: it must suspend in bounded time, by polling and +checkpointing before blocking, and it must coordinate with the GC when accessing the managed heap. + +We combine those two properties together as they are complementary. Every region of code in the runtime is then classified +as one of 3 kinds, which tells what can and can't be done. + +### GC unsafe mode + +Under this mode, the GC won't be able to proceed until explicit polling or a transition to GC Safe mode happens. + +* Can touch managed memory (read/write). +* Can call GC Unsafe or GC Neutral functions. +* Can pass managed pointers to GC Safe regions/functions through pinning +* Can return managed pointers +* Cannot call foreign native code (embedder callbacks, pinvokes, etc) +* Cannot call into blocking functions/syscalls +* Cannot be detached + +### GC safe mode + +Under this mode, the GC will assume the thread is suspended and will scan the last saved state. + +* Can call into foreign functions. +* Can call into blocking functions/syscalls +* Can call GC Safe or GC Neutral functions +* Can read from pinned managed memory +* Cannot touch managed memory (read/write) +* Cannot be detached + +### GC Neutral mode + +This mode only signals that the function works under Safe and Unsafe modes.
The actual effect on the GC will depend +on the dynamic mode the thread is in when the function is executed. + +* Can call GC Neutral functions +* Cannot call into foreign functions. +* Cannot call into blocking functions/syscalls +* Cannot read from pinned managed memory +* Cannot touch managed memory (read/write) +* Cannot be detached + +There's a special group of functions that are allowed to run detached. All they are allowed to do is +attach, pick a GC mode and call into regular GC functions. + +All functions can transition from one mode to the other and then back. The runtime provides macros that +make a region of a function run in a different mode. Those macros are defined in mono-threads-coop.h. + +Those macros define the possible transitions between GC safe and GC unsafe modes. They are: + +### MONO_SUSPEND_CHECK + +This polls the current GC state and possibly suspends the thread. +Ok only under GC unsafe mode. + +Use it when a long computation is running with no explicit blocking. + +### MONO_PREPARE_BLOCKING / MONO_FINISH_BLOCKING + +Creates a C lexical scope. It causes a transition from Unsafe to Safe mode. +Ok only under Unsafe mode. + +Useful around a syscall that can block for a while (sockets, io). +Managed pointers *cannot* leak into the GC Safe region, as the GC might run while the thread is in this section, and move the referenced object around in the managed heap, leading to an invalid naked object pointer. For example, the following code is broken: + +```c +MonoArray *x; +int res; +MONO_PREPARE_BLOCKING +res = read (1, mono_array_addr (x, char, 0), mono_array_length (x)); // if a GC runs while read is blocked in the OS, the object x might be moved, and x would then point to garbage or, worse, into the middle of another object. When the OS then writes into the buffer passed to read, it would overwrite managed memory.
+MONO_FINISH_BLOCKING +``` + +To safely use an object reference in a GC safe section, the object needs to be pinned in the managed heap with a GC handle, and you cannot access any ref field on this object. + +### MONO_PREPARE_RESET_BLOCKING / MONO_FINISH_RESET_BLOCKING + +Creates a C lexical scope. It causes a transition to Unsafe mode. Resets to the previous mode on exit. +Ok under any mode. + +This covers the case where code was expected to be in GC Safe mode but it now needs to be under GC Unsafe. + +For example, the first call to a pinvoke will hit a trampoline that needs to move the runtime back into GC Unsafe +mode before going around resolving it. Once the pinvoke is resolved, the previous mode must be restored. + +## Managed object handles + +Mono coop handles (`MonoObjectHandle`) allow native code to hold a +handle to a managed object. While currently raw pointers to managed +objects in native code work without problems, they do so only because +we use a conservative technique when the garbage collector is scanning +the native stack: every object that looks like it may be referenced +from the native stack is pinned. + +In the future, we want to move away from conservative scanning, and +coop handles give native code a way to coordinate with the GC. + +TODO: Document this more + +### MONO_PREPARE_GC_CRITICAL_REGION / MONO_FINISH_GC_CRITICAL_REGION + +When a thread is in Unsafe mode and uses the coop handles, it may need +to enter a *GC critical region* where it is manipulating the managed +objects in a non-atomic manner and must not be interrupted by the GC. + +In a GC critical region: + +* The thread *must not* transition from Unsafe to Safe mode. +* The thread *may* use `gc_handle_obj` to get a raw pointer to a managed object from a coop handle. + +GC critical regions may be nested (for example, you may enter a GC +critical region and then call a function that again enters a GC +critical region). 
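The nesting rule above can be pictured with a per-thread depth counter. The following is only a toy model under stated assumptions: `SketchThreadInfo` and the `sketch_*` functions are hypothetical names invented for illustration and are not the runtime's macros or data structures. It shows why an inner region's exit must not end the outer region.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread bookkeeping; not a real Mono structure. */
typedef struct {
    int gc_critical_depth; /* number of critical regions currently entered */
} SketchThreadInfo;

/* Entering a region, nested or not, just bumps the depth. */
static void
sketch_enter_gc_critical (SketchThreadInfo *t)
{
    t->gc_critical_depth++;
}

/* Leaving a region only ends the critical state when the depth hits 0. */
static void
sketch_exit_gc_critical (SketchThreadInfo *t)
{
    assert (t->gc_critical_depth > 0);
    t->gc_critical_depth--;
}

/* While this returns true, an Unsafe -> Safe transition must not happen. */
static bool
sketch_in_gc_critical (const SketchThreadInfo *t)
{
    return t->gc_critical_depth > 0;
}
```

With such a counter, a function that enters its own critical region can be called from inside another one without the callee's exit prematurely allowing a mode transition.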
+ +#### MONO_REQ_GC_CRITICAL and MONO_REQ_GC_NOT_CRITICAL + +In checked Mono builds, this pair of macros can be used to assert that +the thread is (respectively, isn't) in a GC critical region. + +## Debugging + +There are two debug helpers in place. The first is the thread state dump when we fail to suspend in time. +It dumps the thread state of each thread plus a cue card at the beginning to help us parse it. + +The second is the set of toggles in mono-threads.h for specific logging of threading events. Those are VERY verbose +but do help figure out what's going on. + +## Known issues + +### Can't handle the embedding API + +The current system doesn't take into account the runtime being used embedded. This boils down to a couple of issues. +First, if a native thread calls into managed code and then keeps doing its own thing, we might not be leaving the thread in the +appropriate state. + +Second, the embedding API allows for raw object access, which is incompatible with coop. We need to figure out how to expose +coop to embedders. + +### Thread start/finish still bad + +There are a lot of hacks around how we handle threads starting and finishing. If a suspend hits a thread while it's +starting/finishing we fail every now and then. + +### Non nested blocking state + +An early decision that I made was to disallow nested blocking states. It was forbidden because it's more complicated and +could hide bugs in between the nesting. The downside is that it's hard to cover large blocks of code under a single blocking region. + +### Thread attach/detach + +This aspect of the runtime is due for some revision. I don't think it goes well with what we need now.
+ +## References + +[1] diff --git a/docs/design/mono/web/exception-handling.md b/docs/design/mono/web/exception-handling.md new file mode 100644 index 0000000000000..2561c9245e896 --- /dev/null +++ b/docs/design/mono/web/exception-handling.md @@ -0,0 +1,188 @@ +# Exception Handling + +Exception Handling In the Mono Runtime +-------------------------------------- + +### Introduction + +There are many types of exceptions which the runtime needs to handle. These are: + +- exceptions thrown from managed code using the 'throw' or 'rethrow' CIL instructions. + +- exceptions thrown by some IL instructions like InvalidCastException thrown by the 'castclass' CIL instruction. + +- exceptions thrown by runtime code + +- synchronous signals received while in managed code + +- synchronous signals received while in native code + +- asynchronous signals + +Since exception handling is very arch dependent, parts of the exception handling code reside in the arch specific exceptions-\.c files. The architecture independent parts are in mini-exceptions.c. The different exception types listed above are generated in different parts of the runtime, but ultimately, they all end up in the mono_handle_exception () function in mini-exceptions.c. + +### Exceptions thrown programmatically from managed code + +These exceptions are thrown from managed code using 'throw' or 'rethrow' CIL instructions. The JIT compiler will translate them to a call to a helper function called 'mono_arch_throw/rethrow_exception'. + +These helper functions do not exist at compile time; they are created dynamically at run time by the code in the exceptions-\.c files. + +They perform various stack manipulation magic, then call a helper function usually named throw_exception (), which does further processing in C code, then calls mono_handle_exception() to do the rest. + +### Exceptions thrown implicitly from managed code + +These exceptions are thrown by some IL instructions when something goes wrong.
When the JIT needs to throw such an exception, it emits a forward conditional branch and remembers its position, along with the exception which needs to be emitted. This is usually done in macros named EMIT_COND_SYSTEM_EXCEPTION in the mini-\.c files. + +After the machine code for the method is emitted, the JIT calls the arch dependent mono_arch_emit_exceptions () function which will add the exception throwing code to the end of the method, and patch up the previous forward branches so they will point to this code. + +This has the advantage that the rarely-executed exception throwing code is kept separate from the method body, leading to better icache performance. + +The exception throwing code branches to the dynamically generated mono_arch_throw_corlib_exception helper function, which creates the proper exception object, does some stack manipulation, then calls throw_exception (). + +### Exceptions thrown by runtime code + +These exceptions are usually thrown by the implementations of InternalCalls (icalls). First an appropriate exception object is created with the help of various helper functions in metadata/exception.c, which has a separate helper function for allocating each kind of exception object used by the runtime code. Then the mono_raise_exception () function is called to actually throw the exception. That function never returns. + +An example: + + if (something_is_wrong) + mono_raise_exception (mono_get_exception_index_out_of_range ()); + +mono_raise_exception () simply passes the exception to the JIT side through an API, where it will be received by the helper created by mono_arch_throw_exception (). From now on, it is treated as an exception thrown from managed code. + +### Synchronous signals + +For performance reasons, the runtime does not do some checks required by the CLI spec. Instead, it relies on the CPU to do them. The two main checks which are omitted are null-pointer checks and arithmetic checks.
When a null pointer is dereferenced by JITted code, the CPU will notify the kernel through an interrupt, and the kernel will send a SIGSEGV signal to the process. The runtime installs a signal handler for SIGSEGV, which is sigsegv_signal_handler () in mini.c. The signal handler creates the appropriate exception object and calls mono_handle_exception () with it. Arithmetic exceptions like division by zero are handled similarly. + +### Synchronous signals in native code + +Receiving a signal such as SIGSEGV while in native code means something very bad has happened. Because of this, the runtime will abort after trying to print a managed plus a native stack trace. The logic is in the mono_handle_native_sigsegv () function. + +Note that there are two kinds of native code which can be the source of the signal: + +- code inside the runtime +- code inside a native library loaded by an application, e.g. libgtk+ + +### Stack overflow checking + +Stack overflow exceptions need special handling. When a thread overflows its stack, the kernel sends it a normal SIGSEGV signal, but the signal handler tries to execute on the same stack as the thread, leading to a further SIGSEGV which will terminate the thread. A solution is to use an alternative signal stack supported by UNIX operating systems through the sigaltstack (2) system call. When a thread starts up, the runtime will install an altstack using the mono_setup_altstack () function in mini-exceptions.c. When a SIGSEGV is received, the signal handler checks whether the fault address is near the bottom of the thread's normal stack. If it is, a StackOverflowException is created instead of a NullReferenceException. This exception is handled like any other exception, with some minor differences. + +There are two reasons why sigaltstack is disabled by default: + +- The main problem with sigaltstack() is that the stack employed by it is not visible to the GC and it is possible that the GC will miss it.
+ +- Working sigaltstack support is very much os/kernel/libc dependent, so it is disabled by default. + +### Asynchronous signals + +Async signals are used by the runtime to notify a thread that it needs to change its state somehow. Currently, they are used for implementing thread abort/suspend/resume. + +Handling async signals correctly is a very hard problem, since the receiving thread can be in basically any state upon receipt of the signal. It can execute managed code, native code, it can hold various managed/native locks, or it can be in a process of acquiring them, it can be starting up, shutting down etc. Most of the C APIs used by the runtime are not async-signal safe, meaning it is not safe to call them from an async signal handler. In particular, the pthread locking functions are not async-safe, so if a signal handler interrupted code which was in the process of acquiring a lock, and the signal handler tries to acquire a lock, the thread will deadlock. + +When receiving an async signal, the signal handler first tries to determine whether the thread was executing managed code when it was interrupted. If it was, then it is safe to interrupt it, so a ThreadAbortException is constructed and thrown. If the thread was executing native code, then it is generally not safe to interrupt it. In this case, the runtime sets a flag then returns from the signal handler. That flag is checked every time the runtime returns from native code to managed code, and the exception is thrown then. Also, a platform specific mechanism is used to cause the thread to interrupt any blocking operation it might be doing. + +The async signal handler is in sigusr1_signal_handler () in mini.c, while the logic which determines whether an exception is safe to be thrown is in mono_thread_request_interruption (). + +### Stack unwinding during exception handling + +The execution state of a thread during exception handling is stored in an arch-specific structure called MonoContext.
This structure contains the values of all the CPU registers relevant during exception handling, which usually means: + +- IP (instruction pointer) +- SP (stack pointer) +- FP (frame pointer) +- callee saved registers + +Callee saved registers are the registers which are required by any procedure to be saved/restored before/after using them. They are usually defined by each platform's ABI (Application Binary Interface). For example, on x86, they are EBX, ESI and EDI. + +The code which calls mono_handle_exception () is required to construct the initial MonoContext. How this is done depends on the caller. For exceptions thrown from managed code, the mono_arch_throw_exception helper function saves the values of the required registers and passes them to throw_exception (), which will save them in the MonoContext structure. For exceptions thrown from signal handlers, the MonoContext structure is initialized from the signal info received from the kernel. + +During exception handling, the runtime needs to 'unwind' the stack, i.e. given the state of the thread at a stack frame, construct the state at its caller. Since this is platform specific, it is done by a platform specific function called mono_arch_find_jit_info (). + +Two kinds of stack frames need handling: + +- Managed frames are easier. The JIT will store some information about each managed method, like which callee-saved registers it uses. Based on this information, mono_arch_find_jit_info () can find the values of the registers on the thread stack, and restore them. On some platforms, the runtime now uses a generic unwinder based on the [DWARF unwinding interface](http://dwarfstd.org/Dwarf3.pdf). The generic unwinder is in the files unwind.h/unwind.c. + +- Native frames are problematic, since we have no information about how to unwind through them. Some compilers generate unwind information for code, some don't. Also, there is no general purpose library to obtain and decode this unwind information.
So the runtime uses a different solution. When managed code needs to call into native code, it does so through a managed-\>native wrapper function, which is generated by the JIT. This function is responsible for saving the machine state into a per-thread structure called MonoLMF (Last Managed Frame). These LMF structures are stored on the thread's stack, and are linked together using one of their fields. When the unwinder encounters a native frame, it simply pops one entry off the LMF 'stack', and uses it to restore the frame state to the moment before control passed to native code. In effect, all successive native frames are skipped together. + +### Problems/future work + +#### Raising exceptions from native code + +Currently, exceptions are raised by calling mono_raise_exception () in the middle of runtime code. This has two problems: + +- No cleanup is done, i.e. if the caller of the function which throws an exception has taken locks, or allocated memory, that is not cleaned up. For this reason, it is only safe to call mono_raise_exception () 'very close' to managed code, i.e. in the icall functions themselves. + +- To allow mono_raise_exception () to unwind through native code, we need to save the LMF structures which can add a lot of overhead even in the common case when no exception is thrown. So this is not zero-cost exception handling. + +An alternative might be to use a JNI style set-pending-exception API. Runtime code could call mono_set_pending_exception (), then return to its caller with an error indication allowing the caller to clean up. When execution returns to managed code, the managed-\>native wrapper could check whether there is a pending exception and throw it if necessary. Since we already check for pending thread interruption, this would have no overhead, allowing us to drop the LMF saving/restoring code, or significant parts of it. + +### libunwind + +There is an OSS project called libunwind which is a standalone stack unwinding library.
It is currently in development, but it is used by default by gcc on ia64 for its stack unwinding. The mono runtime also uses it on ia64. It has several advantages in relation to our current unwinding code: + +- it has a platform independent API, i.e. the same unwinding code can be used on multiple platforms. + +- it can generate unwind tables which are correct at every instruction, i.e. can be used for unwinding from async signals. + +- given sufficient unwind info generated by a C compiler, it can unwind through C code. + +- most of its API is async-safe + +- it implements the gcc C++ exception handling API, so in theory it can be used to implement mixed-language exception handling (i.e. C++ exception caught in mono, mono exception caught in C++). + +- it is MIT licensed + +The biggest problem with libunwind is its platform support. ia64 support is complete/well tested, while support for other platforms is missing/incomplete. + +[http://www.hpl.hp.com/research/linux/libunwind/](http://www.hpl.hp.com/research/linux/libunwind/) + +### Architecture specific functions for EH + +This section contains documentation for the architecture specific functions which need to be implemented by each backend. These functions usually reside in the exceptions-\.c file. + +#### mono_arch_handle_exception () + +Prototype: + +``` c +gboolean +mono_arch_handle_exception (void *ctx, gpointer obj); +``` + +This function is called by signal handlers. It receives the machine state as passed to the signal handlers in the CTX argument. On unix, this is a ucontext_t structure. It also receives the exception object in OBJ, which might be null. Handling exceptions in signal handlers is problematic for many reasons, so this function should set up CTX so when the signal handler returns, execution continues in another runtime function which does the real work. CTX/OBJ needs to be passed to that function.
The former can be passed in TLS, while the latter has to be passed in registers/on the stack (by modifying CTX), since TLS storage might not be GC tracked. + +[Original version of this document in git.](https://github.com/mono/mono/blob/2279f440996923ac66a6ea85cf101d89615aad69/docs/exception-handling.txt) + +#### mono_arch_get_restore_context () + +Prototype: + +``` c +gpointer +mono_arch_get_restore_context (MonoTrampInfo **info, gboolean aot); +``` + +This function should return a trampoline with the following signature: + +``` c +void restore_context (MonoContext *ctx); +``` + +The trampoline should set the machine state to the state in CTX, then jump to the PC in CTX. Only a subset of the state needs to be restored, i.e. the callee saved registers/sp/fp. + +#### mono_arch_get_call_filter () + +Prototype: + +``` c +gpointer +mono_arch_get_call_filter (MonoTrampInfo **info, gboolean aot); +``` + +This function should return a trampoline with the following signature: + +``` c +int call_filter (MonoContext *ctx, gpointer addr); +``` + +This trampoline is used to call finally and filter clauses during exception handling. It should set up a new stack frame, save callee saved registers there, restore the same registers from CTX, then make a call to ADDR, restore the saved registers, and return the result returned by the call as its result. Finally clauses need access to the method state, but they need to make calls etc too, so they execute in a nonstandard stack frame, where FP points to the original FP of the method frame, while SP is normal, i.e. it is below the frame created by call_filter (). This means that call_filter () needs to load FP from CTX, but it shouldn't load SP.
diff --git a/docs/design/mono/web/generic-sharing.md b/docs/design/mono/web/generic-sharing.md new file mode 100644 index 0000000000000..a671ac5c39ad2 --- /dev/null +++ b/docs/design/mono/web/generic-sharing.md @@ -0,0 +1,139 @@ +# Generic Sharing + +Source code +---------- + +The code which implements generic sharing is in `mini-generic-sharing.c`. The architecture specific parts are in `mini-.c` and `tramp-.c`. + +RGCTX register +-------------- + +Generic shared code needs access to type information. This information is contained in a RGCTX for non-generic methods and in an MRGCTX for generic methods. It is passed in one of several ways, depending on the type of the called method: + +1. Non-generic non-static methods of reference types have access to the RGCTX via the "this" argument (this-\>vtable-\>rgctx). + +2. Non-generic static methods of reference types and non-generic methods of value types need to be passed a pointer to the caller's class's VTable in the `MONO_ARCH_RGCTX_REG` register. + +3. Generic methods need to be passed a pointer to the MRGCTX in the `MONO_ARCH_RGCTX_REG` register. + +The `MONO_ARCH_RGCTX_REG` must not be clobbered by trampolines. + +`MONO_ARCH_RGCTX_REG` is the same as the IMT register on all platforms. The reason for this is that the RGCTX register is used to pass information to a concrete method, while the IMT register is used for indirect calls where +the called method is not known, so the same call doesn't use both an RGCTX and an IMT register. + +This register's lifetime starts at the call site that loads it and ends in the callee prologue when it is either discarded or stored into a local variable. + +It's better to avoid registers used for argument passing for the RGCTX as it would make the code dealing with calling conventions a lot harder. + +For indirect calls, the caller doesn't know the RGCTX value which needs to be passed to the callee. In this case, an 'rgctx trampoline' is used.
These are small trampolines created by `mono_create_static_rgctx_trampoline()`. The caller calls the trampoline, which sets the RGCTX to the required value and jumps to the callee. These trampolines are inserted into the call chain when indirect calls are used (virtual calls, delegates, runtime invoke etc.). + +An alternative design would pass the rgctx as a normal parameter, which would avoid the need for an RGCTX register. The problem with this approach is that the caller might not know whether the callee needs an RGCTX argument +or not. I.e. the callee might be a non-shared method, or even a non-generic method (i.e. `Action` can end up calling a `foo(int)` or a `foo (T)` instantiated with `int`.). + +Method prologue +--------------- + +Generic shared code that has an `RGCTX` receives it in `RGCTX_REG`. There must be a check in mono_arch_emit_prolog for MonoCompile::rgctx_var and, if set, the prologue must store it. See mini-x86.c for reference. + +Dealing with types +------------------ + +During JITting and at runtime, the generic parameters used in shared methods are represented by a `MonoGenericParam` with the `gshared_constraint` field pointing to a `MonoType` which identifies the set of types this +generic param is constrained to. If the constraint is `object`, it means the parameter can match all reference types. If it's `int`, it can match `int` and all enums whose basetype is `int` etc. + +Calling `mini_get_underlying_type()` on the type will return the constraint type. This is used throughout the JIT to handle generic parameters without needing to special case them, since for example, a generic parameter constrained to be a reference type can be handled the same way as `MONO_TYPE_OBJECT`. + +(M)RGCTX lazy fetch trampoline +------------------------------ + +The purpose of the lazy fetch trampoline is to fetch a slot from an (M)RGCTX which might not be initialized yet. In the latter case, it needs to make a transition to unmanaged code to fill the slot.
This is the layout of a RGCTX: + + +---------------------------------+ + | next | slot 0 | slot 1 | slot 2 | + +--|------------------------------+ + | + +-----+ + | +--------------------------------- + +->| next | slot 3 | slot 4 | slot 5 .... + +--|------------------------------ + | + +-----+ + | +------------------------------------ + +->| next | slot 10 | slot 11 | slot 12 .... + +--|--------------------------------- + . + . + . + +For fetching a slot from a RGCTX the trampoline is passed a pointer (as a normal integer argument) to the VTable. From there it has to fetch the pointer to the RGCTX, which might be null. Then it has to traverse the correct number of "next" links, each of which might be NULL. Arriving at the right array it needs to fetch the slot, which might also be NULL. If any of the NULL cases, the trampoline must transition to unmanaged code to potentially setup the RGCTX and fill the slot. Here is pseudo-code for fetching slot 11: + +  ; vtable ptr in r1 +  ; fetch RGCTX array 0 + r2 = *(r1 + offsetof(MonoVTable, runtime_generic_context)) + if r2 == NULL goto unmanaged +  ; fetch RGCTX array 1 + r2 = *r2 + if r2 == NULL goto unmanaged +  ; fetch RGCTX array 2 + r2 = *r2 + if r2 == NULL goto unmanaged +  ; fetch slot 11 + r2 = *(r2 + 2 * sizeof (gpointer)) + if r2 == NULL goto unmanaged + return r2 + unmanaged: + jump unmanaged_fetch_code + +The number of slots in the arrays must be obtained from the function `mono_class_rgctx_get_array_size()`. + +The MRGCTX case is different in two aspects. First, the trampoline is not passed a pointer to a VTable, but a pointer directly to the MRGCTX, which is guaranteed not to be NULL (any of the next pointers and any of the slots can be NULL, though). Second, the layout of the first array is slightly different, in that the first two slots are occupied by a pointers to the class's VTable and to the method's method_inst. 
The next pointer is in the third slot and the first actual slot, "slot 0", in the fourth: + + +--------------------------------------------------------+ + | vtable | method_inst | next | slot 0 | slot 1 | slot 2 | + +-------------------------|------------------------------+ + . + . + +All other arrays have the same layout as the RGCTX ones, except possibly for their length. + +The function to create the trampoline, mono_arch_create_rgctx_lazy_fetch_trampoline(), gets passed an encoded slot number. Use the macro `MONO_RGCTX_SLOT_IS_MRGCTX` to query whether a trampoline for an MRGCTX is needed, as opposed to one for a RGCTX. Use `MONO_RGCTX_SLOT_INDEX` to get the index of the slot (like 2 for "slot 2" as above). The unmanaged fetch code is yet another trampoline created via `mono_arch_create_specific_trampoline()`, of type `MONO_TRAMPOLINE_RGCTX_LAZY_FETCH`. It's given the slot number as the trampoline argument. In addition, the pointer to the VTable/MRGCTX is passed in `MONO_ARCH_VTABLE_REG` (like the VTable to the generic class init trampoline - see above). + +The RGCTX fetch trampoline code doesn't return code that must be jumped to, so, like for those trampolines (see above), the generic trampoline code must do a normal return instead. + +Getting generics information about a stack frame +------------------------------------------------ + +If a method is compiled with generic sharing, its `MonoJitInfo` has the `has_generic_jit_info` bit set. In that case, the `mono_jit_info_get_generic_jit_info()` function will return +a `MonoGenericJitInfo` structure. + +The `MonoGenericJitInfo` contains information about the location of the this/vtable/MRGCTX variable, if the `has_this` flag is set. If that is the case, there are two possibilities: + +1. `this_in_reg` is set. `this_reg` is the number of the register where the variable is stored. + +2. `this_in_reg` is not set. The variable is stored at offset `this_offset` from the address in the register with number `this_reg`. 
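The two cases above can be sketched as a small lookup routine. This is a simplified illustration, not the runtime's code: `SketchGenericJitInfo` is a hypothetical stand-in that models only the four fields named in the text, and `regs` is an assumed array holding the saved register values of the frame being inspected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for MonoGenericJitInfo. */
typedef struct {
    unsigned has_this : 1;    /* location information is available */
    unsigned this_in_reg : 1; /* the variable lives in a register */
    int this_reg;             /* register number (value or base address) */
    int32_t this_offset;      /* offset from regs [this_reg] when in memory */
} SketchGenericJitInfo;

/* Returns the this/vtable/MRGCTX variable of a frame, or NULL if unknown.
 * regs is an assumed array of the frame's saved register values. */
static void *
sketch_get_generic_info_var (const SketchGenericJitInfo *gi, const uintptr_t *regs)
{
    void *value;

    if (!gi->has_this)
        return NULL;
    if (gi->this_in_reg)
        return (void *) regs [gi->this_reg];
    /* Case 2: load the variable from base register + offset. */
    memcpy (&value, (const char *) regs [gi->this_reg] + gi->this_offset, sizeof (value));
    return value;
}
```

Whether the returned pointer is a "this" object, a vtable or an MRGCTX still has to be decided from the kind of method, which this sketch does not model.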
+ +The variable can either point to the "this" object, to a vtable or to an MRGCTX: + +1. If the method is a non-generic non-static method of a reference type, the variable points to the "this" object. + +2. If the method is a non-generic static method or a non-generic method of a value type, the variable points to the vtable of the class. + +3. If the method is a generic method, the variable points to the MRGCTX of the method. + +Layout of the MRGCTX +-------------------- + +The MRGCTX is a structure that starts with `MonoMethodRuntimeGenericContext`, which contains a pointer to the vtable of the class and a pointer to the `MonoGenericInst` with the type arguments for the method. + +Blog posts about generic code sharing +------------------------------------- + +- [September 2007: Generics Sharing in Mono](http://schani.wordpress.com/2007/09/22/generics-sharing-in-mono/) +- [October 2007: The Trouble with Shared Generics](http://schani.wordpress.com/2007/10/12/the-trouble-with-shared-generics/) +- [October 2007: A Quick Generics Sharing Update](http://schani.wordpress.com/2007/10/15/a-quick-generics-sharing-update/) +- [January 2008: Other Types](http://schani.wordpress.com/2008/01/29/other-types/) +- [February 2008: Generic Types Are Lazy](http://schani.wordpress.com/2008/02/25/generic-types-are-lazy/) +- [March 2008: Sharing Static Methods](http://schani.wordpress.com/2008/03/10/sharing-static-methods/) +- [April 2008: Sharing Everything And Saving Memory](http://schani.wordpress.com/2008/04/22/sharing-everything-and-saving-memory/) +- [June 2008: Sharing Generic Methods](http://schani.wordpress.com/2008/06/02/sharing-generic-methods/) +- [June 2008: Another Generic Sharing Update](http://schani.wordpress.com/2008/06/27/another-generic-sharing-update/) diff --git a/docs/design/mono/web/generics.md b/docs/design/mono/web/generics.md new file mode 100644 index 0000000000000..f2032da1f448c --- /dev/null +++ b/docs/design/mono/web/generics.md @@ -0,0 +1,58 @@ +# 
Generics + +Terminology +----------- + +Type/Method instantiation == Type/Method instance == Inflated Type/Method. + +Generic Type Definitions +------------------------ + +These are represented by a normal `MonoClass` structure with the `generic_container` field set. This field points to a `MonoGenericContainer` structure, which stores information about the generic parameters of the generic type. + +Generic Type Instantiations +--------------------------- + +These are represented by a pair of `MonoGenericClass` and `MonoClass` structures. The `generic_class` field in MonoClass is used to link the two together. The reason for the split is to avoid allocating a large MonoClass if not needed. + +It would have been better to name `MonoGenericClass` `MonoInflatedClass` or something similar. + +Generic Method Definitions +-------------------------- + +These are represented by a `MonoMethod` structure with the `is_generic` field set to 1. + +Generic Method Instantiations +----------------------------- + +These are represented by a `MonoMethodInflated` structure, which is an extension of the `MonoMethod` structure. Its `is_inflated` field is set to 1. + +One consequence of this design is that a method cannot be a pinvoke method/wrapper/dynamic method and an inflated method at the same time. + +MonoGenericContext +------------------ + +This structure holds information about an instantiation of a set of generic parameters with generic arguments. It is used by both type and method instantiations. + +Canonical generic instances +--------------------------- + +The runtime canonicalizes generic type/method instances, so for every set of generic arguments, there is only one type/method instance with those arguments. This is done using caches in `metadata.c`. 
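The idea of canonical instances can be sketched as an interning cache. The following is a minimal illustration, not the code in `metadata.c`: the names `GenericInstSketch` and `get_canonical_inst` are hypothetical, types are reduced to integer ids, and a linked list stands in for whatever cache structure the runtime actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: a type is identified by a small integer here. */
typedef struct GenericInstSketch {
    int definition;                 /* id of the generic definition */
    int argc;                       /* number of generic arguments */
    int *argv;                      /* ids of the generic arguments */
    struct GenericInstSketch *next; /* next entry in the cache list */
} GenericInstSketch;

static GenericInstSketch *inst_cache; /* all instances created so far */

/* Return the unique instance for (definition, args), creating it on first use. */
static GenericInstSketch *
get_canonical_inst (int definition, int argc, const int *argv)
{
    GenericInstSketch *inst;

    assert (argc > 0 && argv != NULL);
    for (inst = inst_cache; inst; inst = inst->next) {
        if (inst->definition == definition && inst->argc == argc &&
            memcmp (inst->argv, argv, argc * sizeof (int)) == 0)
            return inst; /* cache hit: reuse the existing instance */
    }
    /* Cache miss: create the instance and remember it. */
    inst = malloc (sizeof (*inst));
    assert (inst != NULL);
    inst->argv = malloc (argc * sizeof (int));
    assert (inst->argv != NULL);
    memcpy (inst->argv, argv, argc * sizeof (int));
    inst->definition = definition;
    inst->argc = argc;
    inst->next = inst_cache;
    inst_cache = inst;
    return inst;
}
```

Because each set of arguments maps to exactly one instance, two instantiations can be compared with a plain pointer comparison.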
+ +Lifetime of inflated types/methods +---------------------------------- + +Inflated types and methods depend on the assembly of the generic type/method definition they are inflated from, along with the assemblies of their generic arguments. This is handled using the concept of 'image sets' in metadata.c. Every inflated type/method belongs to an image set, which is a set of MonoImages. When one of the assemblies in an image set is unloaded, all the inflated types/methods belonging to the image set are freed. Memory for inflated types/methods cannot be allocated from mempools; it is allocated from the heap. The `mono_class_alloc/alloc0` functions can be used to allocate memory from the appropriate place. + +System.Reflection.Emit +---------------------- + +Generics support in System.Reflection.Emit (SRE) is very problematic because it is possible to create generic instances of not yet created dynamic types, i.e. if T is a generic TypeBuilder, it is possible to create T\. The latter is not a TypeBuilder any more, but a normal Type, which presents several problems: + +- this type needs to be kept in sync with the original TypeBuilder, i.e. if methods/fields are added to the TypeBuilder, this should be reflected in the instantiation. +- this type cannot be used normally until its TypeBuilder is finished, i.e. it is not possible to create instances of it etc. + +These problems are currently handled by a hierarchy of C# classes which inherit from the normal reflection classes: + +- `MonoGenericClass` represents an instantiation of a generic TypeBuilder. MS.NET calls this `TypeBuilderInstantiation`, a much better name. +- `Method/Field/Event/PropertyOnTypeBuilderInst` represents a method/field etc. of a `MonoGenericClass`. 
+ diff --git a/docs/design/mono/web/glossary.md b/docs/design/mono/web/glossary.md new file mode 100644 index 0000000000000..46b95a9190514 --- /dev/null +++ b/docs/design/mono/web/glossary.md @@ -0,0 +1,15 @@ +# Glossary + +This is a glossary of terms/abbreviations used in the runtime source code +------------------------------------------------------------------------- + +- AOT - Ahead of Time Compiler +- EH - Exception Handling +- GC - Garbage Collector +- JIT - Just In Time Compiler +- Boehm - The Boehm Conservative Garbage Collector +- trampoline - A function implemented using hand written assembly code. It is usually called from JITted code. +- SGEN - Mono's own generational garbage collector. +- SRE - System.Reflection.Emit +- vt - Valuetype +- vtype - Valuetype diff --git a/docs/design/mono/web/gsharedvt.md b/docs/design/mono/web/gsharedvt.md new file mode 100644 index 0000000000000..f91d4dfd7de3b --- /dev/null +++ b/docs/design/mono/web/gsharedvt.md @@ -0,0 +1,195 @@ +# Generic sharing for valuetypes + +## The problem + +In some environments like iOS, it is not possible to generate native code at runtime. This means that we have to compile all possible methods used by the application at compilation time. For generic methods, this is not always possible, i.e.: + +``` c +interface IFace { + void foo (T t); +} + +class Class1 : IFace { + public virtual void foo (T t) { + ... + } +} + +IFace o = new Class1 (); +o.foo (); +``` + +In this particular case, it is very hard to determine at compile time that `Class1:foo` will be needed at runtime. For generic methods instantiated with reference types, the mono runtime supports 'generic sharing'. + +This means that we only compile one version of the method, and use it for all instantiations made with reference types, i.e. `Array.Sort` and `Array.Sort` is actually the same native method at runtime. Generating native code for generic shared methods is not very complex since all reference types have the same size: 1 word. 
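The "1 word" property is what makes sharing among reference types straightforward, and it can be illustrated in plain C. This is a minimal sketch, not generated Mono code: `swap_refs` is a hypothetical name, and `void *` stands in for a managed reference.

```c
#include <assert.h>
#include <stddef.h>

/* One body for every "reference type": each element is a single pointer-sized
 * word, so the code never needs to know what the pointers refer to. */
static void
swap_refs (void **a, size_t i, size_t j)
{
    void *t;

    assert (a != NULL);
    t = a [i];
    a [i] = a [j];
    a [j] = t;
}
```

The same compiled function works for an array of pointers to integers and an array of pointers to strings, because the element size and the copy instructions are identical in both cases.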
+ +In order to extend generic sharing to valuetypes, we need to solve many problems. Take the following method: + +``` c +void swap (T[] a, int i, int j) +{ + var t = a [i]; + a [i] = a [j]; + a [j] = t; +} +``` + +Here, the size of 'T' is only known at runtime, so we don't know how much stack space to allocate for 't', or how much memory to copy from a \[i\] to t in the first assignment. + +For methods which contain their type parameters in their signatures, the situation is even more complex: + +``` c +public T return_t (T t) { + return t; +} +``` + +Here, the native signature of the method depends on its type parameter. One caller might call this as `return_t (1)`, passing in an int in one register, and expecting the result to be in the return register, while another might call this with a struct, passing it in registers and/or the stack, and expecting the result to be in a memory area whose address was passed in as an extra hidden parameter. + +## Basic implementation + +### Inside methods + +We refer to types which are type variables, or generic instances instantiated with type variables, as 'gsharedvt types'. Types whose size depends on type variables are referred to as 'variable types'. Since the size of variable types is only known at runtime, we cannot allocate static stack slots for them. Instead, we allocate a stack area for them at runtime using localloc, and dynamically compute their address when needed. The information required for this is stored in a `MonoGSharedVtMethodRuntimeInfo` structure. This structure is stored in an rgctx slot. At the start of the method, the following pseudo code is used to initialize the locals area: + +``` c +info_var = rgctx_fetch() +locals_var = localloc (info_var->locals_size) +``` + +Whenever the address of a variable-sized local is required, it is computed using: + +``` c +locals_var + info_var->locals_offsets [] +``` + +Local variables are initialized using memset, and copied using memcpy. 
The size of the locals is fetched from the rgctx. So + +``` c +T a = b; +``` + +is compiled to: + +``` c +a_addr = locals_var + info_var->locals_offsets [] +b_addr = locals_var + info_var->locals_offsets [] +size = rgctx_fetch() +memcpy(a_addr, b_addr, size) +``` + +Methods compiled with this type of sharing are called 'gsharedvt' methods. + +### Calling gsharedvt methods + +GSharedvt methods whose signature includes variable types use a different calling convention where gsharedvt arguments are passed by ref. + +``` c +foo(int,int,int,T) +``` + +is called using: + +``` c +foo(int,int,int,T&) +``` + +The return value is returned using the same calling convention used to return large structures, i.e. by passing a hidden parameter pointing to a memory area where the method is expected to store the return value. + +When a call is made to a generic method from a normal method, the caller uses a signature with concrete types, i.e.: `return_t (1)`. If the callee is also a normal method, then there is no further work needed. However, if the callee is a gsharedvt method, then we have to transition between the signature used by the caller (int (int) in this case), and the signature used by the callee. This process is very low level and architecture specific. + +It typically involves reordering values in registers, stack slots etc. It is done by a trampoline called the gsharedvt trampoline. The trampoline receives a pointer to an info structure which describes the calling convention used by the caller and the callee, and the steps needed to transition between the two. The info structure is not passed by the caller, so we use another trampoline, the gsharedvt arg trampoline, to pass the info structure to the gsharedvt trampoline. + +So a call goes: + +``` c +<caller> -> <gsharedvt arg trampoline> -> <gsharedvt trampoline> -> <callee> +``` + +The same is true in the reverse case, i.e. when the caller is a gsharedvt method, and the callee is a normal method. 
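The transition described above can be sketched in plain C. This is an illustration under stated assumptions, not what the runtime emits: the names `return_t_gsharedvt`, `return_t_int`, `return_t_vec3` and `Vec3Sketch` are hypothetical, the `size` parameter stands in for a value the real code would fetch from the rgctx, and ordinary C functions play the role of the hand-written trampolines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shared body of `T return_t (T t)`: the argument is passed by reference and
 * the result is written to a hidden return area supplied by the caller. */
static void
return_t_gsharedvt (void *ret_area, const void *arg_addr, size_t size)
{
    assert (ret_area != NULL && arg_addr != NULL);
    memcpy (ret_area, arg_addr, size);
}

/* Transition for a caller using the concrete int (int) signature. */
static int
return_t_int (int value)
{
    int arg_slot = value; /* spill the by-value argument to a stack slot */
    int ret_slot;         /* one-word area for the return value */

    return_t_gsharedvt (&ret_slot, &arg_slot, sizeof (int));
    return ret_slot;      /* hand the result back by value */
}

typedef struct { double x, y, z; } Vec3Sketch;

/* The same shared body serves a struct instantiation. */
static Vec3Sketch
return_t_vec3 (Vec3Sketch value)
{
    Vec3Sketch ret_slot;

    return_t_gsharedvt (&ret_slot, &value, sizeof (Vec3Sketch));
    return ret_slot;
}
```

Only the two small adapters know the concrete type; the shared body works purely on addresses and a size, which is why a single compiled copy is enough.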
+ +The info structure contains everything needed to transfer arguments and make the call. This includes: + +- the callee address. +- an rgctx to pass to the callee. +- a mapping for registers and stack slots. +- whether this is an 'in' or 'out' case. +- etc. + +As an example, here is what happens for the `return_t` case on ARM: + +- The caller passes in the argument in r0, and expects the return value to be in r0. + +- The callee receives the address of the int value in r1, and it receives the valuetype return address in r0. + +Here is the calling sequence: + +- The caller puts the value 1 in r0, then makes the call, which goes to the trampoline code. + +- The trampoline infrastructure detects that the call needs a gsharedvt trampoline. It computes the info structure holding the calling convention information, then creates a gsharedvt arg trampoline for it. + +- The gsharedvt arg trampoline is called, which calls the gsharedvt trampoline, passing the info structure as an argument. + +- The trampoline allocates a new stack frame, along with a 1 word area to hold the return value. + +- It receives the parameter value in r0, saves it into one of its stack slots, and passes the address of the stack slot in r1. + +- It puts the address of the return value into r0. + +- It calls the gsharedvt method. + +- The method copies the memory pointed to by r1 to the memory pointed to by r0, and returns to the trampoline. + +- The trampoline loads the return value from the return value area into r0 and returns to the caller. + +- The caller receives the return value in r0. + +For exception handling purposes, we create a wrapper method for the gsharedvt trampoline, so it shows up in stack traces, and the unwind code can unwind through it. There are two kinds of wrappers, 'in' and 'out'. 'in' wrappers handle calls made to gsharedvt methods from callers which use a normal signature, while 'out' wrappers handle calls made to normal methods from callers which use a variable signature. 
In later parts of this document, we use the term 'wrapper' to mean a gsharedvt arg trampoline. + +### Making calls out of gsharedvt methods + +#### Normal calls using a non-variable signature + +These are handled normally. + +#### Direct calls made using a variable signature + +These have several problems: + +- The callee might end up being a gsharedvt or a non-gsharedvt method. The former doesn't need a wrapper, the latter does. + +- The wrapper needs to do different things for different instantiations. This means that the call cannot be patched to go to a wrapper, since the wrapper is specific to one instantiation. + +To solve these problems, we make an indirect call through an rgctx entry. The rgctx entry resolver code determines what wrapper is needed, and patches the rgctx entry with the address of the wrapper, so later calls made from the gsharedvt method with the same instantiation will go straight to the wrapper. + +#### Virtual calls made using a variable signature + +Virtual methods have an extra complexity: there is only one vtable entry for a method, and it can be called by both normal and gsharedvt code. To solve this, when a virtual method is compiled as gsharedvt, we put an 'in' wrapper around it, and put the address of this wrapper into the vtable slot, instead of the method code. The virtual call will add an 'out' wrapper, so the call sequence will be: + +``` c +<caller> -> <out wrapper> -> <in wrapper> -> <callee> +``` + +## AOT support + +We AOT a gsharedvt version of every generic method, and use it at runtime if the specific instantiation of a method is not found. We also save the gsharedvt trampoline to the mscorlib AOT image, along with a bunch of gsharedvt arg trampolines. + +## Implementation details + +The gsharedvt version of a method is represented by inflating the method with type parameters, just like in the normal gshared case. To distinguish between the two, we use anon generic parameters whose `gshared_constraint` field is set to point to a valuetype. 
+ +Relevant files/functions include: + +- `method-to-ir.c`: +- `mini-generic-sharing.c`: `instantiate_info ()`: This contains the code which handles calls made from gsharedvt methods through an rgctx entry. +- `mini-trampolines.c`: `mini_add_method_trampolines ()`: This contains the code which handles calls made from normal methods to gsharedvt methods. +- `mini--gsharedvt.c`: `mono_arch_get_gsharedvt_call_info ()`: This returns the arch specific info structure passed to the gsharedvt trampoline. +- `tramp--gsharedvt.c`: `mono_arch_get_gsharedvt_trampoline ()`: This creates the gsharedvt trampoline. `mono_aot_get_gsharedvt_arg_trampoline ()`: This returns a gsharedvt arg trampoline which calls the gsharedvt trampoline passing in the info structure in an arch specific way. + +## Possible future work + +- Optimizations: + - Allocate the `info_var` and `locals_var` into registers. + - Put more information into the info structure, to avoid rgctx fetch calls. + - For calls made between gsharedvt methods, we add both an out and an in wrapper. This needs to be optimized so we only use one wrapper in more cases, or create a more generalized wrapper, which can function as both an out and an in wrapper at the same time. +- The AOT compiler tries to compile every instantiation which can be used at runtime. This leads to a lot of instantiations which are never used, and take up a lot of space. We might want to avoid generating some of these instantiations and use their gsharedvt versions instead. This is particularly true for methods where using the gsharedvt version might mean very little or no overhead. 
diff --git a/docs/design/mono/web/images/0911030528Mp6F5SHL.png b/docs/design/mono/web/images/0911030528Mp6F5SHL.png new file mode 100644 index 0000000000000..3f9d60715c2b4 Binary files /dev/null and b/docs/design/mono/web/images/0911030528Mp6F5SHL.png differ diff --git a/docs/design/mono/web/images/coop-state-machine.png b/docs/design/mono/web/images/coop-state-machine.png new file mode 100644 index 0000000000000..9d9596e967351 Binary files /dev/null and b/docs/design/mono/web/images/coop-state-machine.png differ diff --git a/docs/design/mono/web/images/igv-diff.png b/docs/design/mono/web/images/igv-diff.png new file mode 100644 index 0000000000000..61cba0025e6d7 Binary files /dev/null and b/docs/design/mono/web/images/igv-diff.png differ diff --git a/docs/design/mono/web/images/igv-screenshot.png b/docs/design/mono/web/images/igv-screenshot.png new file mode 100644 index 0000000000000..a14d97161bc9a Binary files /dev/null and b/docs/design/mono/web/images/igv-screenshot.png differ diff --git a/docs/design/mono/web/linear-ir.md b/docs/design/mono/web/linear-ir.md new file mode 100644 index 0000000000000..af650f57a9c79 --- /dev/null +++ b/docs/design/mono/web/linear-ir.md @@ -0,0 +1,318 @@ +# Linear IR + +This document describes Mono's new JIT engine based on a rewrite to use a linear Intermediate Representation instead of the tree-based intermediate representation that was used up to Mono 2.0. + +You might also want to check [Mono's Runtime Documentation](/docs/advanced/runtime/docs/). + +Intermediate Representation (IR) +-------------------------------- + +The IR used by the JIT is standard three address code: + +OP dreg \<- sreg1 sreg2 + +Here dreg, sreg1, sreg2 are virtual registers (vregs). OP is an opcode. For example: + + int_add R5 <- R6 R7 + +### Opcodes + +The opcodes used by the JIT are defined in the [mini-ops.h](https://github.com/mono/mono/blob/main/mono/mini/mini-ops.h) file. 
Each opcode has a value which is a C constant, a name, and some metadata containing information about the opcode like the type of its arguments and its return value. An example: + + MINI_OP(OP_IADD, "int_add", IREG, IREG, IREG) + +The opcodes conform to the following naming conventions: + +- CEE\_... opcodes are the original opcodes defined in the IL stream. The first pass of the JIT transforms these opcodes to the corresponding OP\_ opcodes so CEE\_ opcodes do not occur in the intermediate code. Correspondingly, they have no opcode metadata, and are not listed in mini-ops.h. +- OP_\ opcodes are either size agnostic, like OP_THROW, or operate on the natural pointer size of the machine, i.e. OP_ADD adds two pointer size integers. +- OP_I\ opcodes work on 32 bit integers, i.e. vregs of type STACK_I4. +- OP_L\ opcodes work on 64 bit integers, i.e. vregs of type STACK_I8. +- OP_F\ opcodes work on 64 bit floats, i.e. vregs of type STACK_R8. +- OP_V\ opcodes work on valuetypes. +- OP_P\ opcodes are macros which map to either OP_I\ or OP_L\ opcodes depending on whether the architecture is 32 or 64 bits. + +### High/low level IR + +\<......\> + +### Representation of IR instructions + +Each IR instruction is represented by a MonoInst structure. The fields of the structure are used as follows: + +- ins-\>opcode contains the opcode of the instruction. It is always set. + +- ins-\>dreg, ins-\>sreg1, ins-\>sreg2 contain the destination and source vregs of the instruction. If the instruction doesn't have a destination and/or source, the corresponding field is set to -1. + +- ins-\>backend is used for various purposes: + - for MonoInst's representing vtype variables, it indicates that the variable is in unmanaged format (used during marshalling) + - instructions which operate on a register pair use it for storing the third input register of the instruction. 
+ - some opcodes, like X86_LEA use it for storing auxiliary information + +- ins-\>next and ins-\>prev are used for linking the instructions. + +- ins-\>ssa_op -\> not used anymore + +- ins-\>cil_code -\> Points to the IL instructions this ins belongs to. Used for generating native offset-\> IL offset maps for debugging support. + +- ins-\>flags is used for storing various flags + +- ins-\>type and ins-\>klass contain type information for the result of the instruction. These fields are only used during the method_to_ir () pass. + +In addition to the fields above, each MonoInst structure contains two pointer sized fields which can be used by the instruction for storing arbitrary data. They can be accessed using a set of inst_\ macros. + +Some guidelines for their usage are as follows: + +- OP_\_IMM macros store their immediate argument in inst_imm. +- OP_\_MEMBASE macros store the basereg in inst_basereg (sreg1), and the displacement in inst_offset. +- OP_STORE\_MEMBASE macros store the basereg in inst_destbasereg (dreg), and the displacement in inst_offset. This has historical reasons since the dreg is not modified by the instruction. + +Virtual Registers (Vregs) +------------------------- + +All IR instructions work on vregs. A vreg is identified by an index. A vreg also has a type, which is one of the MonoStackType enumeration values. This type is implicit, i.e. it is not stored anywhere. Rather, the type can be deduced from the opcodes which work on the vreg, i.e. the arguments of the OP_IADD opcode are of type STACK_I4. + +There are two types of vregs used inside the JIT: Local and Global. They have the following differences: + +### Local Vregs (lvreg) + +- are local to a basic block +- are lightweight: allocating an lvreg is equivalent to increasing a counter, and they don't consume any memory. +- some optimization passes like local_deadce operate only on local vregs +- local vregs are assigned to hard registers (hregs) by the local register allocator. 
They do not participate in liveness analysis, and in global register allocation. +- they have no address, i.e. it is not possible to take their address +- they cannot be volatile + +### Global Vregs + +- are heavyweight: allocating them is slower, and they consume memory. Each global vreg has an entry in the cfg-\>varinfo and cfg-\>vars arrays. +- global vregs are either allocated to hard registers during global register allocation, or are allocated to stack slots. +- they have an address, so it is possible to apply the LDADDR operator to them. +- The mapping between global vregs and their associated entry in the cfg-\>varinfo array is done by the cfg-\>vreg_to_inst array. There is a macro called get_vreg_to_inst () which indexes into this array. A vreg vr is global if get_vreg_to_inst (cfg, vr) returns non-NULL. + +### Motivation + +The JIT allocates a large number of vregs. Most of these are created during the MSIL-\>IR phase, and represent the contents of the evaluation stack. By treating these vregs specially, we don't need to allocate memory for them, and don't need to include them in expensive optimization passes like liveness analysis. Also, lvregs enable the use of local versions of classic optimization passes, like copy/constant propagation and dead code elimination, which are much faster than their global counterparts, and thus can be included in the default optimization set of a JIT compiler. + +### Transitioning between the two states + +- Most vregs start out being local. Others, like the ones representing the arguments and locals of a method, start out being global. +- Some transformations done by the JIT can break the invariant that an lvreg is local to a basic block. There is a separate pass, mono_handle_global_vregs (), which verifies this invariant and transforms lvregs into global vregs if necessary. This pass also does the opposite transformation, by transforming global vregs used only in one bblock into an lvreg. 
+- If an address of a vreg needs to be taken, the vreg is transformed into a global vreg. + +JIT Passes +---------- + +### Method-to-IR + +This is the first pass of the JIT, and also the largest. Its job is to convert the IL code of the method to our intermediate representation. Complex opcodes like isinst are decomposed immediately. It also performs verification in parallel. The code is in the function method_to_ir () in method-to-ir.c. + +### Decompose-Long-Opts + +This pass is responsible for decomposing instructions operating on longs on 32 bit platforms as described in the section 'Handling longs on 32 bit machines'. This pass changes the CFG of the method by introducing new bblocks. It resides in the mono_decompose_long_opts () function in decompose.c. + +### Local Copy/Constant Propagation + +This pass performs copy and constant propagation on single bblocks. It works by making a linear pass over the instructions inside a basic block, remembering the instruction where each vreg was defined, and using this knowledge to replace references to vregs by their definition if possible. It resides in the mono_local_cprop2 () function in local-propagation.c. This pass can run anytime. Currently, it is executed twice: + +- Just after the method-to-ir pass to clean up the many redundant copies generated during the initial conversion to IR. +- After the spill-global-vars pass to optimize the loads/stores created by that pass. + +### Branch Optimizations + +This pass performs a variety of simple branch optimizations. It resides in the optimize_branches () function in mini.c. + +This pass runs after local-cprop since it can use the transformations generated in that pass to eliminate conditional branches. + +### Handle-Global-Vregs + +This pass is responsible for promoting vregs used in more than one basic block into global vregs. It can also do the opposite transformation, i.e. it can demote global vregs used in only one basic block into local ones. 
It resides in the mono_handle_global_vregs () function in method-to-ir.c. + +This pass must be run before passes that need to distinguish between global and local vregs, e.g. local-deadce. + +### Local Dead Code Elimination + +This pass performs dead code elimination on single basic blocks. The instructions inside a basic block are processed in reverse order, and instructions whose target is a local vreg which is not used later in the bblock are eliminated. + +This pass mostly exists to get rid of the instructions made unnecessary by the local-cprop pass. + +This pass must be run after the handle-global-vregs pass since it needs to distinguish between global and local vregs. + +### Decompose VType Opts + +This pass is responsible for decomposing valuetype operations into simpler operations, as described in the section 'Handling valuetypes'. It resides in the mono_decompose_vtype_opts () function in decompose.c. + +This pass can be run anytime, but it should be run as late as possible to enable vtype opcodes to be optimized by the local and SSA optimizations. + +### SSA Optimizations + +These optimizations consist of: + +- transformation of the IR to SSA form +- optimizations: deadce, copy/constant propagation +- transformation out of SSA form + +### Liveness Analysis + +This pass is responsible for calculating the liveness intervals for all global vregs using a classical backward dataflow analysis. It is usually the most expensive pass of the JIT especially for large methods with lots of variables and basic blocks. It resides in the liveness.c file. + +### Global Register Allocation + +This pass is responsible for allocating some vregs to one of the hard registers available for global register allocation. It uses a linear scan algorithm. It resides in the linear-scan.c file. 
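For readers unfamiliar with linear scan, the general technique can be sketched as follows. This is a textbook-style illustration and makes no claim about how linear-scan.c is written: the names `IntervalSketch`, `linear_scan`, `NUM_HREGS` and `SPILLED` are hypothetical, intervals are assumed to be pre-sorted by start position, and the spill heuristic (evict the interval that ends last) is one common choice.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_HREGS 2   /* hypothetical number of allocatable hard registers */
#define SPILLED  (-1)

typedef struct {
    int start, end;  /* live range as instruction positions, inclusive */
    int hreg;        /* assigned hard register, or SPILLED */
} IntervalSketch;

/* Assign registers to intervals that are already sorted by start position. */
static void
linear_scan (IntervalSketch *iv, size_t n)
{
    IntervalSketch *active [NUM_HREGS] = { NULL }; /* interval holding each hreg */
    size_t i;
    int r;

    assert (iv != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int free_reg = SPILLED;
        int furthest = SPILLED;

        for (r = 0; r < NUM_HREGS; r++) {
            /* Expire intervals that ended before the current one starts. */
            if (active [r] && active [r]->end < iv [i].start)
                active [r] = NULL;
            if (!active [r]) {
                if (free_reg == SPILLED)
                    free_reg = r;
            } else if (furthest == SPILLED || active [r]->end > active [furthest]->end) {
                furthest = r;
            }
        }
        if (free_reg != SPILLED) {
            iv [i].hreg = free_reg;
            active [free_reg] = &iv [i];
        } else if (active [furthest]->end > iv [i].end) {
            /* Spill the active interval that ends last and take its register. */
            iv [i].hreg = furthest;
            active [furthest]->hreg = SPILLED;
            active [furthest] = &iv [i];
        } else {
            iv [i].hreg = SPILLED; /* the new interval itself goes to the stack */
        }
    }
}
```

In the terms of this document, an interval that ends up `SPILLED` corresponds to a global vreg that is later given a stack slot rather than a hard register.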
+ +### Allocate Vars + +This arch-specific function is responsible for allocating all variables (or global vregs) to either a hard reg (as determined during global register allocation) or to a stack location. It depends on the mono_allocate_stack_slots () function to allocate stack slots using a linear scan algorithm. + +### Spill Global Vars + +This pass is responsible for processing global vregs in the IR. Vregs which are assigned to hard registers are replaced with the given registers. Vregs which are assigned to stack slots are replaced by local vregs and loads/stores are generated between the local vreg and the stack location. In addition, this pass also performs some optimizations to minimize the number of loads/stores added, and to fold them into the instructions themselves on x86/amd64. It resides in the mono_spill_global_vars () function in method-to-ir.c. + +This pass must be run after the allocate_vars () pass. + +Handling longs on 32 bit machines +--------------------------------- + +On 32 bit platforms like x86, the JIT needs to decompose opcodes operating on longs into opcodes operating on ints. This is done as follows: + +- When a vreg of type 'long' is allocated, two consecutive vregs of type 'int' are allocated. These two vregs represent the most-significant and least-significant words of the long value. +- In the decompose-long-opts pass, all opcodes operating on longs are replaced with opcodes operating on the component vregs of the original long vregs. I.e. + + + + R11 <- LOR R21 R31 + is replaced with: + R12 <- IOR R22 R32 + R13 <- IOR R23 R33 + +- Some opcodes, like OP_LCALL can't be decomposed so they are retained in the IR. This leads to some complexity since other parts of the JIT have to be prepared to deal with long vregs. + +Handling valuetypes +------------------- + +Valuetypes are first class citizens in the IR, i.e. there are opcodes operating on valuetypes, there are vtype vregs etc. 
This is done to allow the local and SSA optimizations to be able to work on valuetypes too, and to simplify other parts of the JIT. The decompose-vtype-opts pass is responsible for decomposing vtype opcodes into simpler ones. One of the most common operations on valuetypes is taking their address. Taking the address of a variable causes it to be ignored by most optimizations, so the JIT tries to avoid it if possible, for example using a VZERO opcode for initialization instead of LDADDR+INITOBJ etc. LDADDR opcodes are generated during the decompose-vtype-opts pass, but that pass is executed after all the other optimizations, so it is no longer a problem. Another complication is the fact the vregs have no type, which means that vtype opcodes have to have their ins-\>klass fields filled in to indicate the type which they operate on. + +Porting an existing backend to the new IR +----------------------------------------- + +- Add the following new functions: + - mono_arch_emit_call (). Same as mono_arch_call_opcode (), but emits IR for pushing arguments to the stack. All the stuff in mono_arch_emit_this_vret_args () should be done in emit_call () too. + - mono_arch_emit_outarg_vt (). Emits IR to push a vtype to the stack + - mono_arch_emit_setret (). Emits IR to move its argument to the proper return register + - mono_arch_emit_inst_for_method (). Same as mono_arch_get_inst_for_method, but also emits the instructions. 
+ +- Add new opcodes to cpu-\.md and mono_arch_output_basic_block (): + - dummy_use, dummy_store, not_reached + - long_bCC and long_cCC opcodes + - cond_exc_iCC opcodes + - lcompare_imm == op_compare_imm + - int_neg == neg + - int_not == not + - int_convXX == conv.iXX + - op_jump_table + - long_add == cee_add (on 64 bit platforms) + - op_start_handler, op_endfinally, op_endfilter +- In mono_arch_create_vars, when the result is a valuetype, it needs to create a new variable to represent the hidden argument holding the vtype return address and store this variable into cfg-\>vret_addr. +- Also, in mono_arch_allocate_vars, when the result is a valuetype, it needs to setup cfg-\>vret_addr instead of cfg-\>ret. + +For more info, compare the already converted backends like x86/amd64/ia64 with their original versions in HEAD. For example: [[1]](https://lists.dot.net/pipermail/mono-patches/2006-April/073170.html) + +Benchmark results +----------------- + +All the benchmarks were run on an amd64 machine in 64 bit mode. 
+ +- pnetmark score: + + + + current JIT: 19725 + linear IR: 24970 (25% faster) + +- mini/bench.exe: + + + + current JIT: 2.183 secs + linear IR: 1.972 secs (10% faster) + +- corlib 2.0 compile: + + + + current JIT: 9.421 secs + linear IR: 9.223 secs (2% faster) + +- ziptest.exe from [https://bugzilla.novell.com/show_bug.cgi?id=342190](https://bugzilla.novell.com/show_bug.cgi?id=342190) on the zerofile.bin input file: + + + + current JIT: 18.648 secs + linear IR: 9.934 secs (50% faster) + +- decimal arithmetic benchmark from [https://lists.dot.net/pipermail/mono-devel-list/2008-May/028061.html](https://lists.dot.net/pipermail/mono-devel-list/2008-May/028061.html): + + + + current JIT: + addition 3774.094 ms + substraction 3644.657 ms + multiplication 2959.355 ms + division 61897.441 ms + linear IR: + addition 3096.526 ms + substraction 3065.364 ms + multiplication 2270.676 ms + division 60514.169 ms + +- IronPython pystone.py 5000000 iterations: + + + + current JIT: 69255.7 pystones/second + linear IR: 83187.8 pystones/second (20% faster) + + All the code size tests were measured using `mono --stats --compile-all ` + +- corlib 1.0 native code size: + + + + current JIT: 2100173 bytes + linear IR: 1851966 bytes (12% smaller) + +- mcs.exe native code size: + + + + current JIT: 1382372 bytes + linear IR: 1233707 bytes (11% smaller) + +- all 1.0 assemblies combined: + + + + current JIT: 15816800 bytes + linear IR: 12774991 bytes (20% smaller) + +Improvements compared to the Mono 1.x and Mono 2.0 JITs +------------------------------------------------------- + +- The old JIT used trees as its internal representation, and the only thing which was easy with trees was code generation, everything else is hard. With the linear IR, most things are easy, and only a few things are hard, like optimizations which transform multiple operations into one, like transforming a load+operation+store into an operation taking a memory operand on x86. 
+ +- Since there is only one IR instead of two, the new JIT is (hopefully) easier to understand and modify. + +- There is an if-conversion pass which can convert simple if-then-else statements to predicated instructions on x86/64, eliminating branches. + +- Due to various changes, the ABCREM pass can eliminate about twice as many array bound checks in corlib as the current JIT. It was also extended to eliminate redundant null checks. + +- Handling of valuetypes is drastically improved, including: + - allowing most optimization passes like constant and copy propagation to work on valuetypes. + - elimination of redundant initialization code inserted because of the initlocals flag. + - elimination of many redundant copies when the result of a call is passed as an argument to another call. + - passing and returning small valuetypes in registers on x86/amd64. + +- Due to the elimination of the tree format, it is much easier to generate IR code for complex IL instructions. Some things, like branches, which are almost impossible to generate in the current JIT in the method_to_ir () pass, can be generated easily. + +- The handling of soft-float on ARM is done in a separate pass instead of in myriad places, hopefully getting rid of bugs in this area. + +- In the old representation the tree to code transformations were easy only if the "expression" to transform was represented as a tree. If, for some reason, the operation was "linearized", using local variables as intermediate results instead of the tree nodes, then the optimization simply did not take place. Or the JIT developer had to code twice: once for the tree case and once for the "linear" case. 
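
As a rough illustration of the if-conversion pass mentioned in the list above, the following C sketch contrasts a branchy if-then-else with a branch-free form that selects the result through a mask, which is similar in spirit to what a predicated (cmov-style) instruction does. The function names are made up for this example and are not part of the JIT; the real pass works on IR instructions, not on C source.

```c
/* Branchy form: a direct translation of if-then-else. */
static int
max_branchy (int a, int b)
{
	if (a > b)
		return a;
	return b;
}

/* If-converted form: no branch, the condition selects the result.
 * mask is all ones when a > b and zero otherwise. */
static int
max_selected (int a, int b)
{
	int mask = -(a > b);

	return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for every input; the second one simply trades a conditional jump for a few arithmetic instructions.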
diff --git a/docs/design/mono/web/llvm-backend.md b/docs/design/mono/web/llvm-backend.md new file mode 100644 index 0000000000000..4fcb4810e2bbe --- /dev/null +++ b/docs/design/mono/web/llvm-backend.md @@ -0,0 +1,220 @@ +# LLVM Backend + +Mono includes a backend which compiles methods to native code using LLVM instead of the built in JIT. + +Usage +----- + +The back end requires the usage of our LLVM fork/branches, see 'The LLVM Mono Branch' section below. + +The llvm back end can be enabled by passing `--enable-llvm=yes` or `--with-llvm=` to configure. + +Platform support +--------------- + +LLVM is currently supported on x86, amd64, arm and arm64. + +Architecture +------------ + +The backend works as follows: + +- first, normal mono JIT IR is generated from the IL code +- the IR is transformed to SSA form +- the IR is converted to the LLVM IR +- the LLVM IR is compiled by LLVM into native code + +LLVM is accessed through the LLVM C binding. + +The backend doesn't currently support all IL features, like vararg calls. Methods using such features are compiled using the normal mono JIT. Thus LLVM compiled and JITted code can coexist in the same process. + +Sources +------- + +The backend is in the files mini-llvm.c and mini-llvm-cpp.cpp. The former contains the bulk of the backend, while the latter contains c++ code which is needed because of deficiencies in the LLVM C binding which the backend uses. + +The LLVM Mono Branch +-------------------- + +We maintain a fork/branch of LLVM with various changes to enable better integration with mono. The repo is at: + +[https://github.com/dotnet/llvm-project](https://github.com/dotnet/llvm-project) + +The LLVM backend is currently only supported when using this version of LLVM. When using this version, it can compile about 99% of mscorlib methods. + +### Changes relative to stock LLVM + +The branch currently contains the following changes: + +- additional mono specific calling conventions. 
+- support for loads/stores which can fault using LLVM intrinsics. +- support for saving the stack locations of some variables into the exception handling info emitted by LLVM. +- support for stores into TLS on x86. +- the LLVM version string is changed to signal that this is a branch, i.e. it looks like "2.8svn-mono". +- workarounds to force LLVM to generate direct calls on amd64. +- support for passing a blockaddress value as a parameter. +- emission of EH/unwind info in a mono-specific compact format. + +The changes consist of about 1.5k lines of code. The majority of this is the EH table emission. + +### Branches + +- `release/6.x` and `release/9.x` contain our changes + +### Maintaining the repository + +The `release/*` branches are maintained by regularly rebasing them on top of upstream. This makes examining our changes easier. To merge changes from upstream to this repo, do: + +``` bash +git remote add upstream https://github.com/llvm/llvm-project.git +git fetch upstream +git rebase upstream/ + +git push origin +``` + +Due to the rapid pace of development, and the frequent reorganization/refactoring of LLVM code, merge conflicts are pretty common, so maintaining our fork is time consuming. A subset of our changes can probably be submitted to upstream LLVM, but it would require some effort to clean them up, document them, etc. + +Restrictions +------------ + +There are a number of constructs that are not supported by the LLVM backend. In those cases the Mono code generation engine will fall back to Mono's default compilation engine. + +### Exception Handlers + +Nested exception handlers are not supported because of the differences in semantics between mono's exception handling and the c++ abi based exception handling used by LLVM. + +### Varargs + +These are implemented using a special calling convention in mono, i.e. passing a hidden 'signature cookie' argument, and passing all vararg arguments on the stack. LLVM doesn't support this calling convention. 
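
The 'signature cookie' idea can be pictured with a small C sketch: the callee has no static knowledge of the variable arguments, so a hidden first argument describes them. The `ToySigCookie` layout and all `toy_` names below are hypothetical and do not reflect mono's real data structures; standard C variadics also follow the platform ABI rather than mono's pass-everything-on-the-stack convention, so this only illustrates the role of the cookie.

```c
#include <stdarg.h>

/* Hypothetical description of the variable part of a call. */
typedef enum { TOY_ARG_INT, TOY_ARG_DOUBLE } ToyArgKind;

typedef struct {
	int count;
	const ToyArgKind *kinds;
} ToySigCookie;

/* The callee learns what follows 'cookie' only by reading the cookie. */
static double
toy_sum_varargs (const ToySigCookie *cookie, ...)
{
	va_list ap;
	double sum = 0.0;
	int i;

	va_start (ap, cookie);
	for (i = 0; i < cookie->count; ++i) {
		if (cookie->kinds [i] == TOY_ARG_INT)
			sum += va_arg (ap, int);
		else
			sum += va_arg (ap, double);
	}
	va_end (ap);
	return sum;
}
```

A caller builds a cookie matching the arguments it is about to pass, then passes the cookie first, followed by the arguments themselves.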
+ +It might be possible to support this using the [LLVM vararg intrinsics](http://llvm.org/docs/LangRef.html#int_varargs). + +### save_lmf + +Wrapper methods which have method->save_lmf set are not yet supported. + +### Calling conventions + +Some complicated parameter passing conventions might not be supported on some platforms. + +Implementation details +---------------------- + +### Virtual calls + +The problem here is that the trampoline handling virtual calls needs to be able to obtain the vtable address and the offset. This is currently done by an arch specific function named mono_arch_get_vcall_slot_addr (), which works by disassembling the calling code to find out which register contains the vtable address. This doesn't work for LLVM since we can't control the format of the generated code, so disassembly would be very hard. Also, sometimes the code generated by LLVM is such that the vtable address cannot be obtained at all, i.e.: + + mov %rax, (%rax) + call %rax + +To work around these problems, we use a separate vtable trampoline for each vtable slot index. The trampoline obtains the 'this' argument from the registers/stack, whose location is dictated by the calling convention. The 'this' argument plus the slot index can be used to compute the vtable slot and the called method. + +### Interface calls + +The problem here is that these calls receive a hidden argument called the IMT argument which is passed in a non-ABI register by the JIT, which cannot be done with LLVM. So we call a trampoline instead, which sets the IMT argument, then makes the virtual call. + +### Unwind info + +The JIT needs unwind info to unwind through LLVM generated methods. This is solved by obtaining the exception handling info generated by LLVM, then extracting the unwind info from it. + +### Exception Handling + +Methods with exception clauses are supported, although there are some corner cases in the class library tests which still fail when run with LLVM. 
+ +LLVM uses the platform specific exception handling abi, which is the c++ ehabi on linux, while we use our home grown exception handling system. To make these two work together, we only use one LLVM EH intrinsic, the llvm.eh.selector intrinsic. This will force LLVM to generate exception handling tables. We decode those tables in mono_unwind_decode_fde () to obtain the addresses of the try-catch clauses, and save those to MonoJitInfo, just as with JIT compiled code. Finally clauses are handled differently than with JITted code. Instead of calling them from mono_handle_exception (), we save the exception handling state in TLS, then branch to them the same way we would branch to a catch handler. the code generated from ENDFINALLY will call mono_resume_unwind (), which will resume exception handling from the information saved in TLS. + +LLVM doesn't support implicit exceptions thrown by the execution of instructions. An implicit exception is for example a NullReferenceException that would be raised when you access an invalid memory location, typically in Mono and .NET, an uninitialized pointer. + +Implicit exceptions are implemented by adding a bunch of LLVM intrinsics to do loads/stores, and calling them using the LLVM 'invoke' instruction. + +Instead of generating DWARF/c++ EHABI exception handling tables, we generate our own tables using a mono specific format, which the mono runtime reads during execution. This has the following advantages: + +- the tables are compact and take up less space. +- we can generate a lookup table similar to .eh_frame_hdr which is normally generated by the linker, allowing us to support macOS/iOS, since the apple linker doesn't support .eh_frame_hdr. +- the tables are pointed to by a normal global symbol, instead of residing in a separate segment, whose address cannot be looked up under macOS. 
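
To make the lookup table idea from the list above concrete, here is a minimal sketch of a sorted table that maps a code offset to the entry of the method containing it, searched with a binary search in the same way a `.eh_frame_hdr` table is. The entry layout and the `toy_` names are assumptions made for this example; they are not the actual mono table format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: where a method starts, plus an index into its unwind/EH data. */
typedef struct {
	uint32_t start_offset;
	uint32_t info_index;
} ToyLookupEntry;

/* Returns the last entry whose start is <= offset, or NULL if offset precedes
 * every method. 'table' must be sorted by start_offset. The sketch does not
 * store method lengths, so it cannot detect gaps between methods. */
static const ToyLookupEntry *
toy_find_entry (const ToyLookupEntry *table, size_t count, uint32_t offset)
{
	size_t lo = 0, hi = count;

	/* Invariant: entries before lo start at or below offset,
	 * entries from hi onwards start above it. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table [mid].start_offset <= offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo == 0 ? NULL : &table [lo - 1];
}
```

Because the table is reachable through an ordinary symbol and needs no linker support, a lookup of this kind works the same way on every platform.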
+ +### Generic Sharing + +There are two problems here: passing/receiving the hidden rgctx argument passed to some shared methods, and obtaining its value/the value of 'this' during exception handling. + +The former is implemented by adding a new mono specific calling convention which passes the 'rgctx' argument in the non-ABI register where mono expects it, i.e. R10 on amd64. The latter is implemented by marking the variables where these are stored with a mono specific LLVM custom metadata, and modifying LLVM to emit the final stack location of these variables into the exception handling info, where the runtime can retrieve it. + +AOT Support +----------- + +This is implemented by emitting the LLVM IR into an LLVM bytecode file, then using the LLVM llc compiler to compile it, producing a .s file, then we append our normal AOT data structures, plus the code for methods not supported by LLVM to this file. + +A runtime which is not configured with --enable-llvm=yes can be made to use LLVM compiled AOT modules by using the --llvm command line argument: mono --llvm hello.exe + +Porting the backend to new architectures +---------------------------------------- + +The following changes have to be made to port the LLVM backend to a new architecture: + +- Define MONO_ARCH_LLVM_SUPPORTED in mini-\.h. +- Implement mono_arch_get_llvm_call_info () in mini-\.h. This function is a variant of the arch specific get_call_info () function, it should return calling convention information for a signature. +- Define MONO_CONTEXT_SET_LLVM_EXC_REG() in mini-\.h to the register used to pass the exception object to LLVM compiled landing pads. This is usually defined by the platform ABI. +- Implement the LLVM exception throwing trampolines in exceptions-\.c. These trampolines differ from the normal ones because they receive the PC address of the throw site, instead of a displacement from the start of the method. See exceptions-amd64.c for an example. 
+- Implement the resume_unwind () trampoline, which is similar to the throw trampolines, but instead of throwing an exception, it should call mono_resume_unwind () with the constructed MonoContext. + +LLVM problems +------------- + +Here is a list of problems whose solution would probably require changes to LLVM itself. Some of these problems are solved in various ways by changes on the LLVM Mono branch. + +- the llvm.sqrt intrinsic doesn't work with NaNs, even though the underlying C function/machine instruction probably works with them. Worse, an optimization pass transforms sqrt(NaN) to 0.0, changing program behaviour, and masking the problem. +- there is no fabs intrinsic, instead llc seems to replace calls to functions named 'fabs' with the corresponding assembly, even if they are not the fabs from libm? +- There is no way to tell LLVM that a result of a load is constant, i.e. in a loop like this: + + + + for (int i = 0; i < arr.Length; ++i) + arr [i] = 0 + +The arr.Length load cannot be moved outside the loop, since the store inside the loop can alias it. There is a llvm.invariant.start/end intrinsic, but that seems to be only useful for marking a memory area as invariant inside a basic block, so it cannot be used to mark a load globally invariant. + +[http://hlvm.llvm.org/bugs/show_bug.cgi?id=5441](http://hlvm.llvm.org/bugs/show_bug.cgi?id=5441) + +- LLVM has no support for implicit exceptions: + +[http://llvm.org/bugs/show_bug.cgi?id=1269](http://llvm.org/bugs/show_bug.cgi?id=1269) + +- LLVM thinks that loads from a NULL address lead to undefined behaviour, while it is quite well defined on most unices (SIGSEGV signal being sent). If an optimization pass determines that the source address of a load is NULL, it changes it to undef/unreachable, changing program behaviour. The only way to work around this seems to be marking all loads as volatile, which probably doesn't help optimizations. 
+- There seems to be no way to disable specific optimizations when running 'opt', i.e. do -std-compile-opts except tailcallelim. +- The x86 JIT seems to generate normal calls as + + + + mov reg, imm + call *reg + +This makes it hard/impossible to patch the calling address after the called method has been compiled. \ [http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-December/027999.html](http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-December/027999.html) + +- LLVM Bugs: [[1]](http://llvm.org/bugs/show_bug.cgi?id=6102) + +Future Work +----------- + +### Array Bounds Check (ABC) elimination + +Mono already contains an ABC elimination pass, which is fairly effective at eliminating simple bounds checks, i.e. the one in: + +for (int i = 0; i \< arr.Length; ++i) + + sum += arr [i]; + +However, it has problems with "partially redundant" checks, i.e. checks which cannot be proven to be redundant, but which are unlikely to be hit at runtime. With LLVM's extensive analysis and program transformation passes, it might be possible to eliminate these from loops, by changing them to loop-invariant checks and hoisting them out of loops, i.e. changing: + + for (int i = 0; i < len; ++i) + sum += arr [i]; + +to: + + if (len < arr.Length) { + + } else { + + } + +LLVM has a LoopUnswitch pass which can do something like this for constant expressions, it needs to be extended to handle the ABC checks too. Unfortunately, this cannot be done currently because the arr.Length instruction is converted to a volatile load by mono's LLVM backend, since it can fault if arr is null. This means that the load is not loop invariant, so it cannot be hoisted out of the loop. 
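
The transformation described above can be sketched in plain C. This is only an illustration of the idea, not output of LLVM or of mono: `ToyIntArray` and the `toy_` functions are hypothetical stand-ins for a managed array and for the code the JIT would generate, the error return stands in for throwing an exception, and the null check on the array is ignored.

```c
/* Hypothetical stand-in for a managed int array. */
typedef struct {
	int length;
	const int *data;
} ToyIntArray;

/* Original loop: returns 0 on success, -1 where managed code would throw. */
static int
toy_sum_checked (const ToyIntArray *arr, int len, int *sum)
{
	int i;

	*sum = 0;
	for (i = 0; i < len; ++i) {
		if (i >= arr->length)	/* bounds check on every iteration */
			return -1;
		*sum += arr->data [i];
	}
	return 0;
}

/* Unswitched loop: one loop-invariant test selects a check-free fast path.
 * (The text's 'len < arr.Length' test is a slightly more conservative variant.) */
static int
toy_sum_unswitched (const ToyIntArray *arr, int len, int *sum)
{
	int i;

	*sum = 0;
	if (len <= arr->length) {
		/* Every index in 0..len-1 is known to be in range. */
		for (i = 0; i < len; ++i)
			*sum += arr->data [i];
		return 0;
	}
	/* Slow path: keep the original per-iteration checks. */
	return toy_sum_checked (arr, len, sum);
}
```

The two functions behave identically, including in the failing case, but the common case no longer pays for a comparison on each iteration.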
diff --git a/docs/design/mono/web/memory-management.md b/docs/design/mono/web/memory-management.md new file mode 100644 index 0000000000000..75755969163a4 --- /dev/null +++ b/docs/design/mono/web/memory-management.md @@ -0,0 +1,48 @@ +# Memory Management + +Metadata memory management +-------------------------- + +Most metadata structures have a lifetime which is equal to the MonoImage where they are loaded from. These structures should be allocated from the memory pool of the corresponding MonoImage. The memory pool is protected by the loader lock. Examples of metadata structures in this category: + +- MonoClass +- MonoMethod +- MonoType + +Memory owned by these structures should be allocated from the image mempool as well. Examples include: klass-\>methods, klass-\>fields, method-\>signature etc. + +Generics complicate things. A generic class could have many instantiations where the generic arguments are from different assemblies. Where should we allocate memory for instantiations? We can allocate from the mempool of the image which contains the generic type definition, but that would mean that the instantiations would remain in memory even after the assemblies containing their type arguments are unloaded, leading to a memory leak. Therefore, we do the following: + +- data structures representing the generic definitions are allocated from the image mempool as usual. These include: + + + + * generic class definition (MonoGenericClass->container_class) + * generic method definitions + * type parameters (MonoGenericParam) + +- data structures representing inflated classes/images are allocated from the heap. They are owned by an 'image-set' which is the set of all images they depend on. When an image is unloaded, all image-sets it belongs to are freed, causing the data structures owned by the image-sets to be freed too. 
The structures handled this way include: + + + + * MonoGenericClass + * MonoGenericInst + * inflated MonoMethods + +[Original version of this document in git.](https://github.com/mono/mono/blob/425844619cbce18eaa64205b9007f0c833e4a5c4/docs/memory-management.txt) + +Memory management for executable code +------------------------------------- + +Executable code is managed using 'code-managers', whose implementation is in utils/mono-codeman.{h,c}. These allow the allocation of memory which is suitable for storing executable code, i.e.: + +- It has the required executable (x) permission. +- The alignment of the memory blocks allocated from the code manager matches the preferred function alignment of the platform. + +Code managers also allow a certain percentage of the memory they manage to be reserved for storing things like function thunks. + +The runtime contains the following code managers: + +- There is a global code manager declared in mini.c which is used to manage code memory whose lifetime is equal to the lifetime of the runtime. Memory for trampolines is allocated from the global code manager. +- Every domain has a code manager which is used for allocating memory used by JITted code belonging to that domain. +- Every 'dynamic' method, i.e. a method whose lifetime is not equal to the runtime or a domain, has its own code manager. diff --git a/docs/design/mono/web/mini-porting.md b/docs/design/mono/web/mini-porting.md new file mode 100644 index 0000000000000..8d3977982ce54 --- /dev/null +++ b/docs/design/mono/web/mini-porting.md @@ -0,0 +1,373 @@ +# Porting the Engine + +## Introduction + +This document describes the process of porting the mono JIT to a new CPU architecture. The new mono JIT has been designed to make porting easier, while at the same time enabling the port to take full advantage of the new architecture's features and instructions. 
Knowledge of the mini architecture (described in the mini-doc.txt file) is a requirement for understanding this guide, as well as an earlier document about porting the mono interpreter (available on the web site). + +There are six main areas that a port needs to implement to have a fully-functional JIT for a given architecture: + +- instruction selection +- native code emission +- call conventions and register allocation +- method trampolines +- exception handling +- minor helper methods + +To take advantage of some not-so-common processor features (for example conditional execution of instructions as may be found on ARM or ia64), it may be necessary to develop a high-level optimization, but doing so is not a requirement for getting the JIT to work. + +We'll look at each of the required steps in more detail; note, though, that a new port may just as well start from a cut and paste of an existing port to a similar architecture (for example from x86 to amd64, or from powerpc to sparc). + +The architecture specific code is split from the rest of the JIT, for example the x86 specific code and data is all included in the following files in the distribution: + +mini-x86.h mini-x86.c inssel-x86.brg cpu-pentium.md tramp-x86.c exceptions-x86.c + +I suggest a similar split for other architectures as well. + +Note that this document is still incomplete: some sections are only sketched and some are missing, but the important info to get a port going is already described. + +## Architecture-specific instructions and instruction selection + +The JIT already provides a set of instructions that can be easily mapped to a great variety of different processor instructions. Sometimes it may be necessary or advisable to add a new instruction that represents more closely an instruction in the architecture. 
Note that a mini instruction can also be used to represent a short sequence of low-level CPU instructions, but each instruction represents the minimum amount of code the instruction scheduler will handle (i.e., the scheduler won't schedule the instructions that compose the low-level sequence as individual instructions, but just the whole sequence, as an indivisible block). + +New instructions are created by adding a line in the mini-ops.h file, assigning an opcode and a name. To specify the input and output for the instruction, there are two different places, depending on the context in which the instruction gets used. + +If an instruction is used as a low-level CPU instruction, the info is specified in a machine description file. The description file is processed by the genmdesc program to provide a data structure that can be easily used from C code to query the needed info about the instruction. + +As an example, let's consider the add instruction for both x86 and ppc: + + x86 version: + add: dest:i src1:i src2:i len:2 clob:1 + ppc version: + add: dest:i src1:i src2:i len:4 + +Note that the instruction takes two input integer registers on both CPUs, but on x86 the first source register is clobbered (clob:1) and the length in bytes of the instruction differs. + +Note that integer adds and floating point adds use different opcodes, unlike the IL language (64 bit add is done with two instructions on 32 bit architectures, using an add that sets the carry and an add with carry). + +A specific CPU port may assign any meaning to the clob field for an instruction since the value will be processed in an arch-specific file anyway. + +See the top of the existing cpu-pentium.md file for more info on other fields: the info may or may not be applicable to a different CPU, in this latter case the info can be ignored. + +So, one of the first things needed in a port is to write a cpu-$(arch).md machine description file and fill it with the needed info. 
As a start, only a few instructions can be specified, like the ones required to do simple integer operations. The default rules of the instruction selector will emit the common instructions and so we're ready to go for the next step in porting the JIT. + +## Native code emission + +Since the first step in porting mono to a new CPU is to port the interpreter, there should be already a file that allows the emission of binary native code in a buffer for the architecture. This file should be placed in the + +``` bash + mono/arch/$(arch)/ +``` + +directory. + +The bulk of the code emission happens in the mini-$(arch).c file, in a function called `mono_arch_output_basic_block ()`. This function takes a basic block, walks the list of instructions in the block and emits the binary code for each. Optionally a peephole optimization pass is done on the basic block, but this can be left for later, when the port actually works. + +This function is very simple, there is just a big switch on the instruction opcode and in the corresponding case the functions or macros to emit the binary native code are used. Note that in this function the lengths of the instructions are used to determine if the buffer for the code needs enlarging. + +To complete the code emission for a method, a few other functions need implementing as well: + +``` c + mono_arch_emit_prolog () + mono_arch_emit_epilog () + mono_arch_patch_code () +``` + +`mono_arch_emit_prolog ()` will emit the code to setup the stack frame for a method, optionally call the callbacks used in profiling and tracing, and move the arguments to their home location (in a caller-save register if the variable was allocated to one, or in a stack location if the argument was passed in a volatile register and wasn't allocated a non-volatile one). caller-save registers used by the function are saved in the prolog as well. 
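
Returning to `mono_arch_output_basic_block ()`: the emission loop described earlier in this section (walk the instruction list, make sure the buffer is large enough using the per-instruction length, then switch on the opcode) can be sketched for a made-up byte-coded target. Everything below is an assumption for illustration: the `Toy` types, the three opcodes and their encodings are invented, and the real function works on mini's own instruction and compile structures.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { TOY_OP_ICONST, TOY_OP_ADD, TOY_OP_RET } ToyOpcode;

typedef struct ToyIns {
	ToyOpcode opcode;
	uint8_t dreg, sreg1, sreg2;
	int32_t imm;
	struct ToyIns *next;
} ToyIns;

typedef struct {
	uint8_t *code;
	size_t len, size;
} ToyCodeBuf;

/* Maximum encoded length per opcode, playing the role of the len: field. */
static const size_t toy_max_len [] = { 6, 4, 1 };

/* Grow the buffer until 'needed' more bytes fit. */
static void
toy_ensure_space (ToyCodeBuf *buf, size_t needed)
{
	while (buf->len + needed > buf->size) {
		buf->size = buf->size ? buf->size * 2 : 16;
		buf->code = realloc (buf->code, buf->size);
		if (!buf->code)
			abort ();
	}
}

static void
toy_output_basic_block (ToyCodeBuf *buf, const ToyIns *ins)
{
	for (; ins; ins = ins->next) {
		uint32_t imm = (uint32_t)ins->imm;
		uint8_t *p;

		toy_ensure_space (buf, toy_max_len [ins->opcode]);
		p = buf->code + buf->len;
		switch (ins->opcode) {
		case TOY_OP_ICONST:	/* 0x01, dreg, 32 bit little-endian immediate */
			*p++ = 0x01;
			*p++ = ins->dreg;
			*p++ = (uint8_t)(imm & 0xff);
			*p++ = (uint8_t)((imm >> 8) & 0xff);
			*p++ = (uint8_t)((imm >> 16) & 0xff);
			*p++ = (uint8_t)((imm >> 24) & 0xff);
			break;
		case TOY_OP_ADD:	/* 0x02, dreg, sreg1, sreg2 */
			*p++ = 0x02;
			*p++ = ins->dreg;
			*p++ = ins->sreg1;
			*p++ = ins->sreg2;
			break;
		case TOY_OP_RET:	/* 0x03 */
			*p++ = 0x03;
			break;
		}
		buf->len = (size_t)(p - buf->code);
	}
}
```

The length table is consulted before each instruction is emitted, which is why the lengths in the machine description only need to be upper bounds.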
+ +`mono_arch_emit_epilog ()` will emit the code needed to return from the function, optionally calling the profiling or tracing callbacks. At this point the basic blocks or the code that was moved out of the normal flow for the function can be emitted as well (this is usually done to provide better info for the static branch predictor). In the epilog, caller-save registers are restored if they were used. + +Note that, to help exception handling and stack unwinding, when there is a transition from managed to unmanaged code, some special processing needs to be done (basically, saving all the registers and setting up the links in the Last Managed Frame structure). + +When the epilog has been emitted, the upper level code arranges for the buffer of memory that contains the native code to be copied into an area of executable memory and at this point, instructions that use relative addressing need to be patched to have the right offsets: this work is done by `mono_arch_patch_code ()`. + +## Call conventions and register allocation + +To account for the differences in the call conventions, a few functions need to be implemented. + +`mono_arch_allocate_vars ()` assigns to both arguments and local variables the offset relative to the frame register where they are stored; dead variables are simply discarded. The total amount of stack needed is calculated. + +`mono_arch_call_opcode ()` is the function that more closely deals with the call convention on a given system. For each argument to a function call, an instruction is created that actually puts the argument where needed, be it the stack or a specific register. This function can also re-arrange the order of evaluation when multiple arguments are involved if needed (like, on x86 arguments are pushed on the stack in reverse order). 
The function needs to carefully take into account platform specific issues, like how structures are returned as well as the differences in size and/or alignment of managed and corresponding unmanaged structures. + +The other chunk of code that needs to deal with the call convention and other specifics of a CPU, is the local register allocator, implemented in a function named `mono_arch_local_regalloc ()`. The local allocator deals with a basic block at a time and basically just allocates registers for temporary values during expression evaluation, spilling and unspilling as necessary. + +The local allocator needs to take into account clobbering information, both during simple instructions and during function calls and it needs to deal with other architecture-specific weirdnesses, like instructions that take inputs only in specific registers or output only in some. + +Some effort will be put later in moving most of the local register allocator to a common file so that the code can be shared more for similar, risc-like CPUs. The register allocator does a first pass on the instructions in a block, collecting liveness information and in a backward pass on the same list performs the actual register allocation, inserting the instructions needed to spill values, if necessary. + +The cross-platform local register allocator is now implemented and it is documented in the jit-regalloc file. + +When this part of code is implemented, some testing can be done with the generated code for the new architecture. Most helpful is the use of the --regression command line switch to run the regression tests (basic.cs, for example). + +Note that the JIT will try to initialize the runtime, but it may not be able yet to compile and execute complex code: commenting most of the code in the `mini_init()` function in mini.c is needed to let the JIT just compile the regression tests. 
Also, using multiple -v switches on the command line makes the JIT dump an increasing amount of information during compilation. + +Values loaded into registers need to be extended as needed by the ECMA specs: + +- integers smaller than 4 bytes are extended to int32 values +- 32 bit floats are extended to double precision (in particular this means that currently all the floating point operations operate on doubles) + +## Method trampolines + +To get better startup performance, the JIT actually compiles a method only when needed. To achieve this, when a call to a method is compiled, we actually emit a call to a magic trampoline. The magic trampoline is a function written in assembly that invokes the compiler to compile the given method and jumps to the newly compiled code, ensuring the arguments it received are passed correctly to the actual method. + +Before jumping to the new code, though, the magic trampoline takes care of patching the call site so that next time the call will go directly to the method instead of the trampoline. How does this all work? + +`mono_arch_create_jit_trampoline ()` creates a small function that just preserves the arguments passed to it and adds an additional argument (the method to compile) before calling the generic trampoline. This small function is called the specific trampoline, because it is method-specific (the method to compile is hard-coded in the instruction stream). + +The generic trampoline saves all the arguments that could get clobbered and calls a C function that will do two things: + +- actually call the JIT to compile the method +- identify the calling code so that it can be patched to call directly the actual method + +If the 'this' argument to a method is a boxed valuetype that is passed to a method that expects just a pointer to the data, an additional unboxing trampoline will need to be inserted as well. 
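
The compile-on-first-call scheme described in this section can be modelled in portable C, with a function pointer slot standing in for the patchable call site. This is a sketch under heavy assumptions: real trampolines are written in assembly and patch machine code, and the `ToyCallSite` type and `toy_` functions below are invented for the example.

```c
typedef struct ToyCallSite ToyCallSite;
typedef int (*ToyTarget) (ToyCallSite *site, int arg);

struct ToyCallSite {
	ToyTarget target;	/* where the call currently goes */
	int compile_count;	/* how many times the "JIT" was invoked */
};

/* Stands in for the native code the JIT produces for the method. */
static int
toy_compiled_method (ToyCallSite *site, int arg)
{
	(void)site;
	return arg * 2;
}

/* Stands in for invoking the JIT on the method. */
static ToyTarget
toy_compile (ToyCallSite *site)
{
	site->compile_count++;
	return toy_compiled_method;
}

/* The trampoline: compile, patch the call site, then forward the original arguments. */
static int
toy_magic_trampoline (ToyCallSite *site, int arg)
{
	ToyTarget code = toy_compile (site);

	site->target = code;	/* next call goes straight to the method */
	return code (site, arg);
}

static int
toy_call (ToyCallSite *site, int arg)
{
	return site->target (site, arg);
}
```

A call site starts out pointing at `toy_magic_trampoline`; the first call compiles and patches, and every later call bypasses the trampoline entirely.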
+ +## Exception handling + +Exception handling is likely the most difficult part of the port, as it needs to deal with unwinding (both managed and unmanaged code) and calling catch and filter blocks. It also needs to deal with signals, because mono takes advantage of the MMU in the CPU and of the operating system to handle dereferences of the NULL pointer. Some of the functions needed to implement the mechanisms are: + +`mono_arch_get_throw_exception ()` returns a function that takes an exception object and invokes an arch-specific function that will enter the exception processing. To do so, all the relevant registers need to be saved and passed on. + +`mono_arch_handle_exception ()` takes the exception thrown and a context that describes the state of the CPU at the time the exception was thrown. The function needs to implement the exception handling mechanism, so it searches for a handler for the exception and if none is found, it follows the unhandled exception path (that can print a trace and exit or just abort the current thread). The difficulty here is to unwind the stack correctly, by restoring the register state at each call site in the call chain, calling finally, filter and handler blocks while doing so. + +As part of exception handling a couple of internal calls need to be implemented as well. + +`ves_icall_get_frame_info ()` returns info about a specific frame. + +`mono_jit_walk_stack ()` walks the stack and calls a callback with info for each frame found. + +`ves_icall_get_trace ()` returns an array of StackFrame objects. + +### Code generation for filter/finally handlers + +Filter and finally handlers are called from 2 different locations: + +- from within the method containing the exception clauses +- from the stack unwinding code + +To make this possible we implement them like subroutines, ending with a "return" statement. 
The subroutine does not save the base pointer, because we need access to the local variables of the enclosing method. It is possible that instructions inside those handlers modify the stack pointer, thus we save the stack pointer at the start of the handler, and restore it at the end. We have to use a "call" instruction to execute such finally handlers. + +The MIR code for filter and finally handlers looks like: + + OP_START_HANDLER + ... + OP_END_FINALLY | OP_ENDFILTER(reg) + +OP_START_HANDLER: should save the stack pointer somewhere. OP_END_FINALLY: restores the stack pointer and returns. OP_ENDFILTER (reg): restores the stack pointer and returns the value in "reg". + +### Calling finally/filter handlers + +There is a special opcode to call those handlers, it's called OP_CALL_HANDLER. It simply emits a call instruction. + +It's a bit more complex to call handlers from outside (in the stack unwinding code), because we have to restore the whole context of the method first. After that we simply emit a call instruction to invoke the handler. It's usually possible to use the same code to call filter and finally handlers (see arch_get_call_filter). + +### Calling catch handlers + +Catch handlers are always called from the stack unwinding code. Unlike finally clauses or filters, catch handlers never return. Instead we simply restore the whole context, and restart execution at the catch handler. + +### Passing Exception objects to catch handlers and filters + +We use a local variable to store exception objects. The stack unwinding code must store the exception object into this variable before calling a catch handler or filter. + +## Minor helper methods + +A few minor helper methods are referenced from the arch-independent code. Some of them are: + +`mono_arch_cpu_optimizations ()` This function returns a mask of optimizations that should be enabled for the current CPU and a mask of optimizations that should be excluded, instead. 
+ +`mono_arch_regname ()` Returns the name for a numeric register. + +`mono_arch_get_allocatable_int_vars ()` Returns a list of variables that can be allocated to the integer registers in the current architecture. + +`mono_arch_get_global_int_regs ()` Returns a list of caller-save registers that can be used to allocate variables in the current method. + +`mono_arch_instrument_mem_needs ()` + +`mono_arch_instrument_prolog ()` + +`mono_arch_instrument_epilog ()` Functions needed to implement the profiling interface. + +## Testing the port + +The JIT has a set of regression tests in \*.cs files inside the mini directory. + +The usual method of testing a port is by compiling these tests on another machine with a working runtime by typing 'make rcheck', then copying TestDriver.dll and \*.exe to the mini directory. The tests can be run by typing: + +``` bash + ./mono --regression +``` + +The suggested order for working through these tests is the following: + +- basic.exe +- basic-long.exe +- basic-float.exe +- basic-calls.exe +- objects.exe +- arrays.exe +- exceptions.exe +- iltests.exe +- generics.exe + +## Writing regression tests + +Regression tests for the JIT should be written for any bug found in the JIT in one of the \*.cs files in the mini directory. Eventually all the operations of the JIT should be tested (including the ones that get selected only when some specific optimization is enabled). + +## Platform specific optimizations + +An example of a platform-specific optimization is the peephole optimization: we look at a small window of code at a time and we replace one or more instructions with others that perform better for the given architecture or CPU. + +## Function descriptors + +Some ABIs, like those for IA64 and PPC64, don't use direct function pointers, but so called function descriptors. 
A function descriptor is a short data structure which contains at least a pointer to the code of the function and a pointer to a GOT/TOC, which needs to be loaded into a specific register prior to jumping to the function. Global variables and large constants are accessed through that register. + +Mono does not need function descriptors for the JITted code, but we need to handle them when calling unmanaged code and we need to create them when passing managed code to unmanaged code. + +`mono_create_ftnptr()` creates a function descriptor for a piece of generated code within a specific domain. + +`mono_get_addr_from_ftnptr()` returns the pointer to the native code in a function descriptor. Never use this function to generate a jump to a function without loading the GOT/TOC register unless the function descriptor was created by `mono_create_ftnptr()`. + +See the sources for IA64 and PPC64 on when to create and when to dereference function descriptors. On PPC64 function descriptors for various generated helper functions (in exceptions-ppc.c and tramp-ppc.c) are generated in front of the code they refer to (see `ppc_create_pre_code_ftnptr()`). On IA64 they are created separately. + +## Emulated opcodes + +Mini has code for emulating quite a few opcodes, most notably operations on longs, int/float conversions and atomic operations. If an architecture wishes such an opcode to be emulated, mini produces icalls instead of those opcodes. This should only be considered when the operation cannot be implemented efficiently and thus the overhead occured by the icall is not relatively large. Emulation of operations is controlled by #defines in the arch header, but the naming is not consistent. They usually start with `MONO_ARCH_EMULATE_`, `MONO_ARCH_NO_EMULATE_` and `MONO_ARCH_HAVE_`. + +## Prolog/Epilog + +The method prolog is emitted by the mono_arch_emit_prolog () function. It usually consists of the following parts: + +- Allocate frame: set fp to sp, decrement sp. 
+- Save callee saved registers to the frame +- Initialize the LMF structure +- Link the LMF structure: This implements the following pseudo code: + + + + lmf->lmf_addr = mono_get_lmf_addr () + lmf->previous_lmf = *(lmf->lmf_addr) + *(lmf->lmf_addr) = lmf + +- Compute bb->max_offset for each basic block: This enables mono_jit_output_basic_block () to emit short branches where possible. +- Store the runtime generic context, see the Generic Sharing section. +- Store the signature cookie used by vararg methods. +- Transfer arguments to the location they are allocated to, i.e. load arguments received on the stack to registers if needed, and store arguments received in registers to the stack/callee saved registers if needed. +- Initialize the various variables used by the soft debugger code. +- Implement tracing support. + +The epilog is emitted by the mono_arch_emit_epilog () function. It usually consists of the following parts: + +- Restore the LMF by doing: + + + + *(lmf->lmf_addr) = lmf->previous_lmf + +- Load returned valuetypes into registers if needed. +- Implement tracing support. +- Restore callee saved registers. +- Pop frame. +- Return to the caller. + +Care must be taken during these steps to avoid clobbering the registers holding the return value of the method. + +Callee saved registers are either saved to dedicated stack slots, or they are saved into the LMF. The stack slots where various things are saved are allocated by mono_arch_allocate_vars (). + +## Delegate Invocation + +A delegate is invoked like this by JITted code: + +delegate->invoke_impl (delegate, arg1, arg2, arg3, ...) + +Here, 'invoke_impl' originally points to a trampoline which ends up calling the 'mono_delegate_trampoline' C function. This function tries to find an architecture specific optimized implementation by calling 'mono_arch_get_delegate_invoke_impl'.
+ +mono_arch_get_delegate_invoke_impl () should return a small trampoline for invoking the delegate which matches the following pseudo code: + +- for instance delegates: + +delegate->method_ptr (delegate->target, arg1, arg2, arg3, ...) + +- for static delegates: + +delegate->method_ptr (arg1, arg2, arg3, ...) + +## Varargs + +The vararg calling convention is implemented as follows: + +### Caller side + +- The caller passes in a 'signature cookie', which is a hidden argument containing a MonoSignature\*. + + + + This argument is passed just before the implicit arguments, i.e. if the callee signature is this: + foo (string format, ...) + +and the call site is this: + + foo ("%d %d", 1, 2) + +then the arguments actually passed would look like: + + foo ("%d %d", <sig cookie>, 1, 2) + +To simplify things, both the sig cookie and the implicit arguments are always passed on the stack and not in registers. mono_arch_emit_call () is responsible for emitting this argument. + +### Callee side + +- mono_arch_allocate_vars () is responsible for allocating a local variable slot where the sig cookie will be saved. cfg->sig_cookie should contain the stack offset of the local variable slot. +- mono_arch_emit_prolog () is responsible for saving the sig cookie argument into the local variable. +- The implementation of OP_ARGLIST should load the sig cookie from the local variable, and save it into its dreg, which will point to a local variable of type RuntimeArgumentHandle. +- The fetching of vararg arguments is implemented by icalls in icalls.c. + +tests/vararg.exe contains test cases to exercise this functionality. + +## Unwind info + +On most platforms, the JIT uses DWARF unwind info to unwind the stack during exception handling. The API and some documentation are in the mini-unwind.h file. The mono_arch_emit_prolog () function is required to emit this information using the macros in mini-unwind.h, and the mono_arch_find_jit_info () function needs to pass it to mono_unwind_frame ().
In addition to this, the various trampolines might also have unwind info, which makes stack walks possible when using the gdb integration (XDEBUG). + +The task of a stack unwinder is to construct the machine state at the caller of the current stack frame, i.e: - find the return address of the caller - find the values of the various callee saved registers in the caller at the point of the call + +The DWARF unwinder is based on the concept of a CFA, or Canonical Frame Address. This is an address of the stack frame which does not change during the execution of the method. By convention, the CFA is equal to the value of the stack pointer prior to the instruction which transferred execution to the current method. So for example, on x86, the value of the CFA on enter to the method is esp+4 because of the pushing of the return address. There are two kinds of unwind directives: + +- those that specify how to compute the CFA at any point in the method using a \+\ +- those that specify where a given register is saved in relation to the CFA. + +For a typical x86 method prolog, the unwind info might look like this: + +``` bash +- +- +push ebp +- +mov ebp, esp +- +``` + +## Generic Sharing + +Generic code sharing is optional. See the document on [generic-sharing](/docs/advanced/runtime/docs/generic-sharing/) for information on how to support it on an architecture. + +### MONO_ARCH_RGCTX_REG + +The MONO_ARCH_RGCTX_REG define should be set to a hardware register which will be used to pass the 'mrgctx' hidden argument to generic shared methods. It should be a caller saved register which is not used in local register allocation. Also, any code which gets executed between the caller and the callee (i.e. trampolines) needs to avoid clobbering this registers. The easiest solution is to set it to the be the same as MONO_ARCH_IMT_REG, since IMT/generic sharing are never used together during a call. The method prolog must save this register to cfg->rgctx_var. 
+ +### Static RGCTX trampolines + +These trampolines are created by mono_arch_get_static_rgctx_trampoline (). They are used to call generic shared methods indirectly from code which cannot pass an MRGCTX. They should implement the following pseudo code: + + = mrgctx + jump + +### Generic Class Init Trampoline + +This one of a kind trampoline is created by mono_arch_create_generic_class_init_trampoline (). They are used to run the .cctor of the vtable passed in as an argument in MONO_ARCH_VTABLE_REG. They should implement the following pseudo code: + + vtable = + if (!vtable->initialized) + + +The generic trampoline code needs to be modified to pass the argument received in MONO_ARCH_VTABLE_REG to the C trampoline function, which is mono_generic_class_init_trampoline (). + +### RGCTX Lazy Fetch Trampoline + +These trampolines are created by mono_arch_create_rgctx_lazy_fetch_trampoline (). They are used for fetching values out of an MonoRuntimeGenericContext, lazily initializing them as needed. diff --git a/docs/design/mono/web/mono-error.md b/docs/design/mono/web/mono-error.md new file mode 100644 index 0000000000000..dd4c9f0575463 --- /dev/null +++ b/docs/design/mono/web/mono-error.md @@ -0,0 +1,144 @@ +# Error handling and MonoError + +## MonoError + +MonoError is the latest attempt at cleaning up and sanitizing error handling in the runtime. This document highlights some of the design goals and decisions, the implementation and the migration strategy. + +### Design goals + +- Replace the majority of the adhoc error handling subsystems present today in the runtime. Each one is broken in a subtle way, has slightly different semantics and error conversion between them is spot, at best. + +- Map well to the final destination of all runtime errors: managed exceptions. This includes being compatible with .net when it comes to the kind of exception produced by a given error condition. + +- Be explicit, lack any magic. 
In the loader-error setup, control flow happens in the background through a TLS variable, which makes it very brittle and error prone. + +- Explicit and multiple error scopes. Make it possible to have multiple error scopes and make them explicit. We need to support nested scopes during type loading, even if reporting is flat. + +- Be as simple as possible. Error handling is the hardest part of the runtime to test so it must be simple. This means complex error reporting, such as chaining, is out of the question. + +## Current implementation + +The current implementation exists in mono-error.h and mono-error-internals.h. The split is so API users can consume errors, but they are not supposed to be able to produce them; such a use case has yet to arise. + +#### Writing a function that produces errors + +``` c +/** + * + * @returns NULL on error + */ +void* +my_function (int a, MonoError *error) +{ + if (a <= 0) { + mono_error_set_argument (error, "a", "argument a must be bigger than zero, it was %d", a); + return NULL; + } + return malloc (a); +} +``` + +Important points from the above: + +- Add a "MonoError \*error" argument as the last argument of your function +- Call one of the mono_error_set functions based on what managed exception this should produce and the available information +- Document that a NULL return means an error + +## Writing a function that consumes errors + +``` c +void +other_function (void) +{ + ERROR_DECL (error); + void *res; + + res = my_function (10, error); + //handling the error: + //1st option: set the pending exception.
Only safe to do in icalls + if (mono_error_set_pending_exception (error)) //returns TRUE if an exception was set + return; + + //2nd option: legacy code that can't handle failures: + mono_error_assert_ok (error); + + //3rd option (deprecated): raise an exception and write a FIXME note + // (implicit cleanup, no-op if there was no error) + mono_error_raise_exception (error); /* FIXME don't raise here */ + + //4th option: ignore + mono_error_cleanup (error); +} +``` + +Important points from the above: + +- Use `ERROR_DECL (error)` to declare and initialize a `MonoError *error` variable. (Under the hood, it declares a local `MonoError error_value` using `ERROR_DECL_VALUE (error_value)`. You may use `ERROR_DECL_VALUE (e)` to declare a variable local variable yourself. It's pretty unusual to need to do that, however.) +- Pass it to the required function and always do something with the result +- Given we're still transitioning, not all code can handle in the same ways + +## Handling the transition + +The transition work is not complete and we're doing it piece-by-piece to ensure we don't introduce massive regressions in the runtime. The idea is to move the least amount of code a time to use the new error machinery. + +Here are the rules for code conversion: + +- Mono API functions that need to call functions which take a MonoError should assert on failure or cleanup the error as there's no adequate alternative at this point. They **must not** use `mono_error_raise_exception` or `mono_error_set_pending_exception` + +- When possible, change the function signature. If not, add a \_checked variant and add the `MONO_RT_EXTERNAL_ONLY` to the non-checked version if it's in the Mono API. That symbol will prevent the rest of the Mono runtime from calling the non-checked version. + +## Advanced technique: using a local error to raise a different exception + +Suppose you want to call a function `foo_checked()` but you want to raise a different exception if it fails. 
In this case, it makes sense to create a local error variable to handle the call to `foo_checked`: + +``` c +int +my_function (MonoObject *arg, MonoError *error) +{ + ERROR_DECL (local_error); + int result = foo_checked (arg, local_error); + if (!is_ok (local_error)) { + mono_error_set_execution_engine (error, "Could not successfully call foo_checked, due to: %s", mono_error_get_message (local_error)); + mono_error_cleanup (local_error); + } + return result; +} +``` + +- Pass `local_error` to `foo_checked` +- Check the result and if it wasn't okay, set a different error code on `error`. It is common to use `mono_error_get_message` to include the message from the local failure as part of the new exception +- Cleanup `local_error` to release its resources + +## Advanced technique: MonoErrorBoxed and mono_class_set_failure + +Normally we store a `MonoError` on the stack. The usual scenario is that managed code calls into the runtime, we perform some operations, and then we either return a result or convert a `MonoError` into a pending exception. So a stack lifetime for a `MonoError` makes sense. + +There is one scenario where we need a heap-allocated `MonoError` whose lifetime is tied to a `MonoImage`: the initialization of a managed class. `MonoErrorBoxed` is a thin wrapper around a `MonoError` that identifies a `MonoError` that is allocated in the mempool of a `MonoImage`. It is created using `mono_error_box()` and converted back to an ordinary `MonoError` using `mono_error_unbox()`.
+ +``` c +static int +some_class_init_helper (MonoClass *k) +{ + if (mono_class_has_failure (k)) + return -1; /* Already a failure, don't bother trying to init it */ + ERROR_DECL (local_error); + int result = foo_checked (k, local_error); + if (!is_ok (local_error)) { + mono_class_set_failure (k, mono_error_box (local_error, k->image)); + mono_error_cleanup (local_error); + } + return result; +} +``` + +- Check whether the class is already marked as a failure +- Pass a `local_error` to `foo_checked` +- Check the result and if it wasn't okay, allocate a boxed `MonoError` in the mempool of the class's image +- Mark the class that failed with the boxed error +- Cleanup the `local_error` to release its resources + +### Design issues + +- Memory management of the error setting functions is not consistent or clear +- Use a static initializer in the declaration site instead of mono_error_init? +- Force an error to always be set or only when there's an exception situation? I.e. mono_class_from_name failing to find the class vs. finding the class but failing to load it. +- g_assert (mono_error_ok (&error)) could be replaced by a macro that uses g_error so we can see the error contents on crashes. diff --git a/docs/design/mono/web/other.md b/docs/design/mono/web/other.md new file mode 100644 index 0000000000000..f3bfe69601b81 --- /dev/null +++ b/docs/design/mono/web/other.md @@ -0,0 +1,105 @@ +# Other notes + +## Faster runtime builds + +To speed up runtime builds, use one or more of the following: + +- Turn off optimization by passing CFLAGS=-O0 to configure. +- Turn off generation of libmono by passing --disable-libraries to configure. +- Turn off Boehm support by passing --disable-boehm to configure. +- Build in parallel, e.g. using make -j4. +- Use ccache by passing CC="ccache gcc" CXX="ccache g++" to configure.
+ +## Runtime debugging methods + +### Debugging crashes which don't happen inside gdb, or only happen when a test program is ran in a loop + +Set the MONO_DEBUG env variable to 'suspend-on-sigsegv'. This causes the runtime native SIGSEGV handler to spin in a loop, so gdb can be attached to the running process. + +### Setting native breakpoints in managed methods + +Use the --break \ command line argument. The JIT will generate a native breakpoint (INT on x86) into the prolog of the given method. Use --break-at-bb \ \ to set a breakpoint at the start of a given basic block. + +### Displaying JIT debug output + +Use the -v -v -v -v command line argument. Set the MONO_VERBOSE_METHOD env variable to display output for only one method. + +### Dumping JIT IR to IGV + +Set `MONO_JIT_DUMP_METHOD` to specify a method to dump over network to a running instance of the [IdealGraphVisualizer (IGV)](http://ssw.jku.at/General/Staff/TW/igv.html). An IGV build that is compatible with the implementation in Mono is available for [Mac/Linux/Windows](https://github.com/lewurm/GraalJVMCI8/releases/tag/v0.1) and requires at least JRE 1.7 to run. + +On Mac: + +``` bash +$ # unpack zip file +$ open idealgraphvisualizer +. +``` + +For Linux there's `bin/idealgraphvisualizer` and for Windows there's `bin/idealgraphvisualizer.exe`. After starting IGV, it will listen on port 4445 and is ready to receive graphs. 
+ +Here an example for dumping the IR of a method: + +``` bash +$ cat fib.cs +using System; + +public class Fib { + + public static int fib (int n) { + if (n < 2) + return 1; + return fib(n-2)+fib(n-1); + } + public static int Main (string[] args) { + int repeat = 1; + + if (args.Length == 1) + repeat = Convert.ToInt32 (args [0]); + + // Console.WriteLine ("Repeat = " + repeat); + + if (repeat > 32) { + Console.WriteLine ("{0}", fib (repeat)); + return 0; + } + + for (int i = 0; i < repeat; i++) + if (fib (32) != 3524578) + return 1; + + return 0; + } +} +$ csc fib.cs +$ MONO_JIT_DUMP_METHOD=Fib::fib mono fib.exe +cfg_dump: create context for "Fib::fib" +``` + +now switch to IGV, you should see something like that: [![igv-screenshot.png](images/igv-screenshot.png)](images/igv-screenshot.png) + +you can explore the different compiler passes in the navigation bar on the left side. IGV also has a graph diff feature: + +[![igv-diff.png](/images/igv-diff.png)](/images/igv-diff.png) + +### Displaying runtime debug output + +Set the MONO_LOG_LEVEL env variable to 'debug'. The log output is useful for diagnosing assembly loading/AOT/pinvoke problems. + +### mono_debug_count () + +This is useful for debugging problems where a piece of code is executed many times, and we need to find out which run causes the runtime to misbehave, i.e. which method is miscompiled by the JIT etc. It works by changing + +``` bash +do_something () +``` + +To: + +``` bash +if (mono_debug_count ()) { + +} +``` + +mono_debug_count () is controlled by the COUNT env variable, the first COUNT times it is called, it will return TRUE, after that, it will return FALSE. This allows us to find out exactly which execution of \ causes the problem by running the application while varying the value of COUNT using a binary search. 
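The counting behaviour described for mono_debug_count () can be sketched in a few lines of C. This is only an illustration of the idea, not the runtime's implementation: `debug_count` and `run_guarded` are hypothetical names, and returning true when COUNT is unset is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for mono_debug_count (); this is NOT the runtime's
 * code, only the behaviour described in the text: return true for the
 * first COUNT calls and false afterwards.
 */
static bool
debug_count (void)
{
	static long calls = 0;   /* number of counted calls so far */
	static long limit = -1;  /* cached COUNT value, -1 until it is read */

	if (limit < 0) {
		const char *value = getenv ("COUNT");

		/* Assumption of this sketch: without COUNT the guarded code always runs. */
		if (!value)
			return true;
		limit = strtol (value, NULL, 10);
		if (limit < 0)
			limit = 0;
	}
	calls++;
	return calls <= limit;
}

/* Reaches the guarded code `iterations` times and returns how often it ran. */
static int
run_guarded (int iterations)
{
	int executed = 0;

	for (int i = 0; i < iterations; i++) {
		if (debug_count ())
			executed++;  /* stands for the suspicious piece of code */
	}
	return executed;
}
```

With COUNT=3 in the environment, a loop that reaches the guarded code ten times only executes it three times, so halving or doubling COUNT between runs narrows down the first execution that triggers the problem.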
diff --git a/docs/design/mono/web/register-allocation.md b/docs/design/mono/web/register-allocation.md new file mode 100644 index 0000000000000..e6247d8eb9587 --- /dev/null +++ b/docs/design/mono/web/register-allocation.md @@ -0,0 +1,153 @@ +# Register allocation in the Mono JIT + +### Global Register Allocation + +\ + +### Local Register Allocation + +This section describes the cross-platform local register allocator which is in the file mini-codegen.c. + +The input to the allocator is a basic block which contains linear IL, ie. instructions of the form: + + DEST <- SRC1 OP SRC2 + +where DEST, SRC1, and SRC2 are virtual registers (vregs). The job of the allocator is to assign hard or physical registers (hregs) to each virtual registers so the vreg references in the instructions can be replaced with their assigned hreg, allowing machine code to be generated later. + +The allocator needs information about the number and types of arguments of instructions. It takes this information from the machine description files. It also needs arch specific information, like the number and type of the hard registers. It gets this information from arch-specific macros. + +Currently, the vregs and hregs are partitioned into two classes: integer and floating point. + +The allocator consists of two phases: In the first phase, a forward pass is made over the instructions, collecting liveness information for vregs. In the second phase, a backward pass is made over the instructions, assigning registers. This backward mode of operation makes the allocator somewhat difficult to understand, but leads to better code in most cases. + +#### Allocator state + +The state of the allocator is stored in two arrays: iassign and isymbolic. iassign maps vregs to hregs, while isymbolic is the opposite. For a vreg, iassign [vreg] can contain the following values: + + -1 vreg has no assigned hreg + + hreg index (>= 0) vreg is assigned to the given hreg. 
This means later instructions (which we have already processed due to the backward direction) expect the value of vreg to be found in hreg. + + spill slot index (< -1) vreg is spilled to the given spill slot. This means later instructions expect the value of vreg to be found on the stack in the given spill slot. When this vreg is used as a dreg of an instruction, a spill store needs to be generated after the instruction saving its value to the given spill slot. + +Also, the allocator keeps track of which hregs are free and which are used. This information is stored in a bitmask called ifree_mask. + +There is a similar set of data structures for floating point registers. + +#### Spilling + +When an allocator needs a free hreg, but all of them are assigned, it needs to free up one of them. It does this by spilling the contents of the vreg which is currently assigned to the selected hreg. Since later instructions expect the vreg to be found in the selected hreg, the allocator emits a spill-load instruction to load the value from the spill slot into the hreg after the currently processed instruction. When the vreg which is spilled is a destination in an instruction, the allocator will emit a spill-store to store the value into the spill slot. + +#### Fixed registers + +Some architectures, notably x86/amd64 require that the arguments/results of some instructions be assigned to specific hregs. An example is the shift opcodes on x86, where the second argument must be in ECX. The allocator has support for this. It tries to allocate the vreg to the required hreg. If thats not possible, then it will emit compensation code which moves values to the correct registers before/after the instruction. + +Fixed registers are mainly used on x86, but they are useful on more regular architectures on well, for example to model that after a call instruction, the return of the call is in a specific register. 
+ +A special case of fixed registers is two address architectures, like the x86, where the instructions place their results into their first argument. This is modelled in the allocator by allocating SRC1 and DEST to the same hreg. + +#### Global registers + +Some variables might already be allocated to hardware registers during the global allocation phase. In this case, SRC1, SRC2 and DEST might already be a hardware register. The allocator needs to do nothing in this case, except when the architecture uses fixed registers, in which case it needs to emit compensation code. + +#### Register pairs + +64 bit arithmetic on 32 bit machines requires instructions whose arguments are not registers, but register pairs. The allocator has support for this, both for freely allocatable register pairs, and for register pairs which are constrained to specific hregs (EDX:EAX on x86). + +#### Floating point stack + +The x86 architecture uses a floating point register stack instead of a set of fp registers. The allocator supports this by a post-processing pass which keeps track of the height of the fp stack, and spills/loads values from the stack as neccesary. + +#### Calls + +Calls need special handling for two reasons: first, they will clobber all caller-save registers, meaning their contents will need to be spilled. Also, some architectures pass arguments in registers. The registers used for passing arguments are usually the same as the ones used for local allocation, so the allocator needs to handle them specially. This is done as follows: the MonoInst for the call instruction contains a map mapping vregs which contain the argument values to hregs where the argument needs to be placed,like this (on amd64): + + R33 -> RDI + R34 -> RSI + ... + +When the allocator processes the call instruction, it allocates the vregs in the map to their associated hregs. So the call instruction is processed as if having a variable number of arguments which fixed register assignments. 
+ +An example: + + R33 <- 1 + R34 <- 2 + call + +When the call instruction is processed, R33 is assigned to RDI, and R34 is assigned to RSI. Later, when the two assignment instructions are processed, R33 and R34 are already assigned to a hreg, so they are replaced with the associated hreg leading to the following final code: + + RDI <- 1 + RSI <- 2 + call + +#### Machine description files + +A typical entry in the machine description files looks like this: + +shl: dest:i src1:i src2:s clob:1 len:2 + +The allocator is only interested in the dest, src1, src2 and clob fields. It understands the following values for the dest, src1, src2 fields: + +- i - integer register +- f - fp register +- b - base register (same as i, but the instruction does not modify the reg) +- m - fp register, even if an fp stack is used (no fp stack tracking) + +It understands the following values for the clob field: + +- 1 - sreg1 needs to be the same as dreg +- c - instruction clobbers the caller-save registers + +Besides these values, an architecture can define additional values (like the 's' in the example). The allocator depends on a set of arch-specific macros to convert these values to information it needs during allocation. + +#### Arch specific macros + +These macros usually receive a value from the machine description file (like the 's' in the example). The examples below are for x86. + + /* + * A bitmask selecting the caller-save registers (these are used for local + * allocation). + */ + #define MONO_ARCH_CALLEE_REGS X86_CALLEE_REGS + + /* + * A bitmask selecting the callee-saved registers (these are usually used for + * global allocation).
+ */ + #define MONO_ARCH_CALLEE_SAVED_REGS X86_CALLER_REGS + + /* Same for the floating point registers */ + #define MONO_ARCH_CALLEE_FREGS 0 + #define MONO_ARCH_CALLEE_SAVED_FREGS 0 + + /* Whenever the target uses a floating point stack */ + #define MONO_ARCH_USE_FPSTACK TRUE + + /* The size of the floating point stack */ + #define MONO_ARCH_FPSTACK_SIZE 6 + + /* + * Given a descriptor value from the machine description file, return the fixed + * hard reg corresponding to that value. + */ + #define MONO_ARCH_INST_FIXED_REG(desc) ((desc == 's') ? X86_ECX : ((desc == 'a') ? X86_EAX : ((desc == 'd') ? X86_EDX : ((desc == 'y') ? X86_EAX : ((desc == 'l') ? X86_EAX : -1))))) + + /* + * A bitmask selecting the hregs which can be used for allocating sreg2 for + * a given instruction. + */ + #define MONO_ARCH_INST_SREG2_MASK(ins) (((ins [MONO_INST_CLOB] == 'a') || (ins [MONO_INST_CLOB] == 'd')) ? (1 << X86_EDX) : 0) + + /* + * Given a descriptor value, return whenever it denotes a register pair. + */ + #define MONO_ARCH_INST_IS_REGPAIR(desc) (desc == 'l' || desc == 'L') + + /* + * Given a descriptor value, and the first register of a regpair, return a + * bitmask selecting the hregs which can be used for allocating the second + * register of the regpair. + */ + #define MONO_ARCH_INST_REGPAIR_REG2(desc,hreg1) (desc == 'l' ? X86_EDX : -1) + +[Original version of this document in git.](https://github.com/mono/mono/blob/4b2982c3096e3b17156bf00a062777ed364e3674/docs/jit-regalloc) diff --git a/docs/design/mono/web/soft-debugger-wire-format.md b/docs/design/mono/web/soft-debugger-wire-format.md new file mode 100644 index 0000000000000..49facbc283df7 --- /dev/null +++ b/docs/design/mono/web/soft-debugger-wire-format.md @@ -0,0 +1,469 @@ +# Soft Debugger Wire Format + +## Introduction + +The [Mono Soft Debugger](/docs/advanced/runtime/docs/soft-debugger/) (SDB) is a debugger implemented by the Mono runtime. 
The Mono runtime exposes an interface that debugger clients can use to debug a Mono application. Mono provides a convenience library in the form of the Mono.Debugger.Soft.dll that can be used to communicate with a running Mono process. + +The Mono.Debugger.Soft.dll library uses a protocol over sockets to debug applications. The wire protocol is inspired by the [JDWP (Java Debug Wire Protocol)](http://download.oracle.com/javase/1,5.0/docs/guide/jpda/jdwp-spec.html). Familiarity with that specification is helpful. + +This document describes the wire protocol used between debugging clients and the Mono runtime. + +Where possible, the corresponding protocol detail is linked to a function name and file location in Mono source code. This information is based on Mono master version at revision *f42ba4a168e7cb9b9486b8a96c53752e4467be8a*. + +## Protocol details + +### Transport + +Mono SDB protocol, just like its Java counterpart, was designed with no specific transport in mind. However, presently the public Mono SDB only has a TCP/IP transport available (under the transport name of `dt_socket`). Other transports can be plugged in by modifying this interface. + +#### Bootstrapping a connection + +To bootstrap a connection, the client sends a handshake to the server (see `debugger-agent.c:1034`) in the form of the 13-character ASCII string "DWP-Handshake" and waits for the server reply, which consists of the exact same ASCII character sequence. + +### Packets + +Just like JDWP, Mono SDB protocol is packet-based with two types of packet: command and reply. All fields in a packet are sent in big-endian format, which is transparently handled in Mono source code with corresponding helper encode/decode functions. + +Command packets are used by either side (client or server) to request information, act on the execution of the debugged program or to inform of some event.
A reply is only sent in response to a command, with information on the success or failure of the operation and any extra data depending on the command that triggered it. + +Both types of packet contain a header, which is always 11 bytes long. The layout of each header is described below: + +<table>
Command packet header
byte 1byte 2byte 3byte 4byte 5byte 6byte 7byte 8byte 9byte 10byte 11
lengthidflagscommand setcommand
+ +In Mono SDB source code, the command header is decoded in the server thread `debugger_thread` function at `debugger-agent.c:7583`. + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Reply packet header
byte 1byte 2byte 3byte 4byte 5byte 6byte 7byte 8byte 9byte 10byte 11
lengthidflagserror code
+In Mono SDB source code, a reply packet is constructed and sent by the `send_reply_packet` function in `debugger-agent:1514`. + +#### Packet field details + +##### Common fields + +length : The total length in byte of the packet including header i.e. this value will be 11 if the packet only consists of header with no other data + +id : Uniquely identify sent packet command/reply pair so that they can be asynchronously matched. This is in practise a simple monotonic integer counter. Note that client and server may use the same id value when sending their packets as the uniqueness property is only with respect to a specific source. + +flags : At the moment this value is only used with a reply packet in which case its value is set to `0x80`. A command packet should have this value set to 0. + +##### Command specific fields + +command set : This value allows grouping commands into similar blocks for quicker processing. The different command sets with their values are given below: + +| Command set | Value | +|:-----------------|:------| +| Virtual Machine | 1 | +| Object reference | 9 | +| String reference | 10 | +| Threads | 11 | +| Array reference | 13 | +| Event request | 15 | +| Stack frame | 16 | +| AppDomain | 20 | +| Assembly | 21 | +| Method | 22 | +| Type | 23 | +| Module | 24 | +| Events | 64 | + +command : Tell what command this packet corresponds to. This value is relative to the previously defined command set so the values are reused across different command sets. Definition of each command is given in a later chapter. + +##### Reply specific fields + +error code : Define which error occured or if the command was successful. 
The error codes are defined below: + +| Error name | Value | Mono specific notes | +|:--------------------------|:------|:---------------------------------------------------------------------------| +| Success | 0 | | +| Invalid object | 20 | | +| Invalid field ID | 25 | | +| Invalid frame ID | 30 | | +| Not Implemented | 100 | | +| Not Suspended | 101 | | +| Invalid argument | 102 | | +| Unloaded | 103 | AppDomain has been unloaded | +| No Invocation | 104 | Returned when trying to abort a thread which isn't in a runtime invocation | +| Absent information | 105 | Returned when debug information for a requested method isn't available | +| No seq point at IL Offset | 106 | Returned when a breakpoint couldn't be set | + +#### Data type marshalling + +| Name | Size | Description | +|:--------|:-----------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| byte | 1 byte | A byte value | +| short | 2 bytes | A UInt16 value | +| int | 4 bytes | A UInt32 value | +| long | 8 bytes | A UInt64 value | +| id | 4 bytes | The same size is used for all IDs (ObjectID, PointerID, TypeId, MethodID, AssemblyID, ModuleID, FieldID, PropertyID, DomainID) | +| string | At least 4 bytes | A string consists of a leading int value giving the string size followed by *size* bytes of character data. Thus an empty string is simply a 4-byte integer value of 0 | +| variant | At least 1 byte | A variant is a special value which consists of a leading byte giving the MonoType information of the variant, followed directly by its raw value. | +| boolean | 4 bytes (an int) | Though not strictly a type, a boolean is represented by an int value which is 1 for true and 0 for false. | + +Most of the encoding functions for these types are defined as `buffer_add_*` functions starting from `debugger-agent.c:1429`.
Their decoding counterparts are of the form `decode_*`, starting from `debugger-agent.c:1349`. + +Many commands return or accept a fixed-length list of values. In these cases, the list is always prefixed with an int value giving its length, followed by *length* elements of the same type (which needs to be inferred from the context). Such a value is referred to as a "list" in the rest of this document. An empty list is thus a single int value equal to 0. + +#### Various enumeration value definitions + +For the record, the following C enumerations define the values used for the flags, kind, and similar parameters of some commands. + +``` c +typedef enum { + EVENT_KIND_VM_START = 0, + EVENT_KIND_VM_DEATH = 1, + EVENT_KIND_THREAD_START = 2, + EVENT_KIND_THREAD_DEATH = 3, + EVENT_KIND_APPDOMAIN_CREATE = 4, + EVENT_KIND_APPDOMAIN_UNLOAD = 5, + EVENT_KIND_METHOD_ENTRY = 6, + EVENT_KIND_METHOD_EXIT = 7, + EVENT_KIND_ASSEMBLY_LOAD = 8, + EVENT_KIND_ASSEMBLY_UNLOAD = 9, + EVENT_KIND_BREAKPOINT = 10, + EVENT_KIND_STEP = 11, + EVENT_KIND_TYPE_LOAD = 12, + EVENT_KIND_EXCEPTION = 13, + EVENT_KIND_KEEPALIVE = 14, + EVENT_KIND_USER_BREAK = 15, + EVENT_KIND_USER_LOG = 16 +} EventKind; + +typedef enum { + SUSPEND_POLICY_NONE = 0, + SUSPEND_POLICY_EVENT_THREAD = 1, + SUSPEND_POLICY_ALL = 2 +} SuspendPolicy; + +typedef enum { + MOD_KIND_COUNT = 1, + MOD_KIND_THREAD_ONLY = 3, + MOD_KIND_LOCATION_ONLY = 7, + MOD_KIND_EXCEPTION_ONLY = 8, + MOD_KIND_STEP = 10, + MOD_KIND_ASSEMBLY_ONLY = 11, + MOD_KIND_SOURCE_FILE_ONLY = 12, + MOD_KIND_TYPE_NAME_ONLY = 13, + MOD_KIND_NONE = 14 +} ModifierKind; + +typedef enum { + STEP_DEPTH_INTO = 0, + STEP_DEPTH_OVER = 1, + STEP_DEPTH_OUT = 2 +} StepDepth; + +typedef enum { + STEP_SIZE_MIN = 0, + STEP_SIZE_LINE = 1 +} StepSize; + +typedef enum { + TOKEN_TYPE_STRING = 0, + TOKEN_TYPE_TYPE = 1, + TOKEN_TYPE_FIELD = 2, + TOKEN_TYPE_METHOD = 3, + TOKEN_TYPE_UNKNOWN = 4 +} DebuggerTokenType; + +typedef enum { + VALUE_TYPE_ID_NULL = 0xf0, + VALUE_TYPE_ID_TYPE = 0xf1, +
VALUE_TYPE_ID_PARENT_VTYPE = 0xf2 +} ValueTypeId; + +typedef enum { + FRAME_FLAG_DEBUGGER_INVOKE = 1, + + // Used to allow the debugger to display managed-to-native transitions in stack frames. + FRAME_FLAG_NATIVE_TRANSITION = 2 +} StackFrameFlags; + +typedef enum { + INVOKE_FLAG_DISABLE_BREAKPOINTS = 1, + INVOKE_FLAG_SINGLE_THREADED = 2, + + // Allow for returning the changed value types after an invocation + INVOKE_FLAG_RETURN_OUT_THIS = 4, + + // Allows the return of modified value types after invocation + INVOKE_FLAG_RETURN_OUT_ARGS = 8, + + // Performs a virtual method invocation + INVOKE_FLAG_VIRTUAL = 16 +} InvokeFlags; +``` + +### Command list + +The types given in each command's description correspond to the types described above. When a command takes additional arguments or its reply carries multiple values, they are described in the order in which they appear, or must appear, in the data part of the packet. Note also that there is no separator sequence or alignment padding between values. + +In all cases, if you ask for a command that doesn't exist, a reply will be sent with an error code of NOT_IMPLEMENTED.
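The handshake and the two 11-byte header layouts described earlier can be sketched in C as follows. This is only an illustration under the stated layout (big-endian fields, `0x80` flag on replies). The names `sdb_is_handshake`, `sdb_encode_command`, `sdb_decode_reply_header` and `SdbReplyHeader` are hypothetical and are not the functions used in `debugger-agent.c`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SDB_HEADER_LEN 11   /* both header kinds are 11 bytes long */
#define SDB_FLAG_REPLY 0x80 /* flags value carried by reply packets */

/* Hypothetical helpers; Mono's own code uses buffer_add_* / decode_*. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* The handshake is the same 13 ASCII characters in both directions. */
int sdb_is_handshake(const uint8_t *data, size_t len)
{
    static const char handshake[] = "DWP-Handshake";
    return len == sizeof handshake - 1 && memcmp(data, handshake, len) == 0;
}

/*
 * Write a command packet (header followed by payload) into out.
 * Returns the total packet length, or 0 if out is too small.
 */
size_t sdb_encode_command(uint8_t *out, size_t cap, uint32_t id,
                          uint8_t command_set, uint8_t command,
                          const uint8_t *payload, size_t payload_len)
{
    if (cap < SDB_HEADER_LEN || payload_len > cap - SDB_HEADER_LEN)
        return 0;
    put_be32(out, (uint32_t)(SDB_HEADER_LEN + payload_len)); /* length */
    put_be32(out + 4, id);                                   /* id */
    out[8] = 0;                /* flags: 0 for a command packet */
    out[9] = command_set;
    out[10] = command;
    if (payload_len > 0)
        memcpy(out + SDB_HEADER_LEN, payload, payload_len);
    return SDB_HEADER_LEN + payload_len;
}

typedef struct {
    uint32_t length;
    uint32_t id;
    uint8_t flags;
    uint16_t error_code;
} SdbReplyHeader;

/* Returns 0 on success, -1 if the data is too short or is not a reply. */
int sdb_decode_reply_header(const uint8_t *in, size_t len, SdbReplyHeader *hdr)
{
    if (len < SDB_HEADER_LEN)
        return -1;
    hdr->length = get_be32(in);
    hdr->id = get_be32(in + 4);
    hdr->flags = in[8];
    hdr->error_code = (uint16_t)((in[9] << 8) | in[10]);
    return (hdr->flags & SDB_FLAG_REPLY) ? 0 : -1;
}
```

For instance, a VERSION request would be encoded with command set 1 (Virtual Machine) and command 1, with an empty payload, giving a packet whose length field is 11.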
+ +#### Virtual machine commands + +| Name | Value | Action and type of reply | Additional parameters | Possible error code returned | +|:--------------------------|:------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| +| VERSION | 1 | Returns a mono virtual machine version information (string) followed by two int giving respectively the runtime major and minor version | None | None | +| ALL_THREADS | 2 | Returns a list of ObjectID each mapping to a System.Threading.Thread instance. | None | None | +| SUSPEND | 3 | Suspend the VM execution and returns an empty reply | None | None | +| RESUME | 4 | Resume the VM execution and returns an empty reply | None | NOT_SUSPENDED | +| EXIT | 5 | Stop VM and returns an empty reply | Ask for a exit code (int) to be used by the VM when it exits | None | +| DISPOSE | 6 | Clear event requests, resume the VM and disconnect | None | None | +| INVOKE_METHOD | 7 | Returns a boolean telling if the call was successful followed by an exception object (as a variant) if it was not and by the actual returned value (variant) if it was. | Ask for an ObjectID (id) mapping to a System.Threading.Thread instance, a flags value (int) to pass to the invoke request, the MethodID (id) of the method to invoke, a variant value to be used as *this* (VALUE_TYPE_ID_NULL in case of a valuetype) and a list of variant value representing the parameters of the method. 
| INVALID_OBJECT, NOT_SUSPENDED, INVALID_METHODID, INVALID_ARGUMENT | +| SET_PROTOCOL_VERSION | 8 | Returns an empty reply | Ask for two int giving respectively the major and minor version of the protocol to use. | None | +| ABORT_INVOKE | 9 | Abort the invocation and returns an empty reply | Ask for an ObjectID (id) mapping to a System.Threading.Thread instance and the id (int) of the command packet that set up the invocation to cancel | INVALID_OBJECT, NO_INVOCATION | +| SET_KEEPALIVE | 10 | Set up the new keep alive value and returns an empty reply | Ask for a timeout value (int) | None | +| GET_TYPES_FOR_SOURCE_FILE | 11 | Returns a list of TypeID (id) of the classes defined inside the supplied file name | Ask for a file name (string) and an ignore case flag (byte), although setting it to something other than 0 isn't currently supported. | None | +| GET_TYPES | 12 | Returns a list of TypeID (id) of the types which correspond to the provided type name | Ask for a type name (string) and an ignore case flag (byte) which acts like a boolean value | INVALID_ARGUMENT | +| INVOKE_METHODS | 13 | Batch invocation of methods | Ask for an ObjectID (id) mapping to a System.Threading.Thread instance, a flags value (int) to pass to the invoke request, the number of methods to invoke (int), and for each method to invoke its MethodID (id), a variant value to be used as *this* (VALUE_TYPE_ID_NULL in case of a valuetype) and a list of variant values representing the parameters of the method. | INVALID_OBJECT, NOT_SUSPENDED, INVALID_METHODID, INVALID_ARGUMENT | +| VM_START_BUFFERING | 14 | Initiates the buffering of reply packets to improve latency.
Must be paired with a VM_STOP_BUFFERING command | None | None | +| VM_STOP_BUFFERING | 15 | Ends the block of buffered commands, must come after a call to VM_START_BUFFERING | None | None | + +The main function handling these commands is `vm_commands` and is situated at `debugger-agent.c:5671` + +#### Events commands + +Events allow the debugger to act on program execution (stepping) and also to set up things like breakpoints, watchpoints, exception catching, etc. + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:------------------------------|:------|:-----------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------| +| REQUEST_SET | 1 | Returns the request id (int) | Ask for 3 bytes giving the event kind (EventKind enumeration), suspend policy (SuspendPolicy enumeration) and a list of modifiers whose content is context dependent and given in the table below | INVALID_METHODID, INVALID_TYPEID, NO_SEQ_POINT_AT_IL_OFFSET, INVALID_OBJECT, INVALID_ASSEMBLYID | +| REQUEST_CLEAR | 2 | Clear the requested event and returns an empty reply | Ask for an event type (byte) and a request id (int) | None | +| REQUEST_CLEAR_ALL_BREAKPOINTS | 3 | Returns an empty reply | None | None | + +The main function handling these commands is `event_commands` and is situated at `debugger-agent.c:5916` + +The first byte of each modifier describes the modification it carries out and corresponds to one of the values of the ModifierKind enumeration. The following table lists the remaining body depending on the modification value.
+ +| Mod value | Body | +|:-----------------|:-----------------------------------------------------------------------------------------------------------------------------------------------| +| COUNT | a MethodID (id) | +| LOCATION_ONLY | a MethodID (id) and a location information (long) | +| STEP | A thread id, size of the step (int) corresponding to the StepSize enumeration and depth of it (int) corresponding to the StepDepth enumeration | +| THREAD_ONLY | A thread id | +| EXCEPTION_ONLY | A TypeID representing a exception type and two byte values setting respectively the caught and uncaught filter | +| ASSEMBLY_ONLY | A list of AssemblyID (id) | +| SOURCE_FILE_ONLY | A list of source file name (string) | +| TYPE_NAME_ONLY | A list of type name (string) | +| NONE | | + +#### Thread commands + +Each command requires at least one ObjectID (of type id) parameter mapping to a thread instance before any additional parameter the command may require. + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:---------------|:------|:------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------|:-----------------------------| +| GET_FRAME_INFO | 1 | Returns a list of quadruplet of frame ID (int), MethodID (id), IL offset (int) and frame flags (byte) | Ask for a start frame (currently other value than 0 aren't supported) as an int and a length as a int | INVALID_OBJECT | +| GET_NAME | 2 | Returns the name of the thread as a string | None | INVALID_OBJECT | +| GET_STATE | 3 | Return the thread state as an int | None | INVALID_OBJECT | +| GET_INFO | 4 | Returns a byte value telling if the thread is a threadpool thread (1) or not (0) | None | INVALID_OBJECT | +| GET_ID | 5 | Returns the thread id (address of the object) as a long | None | INVALID_OBJECT | +| GET_TID | 6 | Returns the proper thread 
id (or TID) as a long | None | INVALID_OBJECT | +| SET_IP | 7 | Set the location where execution will return when this thread is resumed | Thread ID (int), Method ID (long), IL offset (long) | INVALID_ARGUMENT | + +The main function handling these commands is `thread_commands` and is situated at `debugger-agent.c:6991` + +#### AppDomains commands + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:-------------------|:------|:----------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| +| GET_ROOT_DOMAIN | 1 | Returns the DomainID of the root domain | None | None | +| GET_FRIENDLY_NAME | 2 | Returns the friendly name as a string of the provided DomainID | Ask for a DomainID (id) | INVALID_DOMAINID | +| GET_ASSEMBLIES | 3 | Returns a list of AssemblyID contained inside this AppDomain | Ask for a DomainID (id) | INVALID_DOMAINID | +| GET_ENTRY_ASSEMBLY | 4 | Returns the entry AssemblyID of this domain | Ask for a DomainID (id) | INVALID_DOMAINID | +| CREATE_STRING | 5 | Returns the ObjectID of the created string | Ask for a DomainID (id) where to create the new string and a string typed value to put inside the domain | INVALID_DOMAINID | +| GET_CORLIB | 6 | Returns the AssemblyID of the load corlib inside this AppDomain | Ask for a DomainID (id) | INVALID_DOMAINID | +| CREATE_BOXED_VALUE | 7 | Returns the ObjectID of the boxed value | Ask for a DomainID (id), TypeID of the type that is going to be boxed and a variant value which is going to be put into the boxed value | INVALID_DOMAINID, INVALID_TYPEID | + +The main function handling these commands is `domain_commands` and is situated at `debugger-agent.c:6104` + +#### Assembly commands + +Each command requires at least one AssemblyID (of type id) parameter before any additional parameter the command 
may require. + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:--------------------|:------|:------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------|:-----------------------------| +| GET_LOCATION | 1 | Returns the filename (string) of image associated to the assembly | None | INVALID_ASSEMBLYID | +| GET_ENTRY_POINT | 2 | Returns the MethodID (id) of the entry point or a 0 id if there is none (in case of dynamic assembly or library for instance) | None | INVALID_ASSEMBLYID | +| GET_MANIFEST_MODULE | 3 | Returns the ModuleID (id) of the assembly | None | INVALID_ASSEMBLYID | +| GET_OBJECT | 4 | Returns the ObjectID of the AssemblyID object instance | None | INVALID_ASSEMBLYID | +| GET_TYPE | 5 | Returns the TypeID of the found type or a null id if it wasn't found | Ask for a type information in form of a string and a byte value to tell if case should be ignored (1) or not (0) | INVALID_ASSEMBLYID | +| GET_NAME | 6 | Return the full name of the assembly as a string | None | INVALID_ASSEMBLYID | + +The main function handling these commands is `assembly_commands` and is situated at `debugger-agent.c:6203` + +#### Module commands + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:--------------------|:------|:----------------------------------------------------------------------------------------------------------------|:------------------------|:-----------------------------| +| CMD_MODULE_GET_INFO | 1 | Returns the following strings: basename of the image, scope name, full name, GUID and the image AssemblyID (id) | Ask for a ModuleID (id) | None | + +The main function handling these commands is `module_commands` and is situated at `debugger-agent.c:6295` + +#### Method commands + +Each command 
requires at least one MethodID (of type id) parameter before any additional parameter the command may require. + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:--------------------|:------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|:-----------------------------------| +| GET_NAME | 1 | Returns a string of the method name | None | INVALID_METHODID | +| GET_DECLARING_TYPE | 2 | Returns a TypeID of the declaring type for this method | None | INVALID_METHODID | +| GET_DEBUG_INFO | 3 | Returns the code size of the method (int), source file name (string) and a list of tuple of IL offset (int) and line numbers (int) for the method | None | INVALID_METHODID | +| GET_PARAM_INFO | 4 | Returns the call convention (int), parameter count (int), generic parameter count (int), TypeID of the returned value (id), *parameter count* TypeID for each parameter type and finally *parameter count* parameter name (string) for each parameter. | None | INVALID_METHODID | +| GET_LOCALS_INFO | 5 | Returns the number of locals (int) followed by the TypeID (id) for each locals, followed by the name (string) of each locals (empty string if there is none) and finally followed by the scope of each locals which is a tuple of int giving the start address and end offset. | None | INVALID_METHODID | +| GET_INFO | 6 | Returns 3 int representing respectively the method flags, implementation flags and token | None | INVALID_METHODID | +| GET_BODY | 7 | Returns a list of byte corresponding to the method IL code. 
| None | INVALID_METHODID | +| RESOLVE_TOKEN | 8 | Returns a variant value corresponding to the provided token | Ask for a token value (int) | INVALID_METHODID | +| GET_CATTRS | 9 | Returns the custom attributes for the methods | Method ID, attribute-type ID | INVALID_METHODID,LOADER_ERROR | +| MAKE_GENERIC_METHOD | 10 | Makes a generic version of the method | Method ID, number of type arguments (int), TypeID for each type argument (int) | INVALID_ARGUMENT, INVALID_METHODID | + +The main functions handling these commands are `method_commands` and `method_commands_internal` and are situated at `debugger-agent.c:6968` and `debugger-agent.c:6968` respectively. + +#### Type commands + +Each command requires at least one TypeID (of type id) parameter before any additional parameter the command may require. + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:--------------------|:------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------| +| GET_INFO | 1 | Returns the following informations about the type in that order: namespace (string), class name (string), full name (string), AssemblyID (id), ModuleID (id), TypeID (id), TypeID (id) of underlying type (or a 0 id if there is none), type token (int), type rank (byte), type flags (int), underlying byval type (byte) flags (see after table) and a list of nested type TypeID | None | INVALID_TYPEID | +| GET_METHODS | 2 | Returns a list of MethodID 
corresponding to each of the method of the type | None | INVALID_TYPEID | +| GET_FIELDS | 3 | Returns list of quadruplet of FieldID (id), field name (string), field TypeID (id), field attributes (int) | None | INVALID_TYPEID | +| GET_VALUES | 4 | Returns a number of variant value equals to the number of FieldID that was passed as parameter. If the field had a ThreadStatic attribute applied to it, value fetched are from the current thread point of view. | Ask for a list of FieldID representing this type static fields to the the value of. Only static field are supported. | INVALID_TYPEID, INVALID_FIELDID | +| GET_OBJECT | 5 | Returns an ObjectID corresponding to the type instance | None | INVALID_TYPEID | +| GET_SOURCE_FILES | 6 | Returns the same output than GET_SOURCE_FILES_2 except only the basename of each path is returned | None | INVALID_TYPEID | +| SET_VALUES | 7 | Returns an empty response | Ask for a list of tuple of FieldID and variant value. Only pure static field can be set (i.e. with no extra attribute like ThreadStatic). | INVALID_TYPEID, INVALID_FIELDID | +| IS_ASSIGNABLE_FROM | 8 | Returns a boolean equals to true if the type is assignable from the other provided type, false otherwise | Ask for an extra TypeID | INVALID_TYPEID | +| GET_PROPERTIES | 9 | Returns a list of quadruplet of FieldID (id), get accessor MethodID (string), set accessor MethodID (id), property attributes (int) | None | INVALID_TYPEID | +| GET_CATTRS | 10 | Returns a list of custom attribute applied on the type. Custom attribute definition is given below. | Ask for a TypeID of an custom attribute type | INVALID_TYPEID | +| GET_FIELD_CATTRS | 11 | Returns a list of custom attributes of a type's field. Custom attribute definition is given below. | Ask for a FieldID of one the type field and a TypeID of an custom attribute type | INVALID_TYPEID, INVALID_FIELDID | +| GET_PROPERTY_CATTRS | 12 | Returns a list of custom attributes of a type's property. 
Custom attribute definition is given below. | Ask for a PropertyID of one of the type's properties and a TypeID of a custom attribute type | INVALID_TYPEID, INVALID_PROPERTYID | +| GET_SOURCE_FILES_2 | 13 | Returns a list of source file full paths (string) where the type is defined | None | INVALID_TYPEID | +| GET_VALUES_2 | 14 | Returns a number of variant values equal to the number of FieldIDs passed as parameter. If a field has the ThreadStatic attribute applied to it, its value is fetched from the point of view of the thread passed as parameter. | Ask for an ObjectID representing a System.Threading.Thread instance and a list of FieldIDs naming the static fields of this type whose values to get. Only static fields are supported. | INVALID_OBJECT, INVALID_TYPEID, INVALID_FIELDID | + +The main functions handling these commands are `type_commands` and `type_commands_internal`, situated at `debugger-agent.c:6726` and `debugger-agent.c:6403` respectively. + +Byval flags indicate the type attributes of a parameter when it is passed by value. A description of these flags follows: + +| byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | +|:------------------|:--------------------|:----------------|:------------------|:-------|:-------|:-------|:-------| +| Is a pointer type | Is a primitive type | Is a value type | Is an enumeration | Unused | Unused | Unused | Unused | + +A custom attribute is encoded as follows: the MethodID of the attribute ctor, a list of variant objects representing the typed arguments of the attribute, prepended by a length (int), and another list representing the named arguments, whose elements are either a tuple of the constant 0x53 followed by a variant value (when the named argument is a field) or a triplet of the constant 0x54 followed by a PropertyID and a variant value (when the named argument is a property). In both cases, an empty list is simply one int of value 0.
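The list convention used throughout these tables (a leading int count, which is 0 for an empty list) combines with the string marshalling rule (a leading int size followed by that many bytes) when a reply carries a list of strings, as GET_SOURCE_FILES_2 does. A minimal C sketch of decoding such a list follows. The `SdbReader` type, the function names and the `SDB_MAX_STR` cap are hypothetical choices for this example, and the character data is treated as opaque bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SDB_MAX_STR 256 /* arbitrary cap chosen for this sketch */

/* Hypothetical cursor over the data part of a reply packet. */
typedef struct {
    const uint8_t *pos;
    const uint8_t *end;
} SdbReader;

/* Read one big-endian int; returns 0 on success, -1 if data runs out. */
static int sdb_read_int(SdbReader *r, uint32_t *out)
{
    if ((size_t)(r->end - r->pos) < 4)
        return -1;
    *out = ((uint32_t)r->pos[0] << 24) | ((uint32_t)r->pos[1] << 16) |
           ((uint32_t)r->pos[2] << 8) | (uint32_t)r->pos[3];
    r->pos += 4;
    return 0;
}

/* Read a string: an int size followed by that many bytes of character data. */
static int sdb_read_string(SdbReader *r, char dst[SDB_MAX_STR])
{
    uint32_t size;
    if (sdb_read_int(r, &size) != 0)
        return -1;
    if (size >= SDB_MAX_STR || (size_t)(r->end - r->pos) < size)
        return -1;
    memcpy(dst, r->pos, size);
    dst[size] = '\0'; /* NUL-terminate for convenience; not part of the wire data */
    r->pos += size;
    return 0;
}

/*
 * Read a list of strings: an int count followed by count strings.
 * Returns the number of strings stored, or -1 on malformed input.
 */
int sdb_read_string_list(SdbReader *r, char out[][SDB_MAX_STR], size_t max_items)
{
    uint32_t count;
    if (sdb_read_int(r, &count) != 0 || count > max_items)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        if (sdb_read_string(r, out[i]) != 0)
            return -1;
    }
    return (int)count;
}
```

Because values are packed with no separators or padding, the reader simply advances its cursor past each decoded value; the element type of a list has to be known from the command being answered.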
+ +#### Stackframe commands + +Each command requires at least one ObjectID (of type id) parameter mapping to a System.Threading.Thread instance and a FrameID (of type id) before any additional parameter the command may require. + +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:-----------|:------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------| +| GET_VALUES | 1 | Returns a list of miscellaneous typed values. If the position information was negative, the value corresponds to a parameter and if it was positive to a local variable. | Ask for a list of position (int) information. | INVALID_OBJECT, INVALID_FRAMEID, ABSENT_INFORMATION | +| GET_THIS | 2 | Returns the *this* value prepended by a single byte value describing its type, or the special VALUE_TYPE_ID_NULL (byte) value which is equal to 0xf0 in case there is no *this* parameter. | None | INVALID_OBJECT, INVALID_FRAMEID, ABSENT_INFORMATION | +| SET_VALUES | 3 | Returns an empty reply | Ask for a list of pairs of position (int) information and the variant whose value is going to be used. | INVALID_OBJECT, INVALID_FRAMEID, ABSENT_INFORMATION, INVALID_ARGUMENT | + +The main function handling these commands is `frame_commands` and is situated at `debugger-agent.c:7082` + +#### Array commands + +Each command requires at least one ObjectID (of type id) parameter mapping to a System.Array instance before any additional parameter the command may require.
+ +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:-----------|:------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------| +| GET_LENGTH | 1 | Returns an int corresponding to the array rank followed by a set of int pair corresponding respectively to the length and lower bound of each of the array dimensions. In case of a single dimensional zero-based array, the returned data amount to 3 int values with the second being the total length of the array and the third one being 0. | None | INVALID_OBJECT | +| GET_VALUES | 2 | Returns a list of *length* elements which individual size in bytes depends on the underlying type of the System.Array instance. | Ask for an index (int) and a length (int) to determine the range of value to return | INVALID_OBJECT | +| SET_VALUES | 3 | Return an empty reply | Ask for an index (int) and a length (int) to determine the range of value to set and a *length* number of trailing values whose type and byte size match those of the underlying type of the System.Array instance. | INVALID_OBJECT | + +The main function handling these commands is `vm_commands` and is situated at `debugger-agent.c:5671` + +#### String commands + +Each command requires at least one ObjectID (of type id) parameter mapping to a System.String instance before any additional parameter the command may require. 
+ +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:-----------|:------|:-------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:---------------------------------| +| GET_VALUE | 1 | Returns a UTF8-encoded string corresponding to the System.String instance with its length prepended as a int value | None | INVALID_OBJECT | +| GET_LENGTH | 2 | Returns the length of a UTF8-encoded string corresponding to the System.String instance as an int value | None | INVALID_OBJECT | +| GET_CHARS | 3 | Returns *length* short values each encoding a character of the string slice | Ask for a start index (long) and a length parameter (long) of the string slice to take. | INVALID_OBJECT, INVALID_ARGUMENT | + +The main function handling these commands is `string_commands` and is situated at `debugger-agent.c:7293` + +#### Object commands + +Each command requires at least one ObjectID (of type id) parameter before any additional parameter the command may require. 
+ +| Name | Value | Type of reply | Additional parameters | Possible error code returned | +|:-------------|:------|:------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------| +| GET_TYPE | 1 | Returns the TypeID as an id | None | INVALID_OBJECT | +| GET_VALUES | 2 | Returns *length* values of miscellaneous type and size corresponding to the underlying type of each queried field | Ask for a list of FieldID to fetch value of | INVALID_OBJECT, UNLOADED, INVALID_FIELDID | +| IS_COLLECTED | 3 | Returns an int equals to 1 if the object has been collected by GC, 0 otherwise | None | None | +| GET_ADDRESS | 4 | Returns a long value corresponding to the address where the object is stored in memory | None | INVALID_OBJECT | +| GET_DOMAIN | 5 | Returns an id corresponding to the DomainID the object is located in | None | INVALID_OBJECT | +| SET_VALUES | 6 | Returns an empty reply | Ask for a list of tuple of FieldID (id) and of the value that should be set to it | INVALID_OBJECT, UNLOADED, INVALID_FIELDID | + +The main function handling these commands is `object_commands` and is situated at `debugger-agent.c:7318` + +#### Composite commands + +| Name | Value | Description | +|:----------|:------|:-------------------------------------------------------------------------| +| COMPOSITE | 100 | This command is actually part of the event command set and is used for ? | + +## Differences with JDWP + +- Handshake ASCII sequence is DWP-Handshake instead of JDWP-Handshake +- Some new Mono specific command set such as AppDomain, Assembly or Module and removal/renaming of some Java specific set such as InterfaceType, ThreadGroupReference, ClassLoaderReference, etc. +- Mono SDB protocol has its own specific ID types related to the new command sets. 
+- The SDB protocol has fewer error codes, although some are Mono-specific, like the "No Invocation", "Absent Informations" and "No seq point at IL offset" codes. diff --git a/docs/design/mono/web/soft-debugger.md b/docs/design/mono/web/soft-debugger.md new file mode 100644 index 0000000000000..4a9dbb36f542f --- /dev/null +++ b/docs/design/mono/web/soft-debugger.md @@ -0,0 +1,91 @@ +# Soft-Mode Debugger + +The Mono Soft Debugger is a new debugging framework for Mono. Unlike regular debuggers which act as all-knowing and controlling programs that control a separate process, the Mono Soft Debugger is actually a cooperative debugger that is built into the Mono runtime. + +Applications communicate with the Mono runtime and request debugging operations to be performed on the target process. + + The Mono Soft Debugger first became available with Mono 2.6 and is primarily used today with [Mono on the iPhone](http://monotouch.net) and is used from the [MonoDevelop IDE](http://monodevelop.com). + +Architecture +------------ + +The following diagram is useful in the discussion of the soft debugger: + +[![0911030528Mp6F5SHL.png](images/0911030528Mp6F5SHL.png)](images/0911030528Mp6F5SHL.png) + +The soft debugger lives inside the Mono runtime. Debuggers communicate with this component with a compact protocol over a socket connection. For ease of use the protocol has been encapsulated in the Mono.Debugger.Soft.dll API which different IDEs can use to communicate with the target. + +The soft debugger works both with Just-in-Time compiled code and with [batch compiled code](/docs/advanced/aot/), allowing it to debug both regular Mono applications on a desktop and applications on devices like the iPhone or the [PlayStation 3](/docs/about-mono/supported-platforms/playstation3/). + +### Debugger Agent + +The debugger agent is a module inside the mono runtime which offers debugging services to client programs.
+ +### Wire Protocol + +Clients communicate with the agent using a wire protocol over a socket transport. Read our [Soft Debugger Wire Protocol](/docs/advanced/runtime/docs/soft-debugger-wire-format/) document for details about the protocol. + +The wire protocol is inspired by the [Java Debug Wire Protocol](http://java.sun.com/j2se/1.5.0/docs/guide/jpda/jdwp-spec.html). + +### Client library + +The client library is a C# assembly which uses the wire protocol to communicate with the debugger agent running inside the mono runtime. It is based on the [Java Debug Interface](http://java.sun.com/j2se/1.5.0/docs/guide/jpda/jdi/). The assembly is named Mono.Debugger.Soft.dll, and its source is in mcs/class/Mono.Debugger.Soft. + +Implementation +-------------- + +### Agent + +The source code is in mini/debugger-agent.{h,c}. Unlike the JDWP agent in Java, the debugger agent is tightly integrated with the mono runtime because mono doesn't have a tool interface with capabilities similar to JVMTI in Java. + +#### Design + +The design goal for the agent was to choose solutions which were easy to implement; they can be improved later. This means that some things like step out/over can be very slow, and that the code generated by the JIT when debugging is enabled is larger and slower. + +#### The debugger thread + +The agent starts its own thread which it uses to communicate with clients using the wire protocol. + +#### Event handling + +On startup, the agent registers callbacks for events using the mono profiler interface. When a callback is called, it searches the list of event requests for a request matching the event type. If one is found, the event is sent to the client using the wire protocol. + +#### Suspend/Resume + +Suspending/Resuming the runtime is the most complex part of the debugger agent. There are many complications: - threads running managed code/native code/transitioning between the two. - threads starting up/terminating.
 - multiple suspend/resume operations happening in parallel. + +Threads running native code can't be suspended, because they can hold locks which are needed by the debugger and the rest of the runtime to function. So they are left running, and are only suspended when they enter managed code. We save enough state at managed-\>native transitions to be able to produce stack traces and examine the state of stack frames. However, debugger invocations are not supported on threads which are running native code, so property evaluation is not possible on these threads. + +A suspend can be started by a normal runtime thread when it receives an event which asks for the runtime to suspend, or it can be started by the debugger thread in response to a VM.Suspend command. In contrast, a resume can only be started by the debugger thread in response to a VM.Resume command. + +Threads running managed code are suspended by turning on single stepping, and suspending the thread when it reaches the single step event handler. Threads running native code are treated as suspended. + +A suspend can be started by calling suspend_vm (), which is an async operation. This means that when the client receives an event, the runtime might not be entirely suspended yet, so code which needs the runtime to be suspended, like the stack frame processing code, needs to call wait_for_suspend (). After starting a suspend, the thread needs to suspend itself by calling suspend_current (). + +#### Sequence points + +A sequence point is an IL offset where the program can be stopped and its state can be examined. Currently the debugger determines sequence points automatically. Sequence points are placed at the following places: + +- IL offsets where the IL stack is empty. This generally corresponds to the end of C# statements. +- IL offsets which contain the NOP IL instructions. This can be used by a compiler to insert extra sequence points, like between nested calls.
+- IL offsets which have a corresponding line number entry in the .mdb file. + +The mdbdump tool in mcs/tools/mdbdump can be used to examine the line number tables inside an .mdb file. + +A sequence point is represented by the JIT opcode OP_SEQ_POINT. The JIT backends generate code from this opcode which implements single stepping/breakpoints. + +#### Single Stepping + +The implementation of single stepping is target specific. On most platforms, it is implemented by allocating a memory page and having the implementation of OP_SEQ_POINT read from that page. Single stepping is then turned on by read-protecting that page, causing the memory read to turn into a SIGSEGV or similar signal. The signal handler needs to determine whether the signal was caused by access to this page, and if it was, transfer control to the single step handler code in the debugger agent. + +Step over/out is implemented by single stepping repeatedly until the condition becomes true (i.e. we reach a different line/parent frame). + +#### Breakpoints + +Breakpoints are usually implemented similarly to single stepping, by reading from a memory page. OP_SEQ_POINT generates a few nops to act as a placeholder, then the code to read from the trigger page is written to the JITted code when the breakpoint is enabled, and changed back to nops when the breakpoint is disabled. + +#### AOT support + +AOTed code can be debugged by compiling it with the 'soft-debug' aot option, i.e.: mono --debug --aot=soft-debug foo.dll + +In the AOT case, the code can't be patched at runtime, so breakpoints are implemented by reading from a per-method table with one entry per sequence point, which is either NULL or points to the breakpoint trigger page.
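As a rough illustration of the table-based scheme described for AOT code, the sketch below models a per-method table with one slot per sequence point. Every name in it (`SeqPointTable`, `bp_trigger_page`, `set_breakpoint`, `seq_point_hits_breakpoint`) is hypothetical and not taken from the runtime. In the real agent the read through a non-NULL slot would fault on the protected trigger page and the signal handler would take over; this model uses an ordinary byte and only reports whether that read would happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the breakpoint trigger page. In the runtime this
 * would be a read-protected page; here it is an ordinary byte. */
static char bp_trigger_page;

#define MAX_SEQ_POINTS 8

/* One entry per sequence point: NULL (no breakpoint) or the trigger page. */
typedef struct {
    void *slots[MAX_SEQ_POINTS];
} SeqPointTable;

/* Enabling a breakpoint only changes data, never the compiled code. */
void set_breakpoint(SeqPointTable *t, size_t idx)
{
    assert(idx < MAX_SEQ_POINTS);
    t->slots[idx] = &bp_trigger_page;
}

void clear_breakpoint(SeqPointTable *t, size_t idx)
{
    assert(idx < MAX_SEQ_POINTS);
    t->slots[idx] = NULL;
}

/* What the code at a sequence point conceptually does: load its slot and,
 * if the slot is non-NULL, read through it. Returns true when the read
 * happens (which, on a protected page, would raise the signal). */
bool seq_point_hits_breakpoint(const SeqPointTable *t, size_t idx)
{
    assert(idx < MAX_SEQ_POINTS);
    const volatile char *p = t->slots[idx];
    if (p == NULL)
        return false;
    (void) *p;
    return true;
}
```

The point of the design, as far as the text describes it, is that the AOT image stays read-only: only the table entries change when a breakpoint is toggled.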
diff --git a/docs/design/mono/web/thread-safety.md b/docs/design/mono/web/thread-safety.md new file mode 100644 index 0000000000000..6449866acff18 --- /dev/null +++ b/docs/design/mono/web/thread-safety.md @@ -0,0 +1,129 @@ +# Thread Safety/Synchronization + +Thread safety of metadata structures +------------------------------------ + +### Synchronization of read-only data + +Read-only data is data which is not modified after creation, like the actual binary metadata in the metadata tables. + +There are three kinds of threads with regard to read-only data: + +- readers +- the creator of the data +- the destroyer of the data + +Most threads are readers. + +- synchronization between readers is not necessary +- synchronization between the writers is done using locks. +- synchronization between the readers and the creator is done by not exposing the data to readers before it is fully constructed. +- synchronization between the readers and the destroyer: TBD. + +### Deadlock prevention plan + +Hold locks for the shortest time possible. Avoid calling functions inside locks which might obtain global locks (i.e. locks known outside this module). + +### Locks + +#### Simple locks + +There are a lot of global data structures which can be protected by a 'simple' lock. Simple means: + +- the lock protects only this data structure or it only protects the data structures in a given C module. An example would be the appdomains list in domain.c +- the lock can span many modules, but it still protects access to a single resource or set of resources. An example would be the image lock, which protects all data structures that belong to a given MonoImage. +- the lock is only held for a short amount of time, and no other lock is acquired inside this simple lock. Thus there is no possibility of deadlock. + +Simple locks include, at least, the following: + +- the per-image lock acquired by using mono_image_(un)lock functions. +- the threads lock acquired by using mono_threads_(un)lock.
+ +#### The loader lock + +This lock is held by class loading routines and any global synchronization routines. This is effectively the runtime global lock. Other locks can call code that acquires the loader lock out of order if the current thread already owns it. + +#### The domain lock + +Each appdomain has a lock which protects the per-domain data structures. + +#### The domain jit code hash lock + +This per-domain lock protects the JIT'ed code of each domain. Originally we used the domain lock, but it was split to reduce contention. + +#### Allocation locks and foreign locks + +Mono features a few memory allocation subsystems such as: a lock-free allocator, the GC. Those subsystems are designed so they don't rely on any of the other subsystems in the runtime. This ensures that locking within them is transparent to the rest of the runtime, so it is not covered here. It's the same rule when dealing with locking that happens within libc. + +### The locking hierarchy + +It is useful to model locks by a locking hierarchy, which is a relation between locks, which is reflexive, transitive, and antisymmetric, in other words, a partial order. If a thread wants to acquire a lock B, while already holding A, it can only do it if A \< B. If all threads work this way, then no deadlocks can occur. + +Our locking hierarchy so far looks like this (if lock A is above lock B, then A \< B): + + + <LOADER LOCK> + \ + <DOMAIN LOCK> + \ \ \ + <DOMAIN JIT LOCK> <SIMPLE LOCK 1> <SIMPLE LOCK 2> + +For example: if a thread wants to hold a domain jit lock, a domain lock and the loader lock, it must acquire them in the order: loader lock, domain lock, domain jit lock. + +### Notes + +Some common scenarios: + +- if a function needs to access a data structure, then it should lock it itself, and not count on its caller locking it. So for example, the image-\>class_cache hash table would be locked by mono_class_get(). + +- there are lots of places where a runtime data structure is created and stored in a cache.
 In these places, care must be taken to avoid multiple threads creating the same runtime structure, for example, two threads might call mono_class_get () with the same class name. There are two choices here: + + + <enter mutex> + <check that item is created> + if (created) { + <leave mutex> + return item + } + <create item> + <store it in cache> + <leave mutex> + +This is the easiest solution, but it requires holding the lock for the whole time which might create a scalability problem, and could also lead to deadlock. + + <enter mutex> + <check that item is created> + <leave mutex> + if (created) { + return item + } + <create item> + <enter mutex> + <check that item is created> + if (created) { + /* Another thread already created and stored the same item */ + <free our item> + <leave mutex> + return orig item + } + else { + <store item in cache> + <leave mutex> + return item + } + +This solution does not present scalability problems, but the created item might be hard to destroy (like a MonoClass). If memory is allocated from a mempool, that memory is leaked, but the leak is very rare and it is bounded. + +- lazy initialization of hashtables etc. is not thread safe + +[Original version of this document in git](https://github.com/mono/mono/blob/8f91e420d7fbbab7da758e57160d1d762129f38a/docs/thread-safety.txt) + +### The Lock Tracer + +Mono now has a lock tracer that records the locking behavior of the runtime during execution so that its correctness can be verified later. + +To enable lock tracer support, define LOCK_TRACER in mono/mono/metadata/lock-tracer.h and recompile mono. To enable it at runtime, define the MONO_ENABLE_LOCK_TRACER environment variable. + +The lock tracer produces a file in the same directory as the application, named 'lock.ZZZ' where ZZZ is the pid of the mono process. + +After producing such a lock file, run the trace decoder that can be found in mono/data/lock-decoder. It currently only works on Linux and macOS, and it requires binutils to be installed. The decoder will report locking errors, specifying the functions that caused them.
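To make the kind of check such a trace enables more concrete, here is a minimal sketch that validates a recorded acquisition sequence against the ordering given in the locking hierarchy section (loader lock, then domain lock, then domain jit lock). This is not the lock-decoder tool and does not read its file format; the `LockRank` enum and `acquisition_order_is_valid` are hypothetical names, the model covers only those three locks, and it ignores recursive acquisition of a lock the thread already owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ranks: a lock with a smaller rank must be acquired first. */
typedef enum {
    LOCK_LOADER = 0,
    LOCK_DOMAIN = 1,
    LOCK_DOMAIN_JIT = 2
} LockRank;

/* Checks a sequence of acquisitions made by one thread, assuming every lock
 * in the sequence is still held when the next one is taken. The sequence is
 * valid only if each lock ranks strictly above the one acquired before it. */
bool acquisition_order_is_valid(const LockRank *acquired, size_t count)
{
    assert(acquired != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (acquired[i] <= acquired[i - 1])
            return false;
    }
    return true;
}
```

A sequence such as loader, domain, domain jit passes, while domain followed by loader is rejected, which is the inversion that can deadlock against a thread taking the locks in the documented order.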
diff --git a/docs/design/mono/web/trampolines.md b/docs/design/mono/web/trampolines.md new file mode 100644 index 0000000000000..a1ad2b70b5b38 --- /dev/null +++ b/docs/design/mono/web/trampolines.md @@ -0,0 +1,75 @@ +# Trampolines + +Trampolines are small, hand-written pieces of assembly code used to perform various tasks in the mono runtime. They are generated at runtime using the native code generation macros used by the JIT. They usually have a corresponding C function they can fall back to if they need to perform a more complicated task. They can be viewed as ways to pass control from JITted code back to the runtime. + +The common code for all architectures is in mini-trampolines.c; this file contains the trampoline creation functions plus the C functions called by the trampolines. The tramp-\.c files contain the arch-dependent code which creates the trampolines themselves. + +Most, but not all trampolines consist of two parts: + +- a generic part containing most of the code. This is created by the mono_arch_create_trampoline_code () function in tramp-\.c. Generic trampolines can be large (1kb). +- a specific part whose job is to call the generic part, passing in a parameter. The parameter to pass and the method by which it is passed depend on the type of the trampoline. Specific trampolines are created by the mono_arch_create_specific_trampoline () function in tramp-\.c. Specific trampolines are small, since the runtime creates lots of them. + +The generic part saves the machine state to the stack, and calls one of the trampoline functions in mini-trampolines.c with the state, the call site, and the argument passed by the specific trampoline. After the C function returns, it either returns normally, or branches to the address returned by the C function, depending on the trampoline type. + +Trampoline types are given by the MonoTrampolineType enumeration in [mini.h](https://github.com/mono/mono/blob/main/mono/mini/mini.h).
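The generic/specific split described above can be modelled in plain C, purely as an illustration. Real trampolines are generated machine code that saves registers and branches; the sketch below replaces all of that with ordinary structs and a function pointer. The type and function names (`GenericTrampoline`, `SpecificTrampoline`, `call_specific_trampoline`, `example_c_func`) are hypothetical and do not appear in the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for the C function a generic trampoline falls back
 * to. The real functions also receive the saved machine state and call site. */
typedef void *(*TrampolineCFunc)(void *arg);

/* Generic part: shared by many specific trampolines, holds the bulk of the work. */
typedef struct {
    TrampolineCFunc c_func;
} GenericTrampoline;

/* Specific part: small, it only carries one argument and a link to the generic part. */
typedef struct {
    const GenericTrampoline *generic;
    void *arg;
} SpecificTrampoline;

/* "Calling" a specific trampoline: hand its argument to the generic part,
 * which calls the C function and returns the address to continue at. */
void *call_specific_trampoline(const SpecificTrampoline *t)
{
    assert(t != NULL && t->generic != NULL && t->generic->c_func != NULL);
    return t->generic->c_func(t->arg);
}

/* Example C function: pretends the argument already is the address to
 * branch to, so it just hands it back. */
void *example_c_func(void *arg)
{
    return arg;
}
```

Because each specific part is just an argument plus a reference to shared code, creating one per method (or per vtable slot) stays cheap, which matches the size concern mentioned in the list above.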
+ +The platform-specific code for trampolines is in the file tramp-\.c for each architecture, while the cross-platform code is in mini-trampolines.c. There are two types of functions in mini-trampolines.c: + +- The actual C functions called by the trampolines. +- Functions to create the different trampoline types. + +Trampoline creation functions have the following signature: + +``` c +gpointer +mono_arch_create_foo_trampoline (<args>, MonoTrampInfo **info, gboolean aot) +``` + +The function should return a pointer to the newly created trampoline, allocating memory from either the global code manager, or from a domain's code manager. If INFO is not NULL, it is set to a pointer to a MonoTrampInfo structure, which contains information about the trampoline, like its name, unwind info, etc. This is used for two purposes: + +- Saving the trampoline info into an AOT image in 'full-aot' mode. +- Saving debug info about the trampoline in XDEBUG mode. + +### JIT Trampolines + +These trampolines are used to JIT compile a method the first time it is called. When the JIT compiles a call instruction, it doesn't compile the called method right away. Instead, it creates a JIT trampoline, and emits a call instruction referencing the trampoline. When the trampoline is called, it calls mono_magic_trampoline () which compiles the target method, and returns the address of the compiled code to the trampoline which branches to it. This process is somewhat slow, so mono_magic_trampoline () tries to patch the calling JITted code so it calls the compiled code instead of the trampoline from now on. This is done by mono_arch_patch_callsite () in tramp-\.c. + +### Virtual Call Trampolines + +There is one virtual call trampoline per vtable slot index. The trampoline uses this index plus the 'this' argument which is passed in a fixed register/stack slots by the managed calling convention to obtain the virtual method which needs to be compiled.
 It then patches the vtable slot with the address of the newly compiled method. + +\ + +### Jump Trampolines + +Jump trampolines are very similar to JIT trampolines; they even use the same mono_magic_trampoline () C function. They are used to implement the LDFTN and the JMP IL opcodes. + +### Class Init Trampolines + +These trampolines are used to implement the type initialization semantics of the CLI spec. They call the mono_class_init_trampoline () C function which executes the class initializer of the class passed as the trampoline argument, then replaces the code calling the class init trampoline with NOPs so it is not executed anymore. + +### Generic Class Init Trampoline + +This is similar to the class init trampolines, but is used for initializing classes which are only known at run-time, in generic-shared code. It receives the class to be initialized in a register instead of from a specific trampoline. This means there is only one instance of this trampoline. + +### RGCTX Lazy Fetch Trampolines + +These are used for fetching values from a runtime generic context, lazily initializing the values if they do not exist yet. There is one instance of this trampoline for each offset value. + +### AOT Trampolines + +These are similar to the JIT trampolines but instead of receiving a MonoMethod to compile, they receive an image+token pair. If the method identified by this pair is also AOT compiled, the address of its compiled code can be obtained without loading the metadata for the method. + +### AOT PLT Trampolines + +These trampolines handle calls made from AOT code through the PLT. + +### Delegate Trampolines + +These trampolines are used to handle the first call made to the delegate through its Invoke method. They call mono_delegate_trampoline () which creates a specialized calling sequence optimized for the delegate instance before calling it. Further calls will go through this optimized code sequence.
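The "slow first call, fast afterwards" behaviour of delegate trampolines can be sketched with a function pointer that is swapped on first use. This is only a model of the idea, under the assumption that the delegate dispatches through a single replaceable pointer; the `Delegate` struct, its fields and all function names below are hypothetical and are not the runtime's own definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Delegate Delegate;
typedef int (*InvokeFunc)(Delegate *self, int arg);

struct Delegate {
    InvokeFunc invoke_impl;   /* what Invoke currently dispatches to */
    int (*target)(int);       /* the method the delegate wraps */
    int trampoline_hits;      /* counts slow-path entries, for illustration only */
};

/* Stand-in for the optimized calling sequence installed after the first call. */
int optimized_invoke(Delegate *self, int arg)
{
    return self->target(arg);
}

/* Stand-in for the trampoline: runs on the first call, installs the fast
 * path, then completes the call through it. */
int delegate_trampoline(Delegate *self, int arg)
{
    self->trampoline_hits++;
    self->invoke_impl = optimized_invoke;
    return self->invoke_impl(self, arg);
}

/* A new delegate starts out pointing at the trampoline. */
void delegate_init(Delegate *d, int (*target)(int))
{
    assert(d != NULL && target != NULL);
    d->invoke_impl = delegate_trampoline;
    d->target = target;
    d->trampoline_hits = 0;
}

int delegate_invoke(Delegate *d, int arg)
{
    return d->invoke_impl(d, arg);
}

/* Example target method. */
int add_one(int x)
{
    return x + 1;
}
```

After the first `delegate_invoke`, the counter stays at one no matter how many further calls are made, mirroring how later calls bypass the trampoline. The JIT trampolines described earlier follow the same pattern, except that the patched location is the call site rather than a field.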
+ +### Monitor Enter/Exit Trampolines + +These trampolines implement the fastpath of Monitor.Enter/Exit on some platforms. + +\