support restartable regions for per-cpu critical regions #2350
Comments
This is finally coming to the official kernel: torvalds/linux@d82991a
The best approach that we've come up with to support instrumentation of restartable regions is to run them twice. First, run them instrumented in the code cache with their memory stores removed (the stores are still instrumented). Then, at the committing store point, invoke the sequence natively from its start point. This requires some assumptions to identify the committing store (this actually gets easier with the implementation being pushed to mainline Linux), and mixing in native execution is not ideal. However, it is much faster than the second-best approach: serializing, with a global lock, all restartable sequences that operate on the same data structures (which requires app knowledge, or else lumping all sequences together). That approach also assumes that the current cpu is not read twice during the sequence.
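To make the run-twice idea concrete, here is a toy model in plain C. It is only a sketch: none of the names below come from DynamoRIO, and a real sequence is machine code rather than a C function. The first pass lets the "instrumentation" observe every operation while suppressing the store; the second pass runs the sequence from its start with the store in place.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not DynamoRIO identifiers. */
typedef struct {
    int counter;         /* Per-cpu data that the sequence updates. */
    size_t observed_ops; /* What the "instrumentation" recorded. */
} toy_state_t;

/* One pass over a three-step sequence: load, add, committing store. */
static void
toy_sequence(toy_state_t *st, bool instrumented)
{
    int tmp = st->counter; /* Load. */
    tmp += 1;              /* Compute. */
    if (instrumented) {
        /* Pass 1: all three operations are observed, but the store itself
         * is removed so the pass has no visible effect on app memory. */
        st->observed_ops += 3;
        return;
    }
    /* Pass 2: native execution performs the committing store. */
    st->counter = tmp;
}

/* Runs the sequence twice: once instrumented, once natively from the start. */
static void
toy_run_twice(toy_state_t *st)
{
    toy_sequence(st, true);
    toy_sequence(st, false);
}
```

Because the first pass writes nothing to application memory, re-running from the start in the second pass yields the same result as a single native execution, which is the property the approach relies on.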
Xref the #1698 load-store exclusive problem, which from a high level is a similar type of platform issue: it complicates instrumentation by restricting what can happen within a PC range.
I wrote a doc as a summary of the current state of the world: https://github.com/DynamoRIO/dynamorio/wiki/Restartable-Sequences
Please refer to the following two upstream Linux selftests commits, which are relevant for DR. They replace the __rseq_table with a more complete alternative:
commit 4fe2088e164d2ec44530fe2840f6be5906fbc650
commit a3e3131f94aa1daeb978ed66d0b4e61156ef2c2a
For the application module, we use the application path obtained from early injection or /proc/self/exe on Linux, rather than /proc/self/maps comments. The maps comments can be unreliable in the face of anonymous or deleted-file mremaps used for hugepage backing and other features. Adds a test case to the existing "burst_maps" test. Having the right module full_path helps many cases, including the forthcoming restartable sequences ("rseq") support for #2350. Issue: #2566, #2350
Adds initial handling for the restartable sequence ("rseq") feature that is now in the mainline Linux kernel.
We identify rseq regions by looking for ELF sections with established names according to upstream conventions. Unfortunately this requires going to disk for most libraries, so we avoid this for full-control-mode if we have never seen an rseq system call, and for attach if no thread has registered for rseq.
For blocks inside rseq regions, mangling removes all memory stores. For the final commit instruction, we append a native call back to the abort handler. We assume this extra frame is ok, and we require the rseq sequence to end in a return. Future work will improve these assumptions.
Updates the 3 Linux syscall lists up through SYS_rseq.
Adds 3 RSTATS for rseq operation.
Documents the current limitations on rseq region support:
- The application must store an rseq_cs struct for each rseq region in a section of its binary with an established name.
- Each rseq region's code must never be also executed as a non-restartable sequence.
- Each rseq region must make forward progress if its abort handler is always called the first time it is executed.
- Each memory store instruction inside an rseq region must have no other side effects.
- Each rseq region must end with a return instruction, and each abort handler plus rseq code must combine into a callee following normal call-return semantics.
- Any helper function called from within an rseq region must have no side effects.
Adds two tests for x86_64 Linux, one for full control and one for attach. However, these require a 4.18+ kernel and so are not exercised by Travis. The Fedora CDash machine does have 4.18 so we do have some automated coverage.
Once this is in place, the old and now obsolete rseq support will be removed.
Issue: #2350
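For readers unfamiliar with the rseq_cs descriptor mentioned above, the following sketch redeclares its layout as found in the Linux UAPI header <linux/rseq.h> and shows a bounds test for deciding whether a PC lies inside a region. The toy_ names and the linear lookup are assumptions made for illustration only; this is not DR's lookup code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the fields of struct rseq_cs from <linux/rseq.h>; redeclared under
 * a hypothetical name so the sketch builds without kernel headers. */
typedef struct {
    uint32_t version;
    uint32_t flags;
    uint64_t start_ip;           /* First instruction of the region. */
    uint64_t post_commit_offset; /* Length of the region in bytes. */
    uint64_t abort_ip;           /* Where control goes on an abort. */
} toy_rseq_cs_t;

/* True if pc is in [start_ip, start_ip + post_commit_offset). */
static bool
toy_pc_in_region(const toy_rseq_cs_t *cs, uint64_t pc)
{
    /* Unsigned subtraction wraps for pc < start_ip, yielding a huge value,
     * so a single comparison covers both bounds. */
    return pc - cs->start_ip < cs->post_commit_offset;
}

/* Hypothetical helper: linear search over descriptors gathered from a
 * module's rseq sections. Returns NULL if pc is in no region. */
static const toy_rseq_cs_t *
toy_find_region(const toy_rseq_cs_t *table, size_t count, uint64_t pc)
{
    for (size_t i = 0; i < count; i++) {
        if (toy_pc_in_region(&table[i], pc))
            return &table[i];
    }
    return NULL;
}
```

A block builder could use a test of this shape to decide whether the block it is constructing needs rseq mangling; a real implementation would likely want something faster than a linear scan.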
Fixes the lazy rseq support to handle code cache pre-population. Previously rseq code blocks could be created without rseq handling due to the lazy checks not being triggered until after pre-population. Issue: #2350
Fixes the lazy rseq support to handle code cache pre-population between setup and start. Previously rseq code blocks could be created without rseq handling due to the lazy checks not triggering until after taking over the app. Issue: #2350
The __rseq_cs_ptr_array will be relocated, so we should not add the load offset. Adds an array to the suite test (previously arrays were only tested manually using a librseq app). Creates 2 separate tests to test all 3 section types. Issue: #2350
The __rseq_cs_ptr_array will be relocated, so we should not add the load_offset but rather the entry_offset. Documents that we require these rseq sections to be located in loaded segments. Adds release-build fatal errors if this is not the case. Adds an array to the suite test (previously arrays were only tested manually using a librseq app). Creates 2 separate tests to test all 3 section types. Issue: #2350
On any translation, and in particular on detach, we translate from inside an rseq region to the abort handler. This is necessary to avoid problems with a cpu migration earlier in the region while running the instrumented version. Augments the api.rseq test with a thread that sits in a loop in an rseq region to test translation on detach: without the translation, it loops forever. Issue: #2350
Adds a new option -disable_rseq, which returns -ENOSYS on any SYS_rseq system call. This is intended as a workaround for applications that do not satisfy DR's limitations for full rseq support. Adds a test that fails unless -disable_rseq is passed. Moves the rseq limitations list to a new section on rseq in the documentation. Issue: #2350
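From the application's side, an -ENOSYS result looks the same as running on a kernel that predates rseq, so code that already falls back in that case should keep working. A minimal sketch of such a fallback decision follows. The registration step is abstracted behind a function pointer so the logic can be exercised without a real system call; all toy_ names are hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for whatever performs the rseq registration. It
 * follows the syscall(2) convention: 0 on success, -1 with errno set. */
typedef int (*toy_register_fn)(void);

typedef enum {
    TOY_PERCPU_RSEQ,     /* Registration worked: use rseq fast paths. */
    TOY_PERCPU_FALLBACK, /* rseq unavailable: use locks or atomics. */
    TOY_PERCPU_ERROR,    /* Some other failure the caller should report. */
} toy_percpu_mode_t;

static toy_percpu_mode_t
toy_choose_mode(toy_register_fn do_register)
{
    if (do_register() == 0)
        return TOY_PERCPU_RSEQ;
    /* ENOSYS is what an old kernel reports, and what the source says
     * -disable_rseq returns. */
    if (errno == ENOSYS)
        return TOY_PERCPU_FALLBACK;
    return TOY_PERCPU_ERROR;
}

/* Stubs standing in for the three outcomes, for illustration. */
static int
toy_register_ok(void)
{
    return 0;
}

static int
toy_register_enosys(void)
{
    errno = ENOSYS;
    return -1;
}

static int
toy_register_einval(void)
{
    errno = EINVAL;
    return -1;
}
```

An application that instead treats any registration failure as fatal would not benefit from the workaround.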
Eliminates the call-return reliance for the native execution step of rseq support. Makes a local copy of the sequence right inside the sequence-ending block and executes it. The sequence is inserted as additional instructions and is then mangled normally (mangling changes are assumed to be restartable), but it is not passed to clients. Any exits are regular block exits, resulting in a block with many exits.
The prior call-return scheme is left under a temporary option -rseq_assume_call, as a failsafe in case there are stability problems discovered with this native execution implementation. Once we are happy with the new scheme we can remove the option.
To make the local copy an rseq region, the per-thread rseq_cs address is identified by watching system calls. For attach, it is identified by searching the possible static TLS offsets. The assumption of a constant offset is documented and verified.
The rseq_cs's abort handler is a new exit added with the app's signature as data just before it, hidden in the operands of a nop instruction to avoid problems with decoding the fragment. A local jump skips over the data and exit.
A new rseq_cs structure is allocated for each sequence-ending fragment. It is stored in a hashtable in the rseq module, to avoid complexities and overhead of adding an additional fragment_t or "subclass" field. A new flag is set to trigger calling into the rseq module on fragment deletion.
The rseq_cs fields are filled in via a new post-emit control point, using information stored in labels during mangling. The pointer to the rseq_cs is inserted with a dummy value and patched in this new control point using a new utility routine patch_mov_immed_ptrsz().
To avoid crashing due to invalid rseq bounds after freeing the rseq_cs structure, the rseq pointer is cleared explicitly on completion, and on midpoint exit by the fragment deletion hook along with a hook on the shared fragment flushtime update, to ensure all threads are covered.
The rseq test is augmented and expanded. An invalid instruction is added to properly test the abort handler, under a conditional to allow testing each sequence both to completion and on abort.
Future work is properly handling a midpoint exit during the instrumentation execution: we need to invoke the native version as well.
Adding aarchxx support is also future work: the patch_mov_immed_ptrsz(), the writes to the rseq struct in TLS, and the rseq tests are currently x86-only.
Issue: #2350
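The reason the signature has to sit immediately before the abort handler is that the kernel reads the 32-bit value in the 4 bytes preceding abort_ip and compares it with the signature given at registration. The sketch below performs that check over a byte buffer standing in for emitted code. The helper name and the buffer-based interface are assumptions for illustration, not DR or kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the 4 bytes just before code[abort_off] equal expected_sig.
 * Hypothetical helper: code/code_len model an emitted fragment and abort_off
 * is the offset of the abort handler within it. */
static bool
toy_abort_sig_matches(const uint8_t *code, size_t code_len, size_t abort_off,
                      uint32_t expected_sig)
{
    uint32_t found;
    /* The handler must lie inside the buffer with room for the signature
     * in front of it. */
    if (abort_off < sizeof(found) || abort_off >= code_len)
        return false;
    /* memcpy avoids unaligned access and reads in native byte order, the
     * same order in which the signature would have been written. */
    memcpy(&found, code + abort_off - sizeof(found), sizeof(found));
    return found == expected_sig;
}
```

A check like this could serve as a debug-build assertion after emitting a fragment, before the region is ever handed to the kernel.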
https://github.com/DynamoRIO/dynamorio/wiki/Restartable-Sequences contains a writeup of the implementation details in the series of commits above for the run-twice solution. There are a number of corner cases left to cover, but they are lower priority. These are things like:
We are currently ignoring the flags which indicate which triggers (out of preempt, signal, and migrate) should cause the abort handler to be called. We blindly run a second time even if the preempt and migrate bits are not set, which the application may not expect without a signal arriving, or may expect to happen only in a fatal error condition.
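For reference, a sketch of what honoring those flags could look like. In <linux/rseq.h> as it stood at the time of this discussion, the per-region flags are opt-out bits (RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT, _SIGNAL, _MIGRATE at bits 0, 1, and 2); the values are redeclared here under hypothetical toy_ names so the code builds without kernel headers. This is not something DR currently does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values follow enum rseq_cs_flags in <linux/rseq.h>; the names are
 * local stand-ins. Each bit disables restarts for one kind of event. */
enum {
    TOY_NO_RESTART_ON_PREEMPT = 1u << 0,
    TOY_NO_RESTART_ON_SIGNAL = 1u << 1,
    TOY_NO_RESTART_ON_MIGRATE = 1u << 2,
};

typedef enum {
    TOY_EVENT_PREEMPT,
    TOY_EVENT_SIGNAL,
    TOY_EVENT_MIGRATE,
} toy_event_t;

/* True if the given event should send control to the abort handler for a
 * region whose descriptor carries these flags. */
static bool
toy_event_triggers_abort(uint32_t flags, toy_event_t ev)
{
    switch (ev) {
    case TOY_EVENT_PREEMPT: return (flags & TOY_NO_RESTART_ON_PREEMPT) == 0;
    case TOY_EVENT_SIGNAL: return (flags & TOY_NO_RESTART_ON_SIGNAL) == 0;
    case TOY_EVENT_MIGRATE: return (flags & TOY_NO_RESTART_ON_MIGRATE) == 0;
    }
    return true; /* Unknown events conservatively abort. */
}
```

With a predicate like this, the unconditional second execution could in principle be skipped or handled specially for regions that have opted out of the relevant triggers.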
For i#731 with automatic re-relativization of absolute PC's, in d6f5fca we simply kept the hardcoded offset for intra-region branch targets in our native rseq copy. However, with subsequent mangling that offset can become incorrect and target the middle of an instruction, leading to a crash. We instead take the time to convert these PC targets to instr_t* targets. We also tweak the disassembly output to show the instr_t pointer value for level 3 instructions too, since jumps can target them as well as synthetic instructions. This helped with verifying and debugging this change. Tested on an inserted system call for locally forcing rseq restarts, which leads to system call mangling and crashes without this fix. Issue: #731, #2350
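The failure mode described above can be reproduced with a toy model in which an instruction list is just an array of byte lengths. A branch displacement (measured from the end of the branch, as on x86) that was valid before mangling can land mid-instruction once an earlier instruction grows, whereas a target stored as an instruction index can be re-encoded correctly. The helpers below are hypothetical and are not DR's instr_t API.

```c
#include <stddef.h>

/* Byte offset at which instruction idx starts, given each length. */
static long
toy_start_of(const size_t *lens, size_t idx)
{
    long pos = 0;
    for (size_t i = 0; i < idx; i++)
        pos += (long)lens[i];
    return pos;
}

/* Converts a displacement on the branch at index 'from' into the index of
 * the targeted instruction, or -1 if it points into the middle of one. */
static long
toy_disp_to_index(const size_t *lens, size_t n, size_t from, long disp)
{
    long target = toy_start_of(lens, from + 1) + disp;
    long pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (pos == target)
            return (long)i;
        pos += (long)lens[i];
    }
    return -1;
}

/* Re-encodes an index target as a displacement for the current lengths. */
static long
toy_index_to_disp(const size_t *lens, size_t from, size_t target_idx)
{
    return toy_start_of(lens, target_idx) - toy_start_of(lens, from + 1);
}
```

For example, with lengths {2, 3, 2, 1}, a displacement of 3 on the first instruction targets index 2. If mangling grows the second instruction to 7 bytes, the stale displacement no longer lands on an instruction boundary, while re-encoding from the index gives 7.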
Adds translation support for the register restores used in rseq mangling. Adds a test of a fault/signal in native rseq code by taking advantage of the lack of xmm support to have different behavior in the instrumented vs native executions. I hit this while trying to force a restart for i#4019 in a custom test, but it could happen in regular execution with an asynchronous signal. Issue: #2350
Updates a now-stale detail in the rseq limitation docs: we no longer try to analyze read-write sequences for restoring state for the second rseq execution. We do still limit our checkpointing to general-purpose registers. Issue: #2350
When a migration or context switch happens during rseq native execution, we now raise a kernel xfer event. The event is of a new type DR_XFER_RSEQ_ABORT. To implement this, the native abort handler cannot be linked and must return to dispatch. The special-exit-reason feature is used for this purpose. Adds a test. To force a migration we use a system call, which we do not normally allow inside an rseq region. I added a debug-build exception for this particular test by executable name, along with a syscall discovery workaround for the attach test. Adds a client via static DR to api.rseq to test that the event is raised. Adds handling to drmemtrace in the tracer and raw2trace. For raw2trace we walk backward to undo the committing store that was recorded, since a real rseq abort would happen before the final store. I would like to add an offline trace rseq regression test, but it hits #4041 and so the test will be added as part of that issue. Issue: #2350, #4019, #4041 Fixes #4019
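The backward walk in raw2trace can be pictured with a toy record stream. This is a simplified sketch with hypothetical record types; the real raw2trace logic operates on drmemtrace's own entry formats and has more to undo than a single record. On seeing an abort marker, it searches backward for the most recent store and drops it, since an actual abort would have happened before that store.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record kinds for illustration; not drmemtrace's types. */
typedef enum {
    TOY_REC_INSTR,
    TOY_REC_LOAD,
    TOY_REC_STORE,
    TOY_REC_RSEQ_ABORT,
} toy_rec_kind_t;

typedef struct {
    toy_rec_kind_t kind;
    unsigned long addr;
} toy_rec_t;

/* If recs[abort_idx] is an abort marker, removes the closest preceding store
 * record (the committing store recorded during the instrumented pass) by
 * shifting later records down. Returns the new record count. */
static size_t
toy_undo_committing_store(toy_rec_t *recs, size_t count, size_t abort_idx)
{
    if (abort_idx >= count || recs[abort_idx].kind != TOY_REC_RSEQ_ABORT)
        return count;
    /* Walk backward from just before the marker. */
    for (size_t i = abort_idx; i-- > 0;) {
        if (recs[i].kind == TOY_REC_STORE) {
            memmove(&recs[i], &recs[i + 1], (count - i - 1) * sizeof(recs[0]));
            return count - 1;
        }
    }
    return count; /* No store found: leave the stream untouched. */
}
```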
I'm marking this as completed since it is working well in practice.
Removes the option -rseq_assume_call and its code in favor of the more-general native-copy approach, which has been the default for a while now and has not shown any stability issues. Issue: #2350
Clarifies when instrumentation will see an rseq abort. Fixes a typo. Issue: #2350
This issue covers support for Linux kernel extensions for restartable regions for per-cpu critical regions. Xref https://lwn.net/Articles/649288/