support restartable regions for per-cpu critical regions #2350

derekbruening · 2017-04-14T16:51:43Z

This issue covers support for Linux kernel extensions for restartable regions for per-cpu critical regions. Xref https://lwn.net/Articles/649288/

derekbruening · 2018-06-18T19:00:04Z

This is finally coming to the official kernel: torvalds/linux@d82991a

derekbruening · 2018-06-18T20:32:23Z

The best approach that we've come up with to support instrumentation of restartable regions is to run them twice: first, run them instrumented in the code cache, but with their memory stores removed (though still instrumented); then, at the committing store point, invoke the sequence natively from its start point. This requires some assumptions to identify the committing store (this actually gets easier with the implementation being pushed to mainline Linux), and mixing in native execution is not ideal. However, it is much faster than the second-best approach: serializing, with a global lock, all restartable sequences that operate on the same data structures. That approach requires app knowledge (or else lumping all sequences together), and it also assumes that the current cpu is not read twice during the sequence.
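A toy C model of the run-twice idea may help make it concrete. Everything here is hypothetical (percpu_data_t, seq_increment(), and run_twice() are illustration names, not DynamoRIO code), and real sequences are machine code rather than C functions. The first pass reports its accesses to the tool but drops the committing store; the second pass re-executes natively from the start and performs the store exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not DynamoRIO code: a per-cpu counter
 * increment standing in for a restartable sequence. */
typedef struct {
    long counters[4]; /* per-cpu data the sequence updates */
} percpu_data_t;

typedef struct {
    size_t loads_seen;  /* memory reads reported to the tool */
    size_t stores_seen; /* memory writes reported to the tool */
} tool_stats_t;

/* One execution of the sequence. A non-NULL `tool` marks the instrumented
 * copy: every access is reported, but the committing store is dropped. */
static void seq_increment(percpu_data_t *d, int cpu, tool_stats_t *tool)
{
    long v = d->counters[cpu]; /* load */
    if (tool != NULL)
        tool->loads_seen++;
    v += 1;
    if (tool != NULL) {
        tool->stores_seen++; /* the tool observes the store... */
        return;              /* ...but memory is left untouched */
    }
    d->counters[cpu] = v;    /* native copy: the real committing store */
}

/* Pass 1 runs instrumented with stores removed; pass 2 re-runs the
 * sequence natively from its start so the commit happens exactly once. */
static void run_twice(percpu_data_t *d, int cpu, tool_stats_t *tool)
{
    seq_increment(d, cpu, tool);
    seq_increment(d, cpu, NULL);
}
```

The tool sees one load and one store per run, while memory changes only once, which is the property the run-twice scheme is after.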

derekbruening · 2018-06-18T20:35:18Z

Xref the #1698 load-store exclusive problem, which at a high level is a similar type of platform issue: it complicates instrumentation by restricting what can happen within a PC range.

Carrotman42 · 2019-02-07T20:16:04Z

I wrote a doc as a summary of the current state of the world: https://github.com/DynamoRIO/dynamorio/wiki/Restartable-Sequences

compudj · 2019-06-09T20:37:00Z

Please refer to the two following upstream Linux selftests commits, which are relevant for DR. They replace the __rseq_table with a more complete alternative:

commit 4fe2088e164d2ec44530fe2840f6be5906fbc650
Author: Mathieu Desnoyers
Date: Mon Apr 29 11:27:53 2019 -0400

rseq/selftests: Add __rseq_exit_point_array section for debuggers

Knowing all exit points is useful to assist debuggers stepping over the
rseq critical sections without requiring them to disassemble the content
of the critical section to figure out the exit points.

commit a3e3131f94aa1daeb978ed66d0b4e61156ef2c2a
Author: Mathieu Desnoyers
Date: Mon Apr 29 11:27:54 2019 -0400

rseq/selftests: Introduce __rseq_cs_ptr_array, rename __rseq_table to __rseq_cs

The entries within __rseq_table are aligned on 32 bytes due to
linux/rseq.h struct rseq_cs uapi requirements, but the start of the
__rseq_table section is not guaranteed to be 32-byte aligned. It can
cause padding to be added at the start of the section, which makes it
hard to use as an array of items by debuggers.

Considering that __rseq_table does not really consist of a table due to
the presence of padding, rename this section to __rseq_cs.

Create a new __rseq_cs_ptr_array section which contains 64-bit packed
pointers to entries within the __rseq_cs section.
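To illustrate why a packed pointer array is easier to consume than the padded __rseq_cs section, here is a minimal sketch of a lookup over __rseq_cs_ptr_array contents. It assumes the section bytes are already in memory as an array of 64-bit entries and that those entries are already relocated to runtime addresses. rseq_cs_t and find_rseq_region() are hypothetical names; rseq_cs_t is redeclared to mirror struct rseq_cs from linux/rseq.h so the sketch builds without the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct rseq_cs in linux/rseq.h (32-byte aligned). */
typedef struct {
    uint32_t version;
    uint32_t flags;
    uint64_t start_ip;
    uint64_t post_commit_offset; /* offset from start_ip */
    uint64_t abort_ip;
} __attribute__((aligned(32))) rseq_cs_t;

/* Each array entry is a packed 64-bit pointer to one rseq_cs, so the
 * array can be indexed directly, with no padding to skip. Returns the
 * region containing pc, or NULL. */
static const rseq_cs_t *find_rseq_region(const uint64_t *ptr_array,
                                         size_t count, uint64_t pc)
{
    for (size_t i = 0; i < count; i++) {
        const rseq_cs_t *cs = (const rseq_cs_t *)(uintptr_t)ptr_array[i];
        if (pc >= cs->start_ip && pc - cs->start_ip < cs->post_commit_offset)
            return cs;
    }
    return NULL;
}
```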

For the application module, we use the application path obtained from early injection or /proc/self/exe on Linux, rather than /proc/self/maps comments. The maps comments can be unreliable in the face of anonymous or deleted-file mremaps used for hugepage backing and other features. Adds a test case to the existing "burst_maps" test. Having the right module full_path helps many cases, including the forthcoming restartable sequences ("rseq") support for #2350. Issue: #2566, #2350

Adds initial handling for the restartable sequence ("rseq") feature that is now in the mainline Linux kernel. We identify rseq regions by looking for ELF sections with established names according to upstream conventions. Unfortunately, this requires going to disk for most libraries, so we avoid this for full-control mode if we have never seen an rseq system call, and for attach if no thread has registered for rseq. For blocks inside rseq regions, mangling removes all memory stores. For the final commit instruction, we append a native call back to the abort handler. We assume this extra frame is ok, and we require the rseq sequence to end in a return. Future work will improve these assumptions. Updates the 3 Linux syscall lists up through SYS_rseq. Adds 3 RSTATS for rseq operation. Documents the current limitations on rseq region support:

- The application must store an rseq_cs struct for each rseq region in a section of its binary with an established name.
- Each rseq region's code must never also be executed as a non-restartable sequence.
- Each rseq region must make forward progress if its abort handler is always called the first time it is executed.
- Each memory store instruction inside an rseq region must have no other side effects.
- Each rseq region must end with a return instruction, and each abort handler plus rseq code must combine into a callee following normal call-return semantics.
- Any helper function called from within an rseq region must have no side effects.

Adds two tests for x86_64 Linux, one for full control and one for attach. However, these require a 4.18+ kernel and so are not exercised by Travis. The Fedora CDash machine does have 4.18, so we do have some automated coverage. Once this is in place, the old and now obsolete rseq support will be removed. Issue: #2350
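As a rough sketch of section-name-based detection (not DynamoRIO's code), the following reads an ELF file into memory and looks for the three section names that appear in this thread: __rseq_cs_ptr_array, __rseq_cs, and the older __rseq_table. It assumes a 64-bit ELF with section headers present. read_file(), find_section(), and module_has_rseq() are hypothetical helpers; the rseq_syscall_seen parameter stands in for the lazy check, which in the real scheme gates whether the file is read from disk at all.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Loads a whole file into memory; returns NULL on failure. */
static unsigned char *read_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    unsigned char *buf = NULL;
    long n = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        n = ftell(f);
    if (n > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)n);
        if (buf != NULL && fread(buf, 1, (size_t)n, f) != (size_t)n) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf != NULL)
        *size_out = (size_t)n;
    return buf;
}

/* Finds a section header by name in an in-memory 64-bit ELF image.
 * Only basic bounds checks: a sketch, not a hardened parser. */
static const Elf64_Shdr *find_section(const unsigned char *img, size_t size,
                                      const char *name)
{
    if (size < sizeof(Elf64_Ehdr) || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return NULL;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (eh->e_shoff == 0 || eh->e_shstrndx >= eh->e_shnum ||
        eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > size)
        return NULL;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    const Elf64_Shdr *strtab = &sh[eh->e_shstrndx];
    if (strtab->sh_offset + strtab->sh_size > size)
        return NULL;
    const char *names = (const char *)img + strtab->sh_offset;
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0)
            return &sh[i];
    }
    return NULL;
}

/* Lazy gate: skip the scan entirely until rseq is known to be in use. */
static bool module_has_rseq(const unsigned char *img, size_t size,
                            bool rseq_syscall_seen)
{
    static const char *const rseq_names[] = {
        "__rseq_cs_ptr_array", "__rseq_cs", "__rseq_table",
    };
    if (!rseq_syscall_seen)
        return false;
    for (size_t i = 0; i < sizeof(rseq_names) / sizeof(rseq_names[0]); i++) {
        if (find_section(img, size, rseq_names[i]) != NULL)
            return true;
    }
    return false;
}
```

For example, read_file("/proc/self/exe", &size) followed by module_has_rseq(img, size, true) checks the running executable.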

Reverts the now-obsolete run-native approach for an older version of the restartable sequence ("rseq") feature. That version never made it to the mainline kernel, and the run-native approach failed to allow tools to see rseq code. Reverts most of commits cda88be and 0935136. Issue: #2350

Fixes the lazy rseq support to handle code cache pre-population. Previously, rseq code blocks could be created without rseq handling because the lazy checks were not triggered until after pre-population. Issue: #2350

Fixes the lazy rseq support to handle code cache pre-population between setup and start. Previously rseq code blocks could be created without rseq handling due to the lazy checks not triggering until after taking over the app. Issue: #2350

The __rseq_cs_ptr_array will be relocated, so we should not add the load offset. Adds an array to the suite test (previously arrays were only tested manually using a librseq app). Creates 2 separate tests to test all 3 section types. Issue: #2350

The __rseq_cs_ptr_array will be relocated, so we should not add the load_offset but rather the entry_offset. Documents that we require these rseq sections to be located in loaded segments. Adds release-build fatal errors if this is not the case. Adds an array to the suite test (previously arrays were only tested manually using a librseq app). Creates 2 separate tests to test all 3 section types. Issue: #2350

On any translation, and in particular on detach, we translate from inside an rseq region to the abort handler. This is necessary to avoid problems with a cpu migration earlier in the region while running the instrumented version. Augments the api.rseq test with a thread that sits in a loop in an rseq region to test translation on detach: without the translation, it loops forever. Issue: #2350
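The translation rule itself is the same range test the kernel applies when it aborts a sequence: a PC in [start_ip, start_ip + post_commit_offset) is redirected to abort_ip. Here is a minimal sketch, assuming an rseq_cs_t that mirrors struct rseq_cs from linux/rseq.h; translate_for_rseq() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct rseq_cs in linux/rseq.h. */
typedef struct {
    uint32_t version;
    uint32_t flags;
    uint64_t start_ip;
    uint64_t post_commit_offset; /* offset from start_ip */
    uint64_t abort_ip;
} __attribute__((aligned(32))) rseq_cs_t;

/* Maps a PC inside the region to its abort handler; any other PC
 * (including the post-commit PC itself) is returned unchanged. */
static uint64_t translate_for_rseq(const rseq_cs_t *cs, uint64_t pc)
{
    if (cs != NULL && pc >= cs->start_ip &&
        pc - cs->start_ip < cs->post_commit_offset)
        return cs->abort_ip;
    return pc;
}
```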

Adds a new option -disable_rseq, which returns -ENOSYS on any SYS_rseq system call. This is intended as a workaround for applications that do not satisfy DR's limitations for full rseq support. Adds a test that fails unless -disable_rseq is passed. Moves the rseq limitations list to a new section on rseq in the documentation. Issue: #2350

Eliminates the call-return reliance for the native execution step of rseq support. Makes a local copy of the sequence right inside the sequence-ending block and executes it. The sequence is inserted as additional instructions and is then mangled normally (mangling changes are assumed to be restartable), but it is not passed to clients. Any exits are regular block exits, resulting in a block with many exits. The prior call-return scheme is left under a temporary option -rseq_assume_call, as a failsafe in case there are stability problems discovered with this native execution implementation. Once we are happy with the new scheme we can remove the option.

To make the local copy an rseq region, the per-thread rseq_cs address is identified by watching system calls. For attach, it is identified by searching the possible static TLS offsets. The assumption of a constant offset is documented and verified. The rseq_cs's abort handler is a new exit added with the app's signature as data just before it, hidden in the operands of a nop instruction to avoid problems with decoding the fragment. A local jump skips over the data and exit.

A new rseq_cs structure is allocated for each sequence-ending fragment. It is stored in a hashtable in the rseq module, to avoid complexities and overhead of adding an additional fragment_t or "subclass" field. A new flag is set to trigger calling into the rseq module on fragment deletion. The rseq_cs fields are filled in via a new post-emit control point, using information stored in labels during mangling. The pointer to the rseq_cs is inserted with a dummy value and patched in this new control point using a new utility routine patch_mov_immed_ptrsz(). To avoid crashing due to invalid rseq bounds after freeing the rseq_cs structure, the rseq pointer is cleared explicitly on completion, and on midpoint exit by the fragment deletion hook along with a hook on the shared fragment flushtime update, to ensure all threads are covered.

The rseq test is augmented and expanded. An invalid instruction is added to properly test the abort handler, under a conditional to allow testing each sequence both to completion and on abort.

Future work is properly handling a midpoint exit during the instrumentation execution: we need to invoke the native version as well. Adding aarchxx support is also future work: the patch_mov_immed_ptrsz(), the writes to the rseq struct in TLS, and the rseq tests are currently x86-only. Issue: #2350
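Two pieces of the mechanism above can be sketched in isolation: filling in an rseq_cs from offsets recorded at emit time, and the rseq ABI requirement that the 4-byte signature registered with SYS_rseq sit immediately before abort_ip. This is a hypothetical illustration: fill_rseq_cs() and abort_signature_matches() are made-up names, the offsets are arbitrary, and 0x53053053 is just an example signature value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of struct rseq_cs in linux/rseq.h. */
typedef struct {
    uint32_t version;
    uint32_t flags;
    uint64_t start_ip;
    uint64_t post_commit_offset;
    uint64_t abort_ip;
} __attribute__((aligned(32))) rseq_cs_t;

/* Hypothetical post-emit step: the sequence copy was emitted at `frag`, and
 * label offsets recorded during mangling give its start, the end of the
 * committing store, and the abort exit. */
static void fill_rseq_cs(rseq_cs_t *cs, uintptr_t frag, size_t start_off,
                         size_t post_commit_off, size_t abort_off)
{
    cs->version = 0;
    cs->flags = 0;
    cs->start_ip = frag + start_off;
    cs->post_commit_offset = post_commit_off - start_off;
    cs->abort_ip = frag + abort_off;
}

/* The 4 bytes just before abort_ip must equal the registered signature. */
static bool abort_signature_matches(const rseq_cs_t *cs, uint32_t sig)
{
    uint32_t found;
    memcpy(&found, (const void *)(uintptr_t)(cs->abort_ip - sizeof(found)),
           sizeof(found));
    return found == sig;
}
```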

derekbruening · 2019-09-17T21:04:30Z

https://github.com/DynamoRIO/dynamorio/wiki/Restartable-Sequences contains a writeup of the implementation details in the series of commits above for the run-twice solution.

There are a number of corner cases left to cover, but they are lower priority. These are things like:

- Invoking the 2nd execution on a sequence midpoint exit (maybe leveraging __rseq_exit_point_array)
- Handling indirect branch exits out of a sequence
- Removing the -rseq_assume_call option once we're satisfied we'll never go back to it
- Adding more sanity checks for existing requirements/assumptions

derekbruening · 2019-09-17T22:13:45Z

We are currently ignoring the flags which indicate which triggers (out of preempt, signal, and migrate) should cause the abort handler to be called. We blindly run a second time even if the preempt and migrate bits are not set, which the application may not expect without a signal arriving, or may expect to happen only in a fatal error condition.
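For reference, the flags in question are the RSEQ_CS_FLAG_NO_RESTART_ON_{PREEMPT,SIGNAL,MIGRATE} values of enum rseq_cs_flags in linux/rseq.h (bits 0, 1, and 2). Their sense in that header is inverted: a set bit asks the kernel not to restart on that event. The sketch below uses hypothetical names and looks only at the rseq_cs flags; it shows how a runtime could decide whether a given interruption should divert to the abort handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as RSEQ_CS_FLAG_NO_RESTART_ON_* in linux/rseq.h. */
enum {
    CS_FLAG_NO_RESTART_ON_PREEMPT = 1u << 0,
    CS_FLAG_NO_RESTART_ON_SIGNAL = 1u << 1,
    CS_FLAG_NO_RESTART_ON_MIGRATE = 1u << 2,
};

typedef enum { EVENT_PREEMPT, EVENT_SIGNAL, EVENT_MIGRATE } rseq_event_t;

/* Returns true if an interruption of kind `ev` should divert to abort_ip,
 * i.e. the corresponding NO_RESTART bit is clear. */
static bool should_abort(uint32_t cs_flags, rseq_event_t ev)
{
    uint32_t bit = ev == EVENT_PREEMPT  ? CS_FLAG_NO_RESTART_ON_PREEMPT
                   : ev == EVENT_SIGNAL ? CS_FLAG_NO_RESTART_ON_SIGNAL
                                        : CS_FLAG_NO_RESTART_ON_MIGRATE;
    return (cs_flags & bit) == 0;
}
```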

For i#731 with automatic re-relativization of absolute PCs, in d6f5fca we simply kept the hardcoded offset for intra-region branch targets in our native rseq copy. However, with subsequent mangling that offset can become incorrect and target the middle of an instruction, leading to a crash. We instead take the time to convert these PC targets to instr_t* targets. We also tweak the disassembly output to show the instr_t pointer value for level 3 instructions too, since jumps can target them as well as synthetic instructions. This helped with verifying and debugging this change. Tested with an inserted system call that locally forces rseq restarts, which leads to system call mangling and crashes without this fix. Issue: #731, #2350
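A hypothetical mini-IR makes the failure mode concrete: once a branch target is held as an instruction index rather than a fixed byte offset, the encoded offset can be recomputed after mangling grows an earlier instruction. The insn_t type and the helpers are illustration only, not DynamoRIO's instr_t API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini-IR for a copied sequence. */
typedef struct {
    uint64_t orig_pc;   /* app PC the instruction was decoded from */
    size_t length;      /* encoded size, which mangling may change */
    int is_branch;
    uint64_t target_pc; /* absolute branch target from the app code */
    int target_idx;     /* resolved target instruction, or -1 */
} insn_t;

/* Converts PC targets into instruction targets. Returns -1 if a target
 * is not the start of an instruction in the copy (an exit, which this
 * sketch does not handle), else 0. */
static int resolve_targets(insn_t *ins, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!ins[i].is_branch)
            continue;
        ins[i].target_idx = -1;
        for (size_t j = 0; j < n; j++) {
            if (ins[j].orig_pc == ins[i].target_pc) {
                ins[i].target_idx = (int)j;
                break;
            }
        }
        if (ins[i].target_idx < 0)
            return -1;
    }
    return 0;
}

/* Byte offset of instruction `idx` in the re-encoded copy. */
static size_t encoded_offset(const insn_t *ins, size_t idx)
{
    size_t off = 0;
    for (size_t i = 0; i < idx; i++)
        off += ins[i].length;
    return off;
}
```

If the middle instruction grows from 3 to 12 bytes, a hardcoded offset of 5 would land inside it, while the recomputed offset of the resolved target becomes 14.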

Adds translation support for the register restores used in rseq mangling. Adds a test of a fault/signal in native rseq code by taking advantage of the lack of xmm support to have different behavior in the instrumented vs native executions. I hit this while trying to force a restart for i#4019 in a custom test, but it could happen in regular execution with an asynchronous signal. Issue: #2350

Updates a now-stale detail in the rseq limitation docs: we no longer try to analyze read-write sequences for restoring state for the second rseq execution. We do still limit our checkpointing to general-purpose registers. Issue: #2350


When a migration or context switch happens during rseq native execution, we now raise a kernel xfer event. The event is of a new type DR_XFER_RSEQ_ABORT. To implement this, the native abort handler cannot be linked and must return to dispatch. The special-exit-reason feature is used for this purpose. Adds a test. To force a migration we use a system call, which we do not normally allow inside an rseq region. I added a debug-build exception for this particular test by executable name, along with a syscall discovery workaround for the attach test. Adds a client via static DR to api.rseq to test that the event is raised. Adds handling to drmemtrace in the tracer and raw2trace. For raw2trace we walk backward to undo the committing store that was recorded, since a real rseq abort would happen before the final store. I would like to add an offline trace rseq regression test, but it hits #4041, so the test will be added as part of that issue. Issue: #2350, #4019, #4041 Fixes #4019
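The raw2trace adjustment can be pictured with a simplified, hypothetical record model in which each instruction record is followed by its memory-access records. On an rseq abort, the trace is truncated at the instruction that made the last recorded write, since the real abort would have happened before that store. rec_t and undo_committing_store() are made-up names; drmemtrace's actual record formats are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace records: an instruction, then its memory accesses. */
typedef enum { REC_INSTR, REC_READ, REC_WRITE } rec_type_t;
typedef struct {
    rec_type_t type;
    uint64_t addr; /* instruction PC or data address */
} rec_t;

/* Walks backward to the last write and its owning instruction, and returns
 * the new trace length that drops that store and everything after it.
 * Returns `len` unchanged if no store was recorded. */
static size_t undo_committing_store(const rec_t *recs, size_t len)
{
    size_t w = len;
    while (w > 0 && recs[w - 1].type != REC_WRITE)
        w--;
    if (w == 0)
        return len;       /* no store recorded: nothing to undo */
    size_t j = w - 1;     /* index of the committing write */
    while (j > 0 && recs[j].type != REC_INSTR)
        j--;
    return recs[j].type == REC_INSTR ? j : len;
}
```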

derekbruening · 2020-06-04T18:57:50Z

I'm marking this as completed since it is working well in practice.
I split the corner cases listed in #2350 (comment) off to #4315. I filed #4316 on adding aarch64 support.

Removes the option -rseq_assume_call and its code in favor of the more-general native-copy approach, which has been the default for a while now and has not shown any stability issues. Issue: #2350

Clarifies when instrumentation will see an rseq abort. Fixes a typo. Issue: #2350

derekbruening added OpSys-Linux Type-Feature labels Apr 14, 2017

derekbruening self-assigned this Apr 14, 2017

derekbruening mentioned this issue Jul 16, 2019

i#2566 anon text: Use the app path instead of maps comments #3736

Merged

derekbruening mentioned this issue Jul 19, 2019

i#2350 rseq: Add initial handling of mainline kernel rseq #3740

Merged

derekbruening mentioned this issue Jul 19, 2019

i#2350 rseq: Remove old run-native code #3741

Merged

derekbruening mentioned this issue Jul 20, 2019

i#2350 rseq: Support cache pre-population #3745

Merged

derekbruening mentioned this issue Jul 25, 2019

i#2350 rseq: Fix __rseq_cs_ptr_array parsing crash #3755

Merged

derekbruening mentioned this issue Jul 26, 2019

i#2350 rseq: Translate from rseq region to handler #3757

Merged

derekbruening mentioned this issue Aug 12, 2019

i#2350 rseq: Add -disable_rseq option #3781

Merged

derekbruening mentioned this issue Sep 10, 2019

i#2350 rseq: Use a local copy for native execution #3826

Merged

derekbruening mentioned this issue Sep 16, 2019

Move label notes to another field to survive encoding with jump targets #3835

Closed

derekbruening mentioned this issue Jan 15, 2020

Rseq redirection fails to raise kernel xfer event #4019

Closed

derekbruening mentioned this issue Jan 17, 2020

i#731 re-rel: Convert native rseq PC targets to instrs #4023

Merged

derekbruening mentioned this issue Jan 17, 2020

i#2350 rseq: Handle translations in native rseq code #4026

Merged

derekbruening mentioned this issue Jan 17, 2020

i#2350 rseq: Update rseq limitation docs #4027

Merged

derekbruening mentioned this issue Jan 21, 2020

Add pad_jmps nop support for trace building #4038

Closed

derekbruening mentioned this issue Jan 22, 2020

i#4019: Raise event on rseq native abort #4042

Merged

This was referenced Jun 4, 2020

Add handling for restartable sequence corner cases #4315

Open

Add AArch64 support for restartable sequences #4316

Closed

derekbruening closed this as completed Jun 4, 2020

derekbruening mentioned this issue Apr 16, 2021

i#2350 rseq: Remove -rseq_assume_call and code #4859

Merged

derekbruening added a commit that referenced this issue Jun 8, 2021

i#2350 rseq: Clarify docs on rseq limitations

8eed3ea

Clarifies when instrumentation will see an rseq abort. Fixes a typo. Issue: #2350

derekbruening mentioned this issue Jun 8, 2021

i#2350 rseq: Clarify docs on rseq limitations #4942

Merged

derekbruening added a commit that referenced this issue Jun 8, 2021

i#2350 rseq: Clarify docs on rseq limitations (#4942)

2e61908

Clarifies when instrumentation will see an rseq abort. Fixes a typo. Issue: #2350
