-
Notifications
You must be signed in to change notification settings - Fork 570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#2350 rseq: Use a local copy for native execution #3826
Conversation
Eliminates the call-return reliance for the native execution step of rseq support. Makes a local copy of the sequence right inside the sequence-ending block and executes it. The sequence is inserted as additional instructions and is then mangled normally (mangling changes are assumed to be restartable), but it is not passed to clients. Any exits are regular block exits, resulting in a block with many exits. The prior call-return scheme is left under a temporary option -rseq_assume_call, as a failsafe in case there are stability problems discovered with this native execution implementation. Once we are happy with the new scheme we can remove the option. To make the local copy an rseq region, the per-thread rseq_cs address is identified by watching system calls. For attach, it is identified by searching the possible static TLS offsets. The assumption of a constant offset is documented and verified. The rseq_cs's abort handler is a new exit added with the app's signature as data just before it, hidden in the operands of a nop instruction to avoid problems with decoding the fragment. A local jump skips over the data and exit. A new rseq_cs structure is allocated for each sequence-ending fragment. It is stored in a hashtable in the rseq module, to avoid complexities and overhead of adding an additional fragment_t or "subclass" field. A new flag is set to trigger calling into the rseq module on fragment deletion. The rseq_cs fields are filled in via a new post-emit control point, using information stored in labels during mangling. The pointer to the rseq_cs is inserted with a dummy value and patched in this new control point using a new utility routine patch_mov_immed_ptrsz(). To avoid crashing due to invalid rseq bounds after freeing the rseq_cs structure, the rseq pointer is cleared explicitly on completion, and on midpoint exit by the fragment deletion hook along with a hook on the shared fragment flushtime update, to ensure all threads are covered. The rseq test is augmented and expanded. An invalid instruction is added to properly test the abort handler, under a conditional to allow testing each sequence both to completion and on abort. Future work is properly handling a midpoint exit during the instrumentation execution: we need to invoke the native version as well. Adding aarchxx support is also future work: the patch_mov_immed_ptrsz(), the writes to the rseq struct in TLS, and the rseq tests are currently x86-only. Issue: #2350
Unfortunately this is on the large side. Some small pieces could be split out, like the ATOMIC_1BYTE_WRITE, but to split out anything sizable would require committing some unexecuted code. Let me know if it's too much to review at once. The description should cover the key changes. I still wanted to update the wiki page with more details on all of the design decisions. Some of them we discussed offline so they should be familiar. |
The CL size is not a big deal, thanks for the warning though :) I'll start taking a look |
Spent some time looking at the AArch64 failures: it seems that a bunch of tests which all load app libraries fail with the assert |
The HEAD hang looks related to #1698: I'm seeing |
Figured out the cause of the AArch64 issues: fixed in 7906b7f. The hangs are still there on this other machine. I expect Jenkins to be green now. |
PTAL |
Eliminates the call-return reliance for the native execution step of
rseq support. Makes a local copy of the sequence right inside the
sequence-ending block and executes it. The sequence is inserted as
additional instructions and is then mangled normally (mangling changes
are assumed to be restartable), but it is not passed to clients. Any
exits are regular block exits, resulting in a block with many exits.
The prior call-return scheme is left under a temporary option
-rseq_assume_call, as a failsafe in case there are stability problems
discovered with this native execution implementation. Once we are
happy with the new scheme we can remove the option.
To make the local copy an rseq region, the per-thread rseq_cs address
is identified by watching system calls. For attach, it is identified
by searching the possible static TLS offsets. The assumption of a
constant offset is documented and verified.
The rseq_cs's abort handler is a new exit added with the app's
signature as data just before it, hidden in the operands of a nop
instruction to avoid problems with decoding the fragment. A local
jump skips over the data and exit.
A new rseq_cs structure is allocated for each sequence-ending
fragment. It is stored in a hashtable in the rseq module, to avoid
complexities and overhead of adding an additional fragment_t or
"subclass" field. A new flag is set to trigger calling into the rseq
module on fragment deletion.
The rseq_cs fields are filled in via a new post-emit control point,
using information stored in labels during mangling. The pointer to
the rseq_cs is inserted with a dummy value and patched in this new
control point using a new utility routine patch_mov_immed_ptrsz().
To avoid crashing due to invalid rseq bounds after freeing the rseq_cs
structure, the rseq pointer is cleared explicitly on completion, and
on midpoint exit by the fragment deletion hook along with a hook on
the shared fragment flushtime update, to ensure all threads are
covered.
The rseq test is augmented and expanded. An invalid instruction is
added to properly test the abort handler, under a conditional to allow
testing each sequence both to completion and on abort.
Future work is properly handling a midpoint exit during the
instrumentation execution: we need to invoke the native version as
well.
Adding aarchxx support is also future work: the
patch_mov_immed_ptrsz(), the writes to the rseq struct in TLS, and the
rseq tests are currently x86-only.
Issue: #2350