Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#2350 rseq: Use a local copy for native execution #3826

Merged
merged 4 commits into from
Sep 12, 2019

Conversation

derekbruening
Copy link
Contributor

Eliminates the call-return reliance for the native execution step of
rseq support. Makes a local copy of the sequence right inside the
sequence-ending block and executes it. The sequence is inserted as
additional instructions and is then mangled normally (mangling changes
are assumed to be restartable), but it is not passed to clients. Any
exits are regular block exits, resulting in a block with many exits.

The prior call-return scheme is left under a temporary option
-rseq_assume_call, as a failsafe in case there are stability problems
discovered with this native execution implementation. Once we are
happy with the new scheme we can remove the option.

To make the local copy an rseq region, the per-thread rseq_cs address
is identified by watching system calls. For attach, it is identified
by searching the possible static TLS offsets. The assumption of a
constant offset is documented and verified.

The rseq_cs's abort handler is a new exit added with the app's
signature as data just before it, hidden in the operands of a nop
instruction to avoid problems with decoding the fragment. A local
jump skips over the data and exit.

A new rseq_cs structure is allocated for each sequence-ending
fragment. It is stored in a hashtable in the rseq module, to avoid
complexities and overhead of adding an additional fragment_t or
"subclass" field. A new flag is set to trigger calling into the rseq
module on fragment deletion.

The rseq_cs fields are filled in via a new post-emit control point,
using information stored in labels during mangling. The pointer to
the rseq_cs is inserted with a dummy value and patched in this new
control point using a new utility routine patch_mov_immed_ptrsz().

To avoid crashing due to invalid rseq bounds after freeing the rseq_cs
structure, the rseq pointer is cleared explicitly on completion, and
on midpoint exit by the fragment deletion hook along with a hook on
the shared fragment flushtime update, to ensure all threads are
covered.

The rseq test is augmented and expanded. An invalid instruction is
added to properly test the abort handler, under a conditional to allow
testing each sequence both to completion and on abort.

Future work is properly handling a midpoint exit during the
instrumentation execution: we need to invoke the native version as
well.

Adding aarchxx support is also future work: the
patch_mov_immed_ptrsz(), the writes to the rseq struct in TLS, and the
rseq tests are currently x86-only.

Issue: #2350

Eliminates the call-return reliance for the native execution step of
rseq support.  Makes a local copy of the sequence right inside the
sequence-ending block and executes it.  The sequence is inserted as
additional instructions and is then mangled normally (mangling changes
are assumed to be restartable), but it is not passed to clients.  Any
exits are regular block exits, resulting in a block with many exits.

The prior call-return scheme is left under a temporary option
-rseq_assume_call, as a failsafe in case there are stability problems
discovered with this native execution implementation.  Once we are
happy with the new scheme we can remove the option.

To make the local copy an rseq region, the per-thread rseq_cs address
is identified by watching system calls.  For attach, it is identified
by searching the possible static TLS offsets.  The assumption of a
constant offset is documented and verified.

The rseq_cs's abort handler is a new exit added with the app's
signature as data just before it, hidden in the operands of a nop
instruction to avoid problems with decoding the fragment.  A local
jump skips over the data and exit.

A new rseq_cs structure is allocated for each sequence-ending
fragment.  It is stored in a hashtable in the rseq module, to avoid
complexities and overhead of adding an additional fragment_t or
"subclass" field.  A new flag is set to trigger calling into the rseq
module on fragment deletion.

The rseq_cs fields are filled in via a new post-emit control point,
using information stored in labels during mangling.  The pointer to
the rseq_cs is inserted with a dummy value and patched in this new
control point using a new utility routine patch_mov_immed_ptrsz().

To avoid crashing due to invalid rseq bounds after freeing the rseq_cs
structure, the rseq pointer is cleared explicitly on completion, and
on midpoint exit by the fragment deletion hook along with a hook on
the shared fragment flushtime update, to ensure all threads are
covered.

The rseq test is augmented and expanded.  An invalid instruction is
added to properly test the abort handler, under a conditional to allow
testing each sequence both to completion and on abort.

Future work is properly handling a midpoint exit during the
instrumentation execution: we need to invoke the native version as
well.

Adding aarchxx support is also future work: the
patch_mov_immed_ptrsz(), the writes to the rseq struct in TLS, and the
rseq tests are currently x86-only.

Issue: #2350
@derekbruening
Copy link
Contributor Author

Unfortunately this is on the large side. Some small pieces could be split out, like the ATOMIC_1BYTE_WRITE, but to split out anything sizable would require committing some unexecuted code. Let me know if it's too much to review at once. The description should cover the key changes. I still wanted to update the wiki page with more details on all of the design decisions. Some of them we discussed offline so they should be familiar.

@Carrotman42
Copy link
Contributor

The CL size is not a big deal, thanks for the warning though :) I'll start taking a look

@derekbruening
Copy link
Contributor Author

Spent some time looking at the AArch64 failures: it seems that a bunch of tests which all load app libraries fail with the assert core/utils.c:685 null_ok. I don't seem to have access to the Jenkins machine, but I do have access to a Packet machine. I can reproduce the assert there. However, after failing to find a connection to this PR, I also tried running HEAD and there those app-lib tests also have problems: they time out, hanging at the same point as the assert in this run. Yet the previous build on Jenkins has these tests passing. Still investigating, but given the HEAD behavior suspecting something unrelated to this PR.

@derekbruening
Copy link
Contributor Author

The HEAD hang looks related to #1698: I'm seeing ldaxr involved.

@derekbruening
Copy link
Contributor Author

Figured out the cause of the AArch64 issues: fixed in 7906b7f. The hangs are still there on this other machine. I expect Jenkins to be green now.

api/docs/bt.dox Outdated Show resolved Hide resolved
core/arch/arch_exports.h Outdated Show resolved Hide resolved
core/arch/mangle_shared.c Show resolved Hide resolved
core/unix/rseq_linux.c Outdated Show resolved Hide resolved
core/arch/x86/mangle.c Outdated Show resolved Hide resolved
core/unix/rseq_linux.c Show resolved Hide resolved
suite/tests/linux/rseq.c Show resolved Hide resolved
@derekbruening
Copy link
Contributor Author

PTAL

@derekbruening derekbruening merged commit d465def into master Sep 12, 2019
@derekbruening derekbruening deleted the i2350-rseq-run-copy branch September 12, 2019 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants