Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pad_jmps nop support for trace building #4038

Closed
derekbruening opened this issue Jan 21, 2020 · 1 comment · Fixed by #4046
Closed

Add pad_jmps nop support for trace building #4038

derekbruening opened this issue Jan 21, 2020 · 1 comment · Fixed by #4046
Assignees

Comments

@derekbruening
Copy link
Contributor

This was PR 215179/Case 10744.

Today with -pad_jmps_shift_bb we avoid inserting nops before exit cti to keep cti offsets from crossing cache lines (and thus keeping them atomically patchable). This works great for regular bb's. However, with client changes to a bb, or significant internal mangling such as for rseq (#2350) or even just for UNIX inlined syscall fence exits, we can have extra exit cti's such that a single shift does not always work. We then insert nops in front of jumps, but we never added trace building support for these nops: we need to distinguish them from app code or client-inserted instructions in decode_fragment. So we mark as FRAG_CANNOT_BE_TRACE when we add a nop.

This cannot-be-trace is problematic: we can have a bb with no nops whose temp private copy for trace building needs nops (just due to fcache start alignment) and end up not making forward progress building a trace.

There are comments about this in the code labeled "PR 215179". I'm filing this in the current tracker to ensure we can more easily track and address this issue.

@derekbruening
Copy link
Contributor Author

One old idea was to use a linkstub_t flag to mark which exits have padding nops.

@derekbruening derekbruening self-assigned this Jan 21, 2020
derekbruening added a commit that referenced this issue Jan 21, 2020
Fixes an assert when we start building a trace but then have to abort
due to the private copy failing.

There is still a long-term problem here with the private copy failure
potentially happening over and over: that is covered by #4038.

Issue: #2989, #4038

Fixes #2989
derekbruening added a commit that referenced this issue Jan 22, 2020
Fixes an assert when we start building a trace but then have to abort
due to the private copy failing.

There is still a long-term problem here with the private copy failure
potentially happening over and over: that is covered by #4038.

Issue: #2989, #4038

Fixes #2989
derekbruening added a commit that referenced this issue Jan 22, 2020
Fixes an assert when we start building a trace but then have to abort
due to the private copy failing.

There is still a long-term problem here with the private copy failure
potentially happening over and over: that is covered by #4038.

Issue: #2989, #4038

Fixes #2989
derekbruening added a commit that referenced this issue Jan 22, 2020
Removes the dr_add_prefixes_to_basic_blocks() +
instr_branch_set_prefix_target() feature and its flags
LINK_TARGET_PREFIX and INSTR_BRANCH_TARGETS_PREFIX.  These were
already under UNSUPPORTED_API and not supported.  Removing the feature
frees up a linkstub flag for #4038.
derekbruening added a commit that referenced this issue Jan 22, 2020
Removes the dr_add_prefixes_to_basic_blocks() +
instr_branch_set_prefix_target() feature and its flags
LINK_TARGET_PREFIX and INSTR_BRANCH_TARGETS_PREFIX.  These were
already under UNSUPPORTED_API and not supported.  Removing the feature
frees up a linkstub flag for #4038.

Issue: #4045, #4038
derekbruening added a commit that referenced this issue Jan 22, 2020
Adds new flags LINK_PADDED and INSTR_BRANCH_PADDED and uses them to
mark exit cti padding added for safe patching.  Reads the link flag in
decode_fragment() to remove the padding during trace building.  This
avoids prior trace barriers from extra basic block exits from inlined
syscall exits, rseq mangling, and client changes.

Removes the option and stats around -pad_jmps_mark_no_trace which is
no longer needed.

Tested on the linux.rseq test which now successfully traces through
the rseq mangling.

Fixes #4038
derekbruening added a commit that referenced this issue Jan 23, 2020
Adds new flags LINK_PADDED and INSTR_BRANCH_PADDED and uses them to
mark exit cti padding added for safe patching.  Reads the link flag in
decode_fragment() to remove the padding during trace building.  This
avoids prior trace barriers from extra basic block exits from inlined
syscall exits, rseq mangling, and client changes.

Removes the option and stats around -pad_jmps_mark_no_trace which is
no longer needed.

Tested on the linux.rseq test which now successfully traces through
the rseq mangling.

Fixes #4038
derekbruening added a commit that referenced this issue Jan 23, 2020
When a signal occurs inside an rseq region, DR now provides the actual
interruption PC to the kernel xfer event (but not the signal event, to
match kernel behavior).  This is required to accurately place the
signal in a drmemtrace offline trace.

What this looks like:
Before:
      0x00007fe36b1ea8f1  74 02                jz     $0x00007fe36b1ea8f5
      0x00007fe36b1ea8f3  0f 0b                ud2a
      0x00007fe36b1ea8f5  83 05 74 38 00 00 01 addl   $0x01, <rel> 0x00007fe36b1ee170
    <marker: kernel xfer to handler>
    <marker: timestamp 13224196441528495>
    <marker: tid 74400 on core 0>
      0x00007fe36b1ea85c  55                   push   %rbp
After:
      0x00007f15e71d78f1  74 02                jz     $0x00007f15e71d78f5
      0x00007f15e71d78f3  0f 0b                ud2a
    <marker: kernel xfer to handler>
    <marker: timestamp 13224196672834337>
    <marker: tid 78727 on core 1>
      0x00007f15e71d785c  55                   push   %rbp

Adds a trace_invariants check for a signal immediately after UD2A.  I
first went through and added module_mapper and decoding to
trace_invariants: but then we have to link in static DR (even if we do
our own raw decoding as module_mapper uses DR) and thus we can't run
the test anymore on AArch64 (#2007) or Mac.  Instead, I went with a
simpler solution of putting the annotations that signal_invariants.c
uses into rseq.c.

I had to fix 2 bugs to get the marker in the right place and to get
this invariants check to fire:

1) raw2trace was adding to cur_modoffs before memrefs and then having
  an extra check for where to put the the signal, which doesn't make
  sense.  It ended up putting the marker too early.  After fixing, the
  other tests still pass, so it is not clear why I had it that way.

2) op_offline was off inside trace_invariants, so the assert on a
  signal after ud2a was not firing.  I now have analyzer_multi setting
  op_offline up front as a synthetic option for post-processing usage.

I also added a direct jump as an annotation as the start of the abort
handler in rseq.c to avoid trace_invariants complaining about a signal not
returning to the interruption point.

I added a drcachesim offline regression test tracing the rseq test's
code.  This same test will be separately used for #4038.

Issue: #4038, #4041
Fixes #4041
derekbruening added a commit that referenced this issue Jan 23, 2020
When a signal occurs inside an rseq region, DR now provides the actual
interruption PC to the kernel xfer event (but not the signal event, to
match kernel behavior).  This is required to accurately place the
signal in a drmemtrace offline trace.

What this looks like:
Before:
      0x00007fe36b1ea8f1  74 02                jz     $0x00007fe36b1ea8f5
      0x00007fe36b1ea8f3  0f 0b                ud2a
      0x00007fe36b1ea8f5  83 05 74 38 00 00 01 addl   $0x01, <rel> 0x00007fe36b1ee170
    <marker: kernel xfer to handler>
    <marker: timestamp 13224196441528495>
    <marker: tid 74400 on core 0>
      0x00007fe36b1ea85c  55                   push   %rbp
After:
      0x00007f15e71d78f1  74 02                jz     $0x00007f15e71d78f5
      0x00007f15e71d78f3  0f 0b                ud2a
    <marker: kernel xfer to handler>
    <marker: timestamp 13224196672834337>
    <marker: tid 78727 on core 1>
      0x00007f15e71d785c  55                   push   %rbp

Adds a trace_invariants check for a signal immediately after UD2A.  I
first went through and added module_mapper and decoding to
trace_invariants: but then we have to link in static DR (even if we do
our own raw decoding as module_mapper uses DR) and thus we can't run
the test anymore on AArch64 (#2007) or Mac.  Instead, I went with a
simpler solution of putting the annotations that signal_invariants.c
uses into rseq.c.

I had to fix 2 bugs to get the marker in the right place and to get
this invariants check to fire:

1) raw2trace was adding to cur_modoffs before memrefs and then having
  an extra check for where to put the the signal, which doesn't make
  sense.  It ended up putting the marker too early.  After fixing, the
  other tests still pass, so it is not clear why I had it that way.

2) op_offline was off inside trace_invariants, so the assert on a
  signal after ud2a was not firing.  I now have analyzer_multi setting
  op_offline up front as a synthetic option for post-processing usage.

I also added a direct jump as an annotation as the start of the abort
handler in rseq.c to avoid trace_invariants complaining about a signal not
returning to the interruption point.

I added a drcachesim offline regression test tracing the rseq test's
code.  This same test will be separately used for #4038.

Issue: #4038, #4041
Fixes #4041
derekbruening added a commit that referenced this issue Jan 23, 2020
Makes the following changes to ensure the drcachesim rseq test from

+ Confirms that the gap is found by trace_invariants.  It wasn't in
  HEAD: here I fix a bug to only honor pref_xfer_marker if it was not
  cleared.

+ Adds ending of a bb right after an rseq committing store, to
  simplify the native-rseq-mangled bb and drmemtrace's handling:
  raw2trace wants to include the subsequent instructions in the bb;
  they are removed by #4038's rollback but it seems cleaner to just
  not have them there in the bb in the first place.

Issue: #4038, #4041
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant