Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#5724 chunk bounds: Insert encodings in new chunks #5941

Merged
merged 5 commits into from
Mar 31, 2023

Conversation

derekbruening
Copy link
Contributor

Adds logic to insert new encodings for instruction entries that were generated before it was known they'd appear after a chunk boundary. This has some inherent complexity due to the different modules owning record generation and chunk creation, along with the complexity of needing the decode PC for instruction records generated in the past.

Adds a unit test.

This finishes #5724, allowing us to remove it from the flaky list.

Fixes #5724

Adds logic to insert new encodings for instruction entries that were
generated before it was known they'd appear after a chunk boundary.
This has some inherent complexity due to the different modules owning
record generation and chunk creation, along with the complexity of
needing the decode PC for instruction records generated in the past.

Adds a unit test.

This finishes #5724, allowing us to remove it from the flaky list.

Fixes #5724
@derekbruening
Copy link
Contributor Author

x64 failure is windows-zlib known issue #5635

@derekbruening derekbruening merged commit f5320c3 into master Mar 31, 2023
@derekbruening derekbruening deleted the i5724-encodings branch March 31, 2023 22:49
derekbruening added a commit that referenced this pull request Apr 11, 2023
With our run-twice rseq solution, we first run an instrumented path
through the rseq region which goes into the recorded drmemtrace trace.
But the native rseq execution might take a side exit or abort.  We
already had code to try to roll back the committing store on an abort,
but it was fragile as a timestamp in between thwarted it.  We had no
code to make a side exit match and as a result we'd see PC
discontinuities.

Here, we solve both problems with the following approach:

First, we have DR add a new DR_NOTE_RSEQ_ENTRY label to the top of the
first block in an rseq region.  This label's data stores the end PC
and the abort handler PC (we do not currently use the latter in
drmemtrace: once we revert #4041 we may need to do so).

Next, the tracer turns that label into a new
TRACE_MARKER_TYPE_RSEQ_ENTRY marker with the end PC as the marker
value.

Finally, rawtrace uses the new label to fix up the trace.  When it
sees the entry marker, it starts buffering instead of writing out
records.  While buffering, it remembers the decode PC of each
insruction (for encoding entries), plus the PC + target of each
branch.  When it sees the rseq end PC, it sets a flag, and on the next
instruction it has the rseq exit PC.  If it sees an rseq abort marker
or a signal marker with the flag set, it uses the interrupted PC as
the rseq exit PC (this avoids having to wait for the other side of the
signal handler).

With this rseq exit PC, raw2trace takes the following actions
depending on the PC value:
+ End PC fall-through: completed; emit entire buffer.
+ Committing store PC: roll back final instr in buffer.
  (Once #4041 reverts this to the handler PC we'll use the handler PC here
  and will add a new marker to hold that data.)
+ Target of a branch in the buffer: roll back to that branch.
  If multiple: take most recent.
+ None of the above: there cannot be an indirect branch we observed so
  we assume this is an inverted cbr.  If there is a forward direct cbr
  which was taken and stayed in the region, we synthesize a direct
  jump in the gap it jumped over.  This is not perfect but is the
  best we can readily do.

We leave the old rollback method (which didn't handle a timestamp in
between) in place but it is only enabled if we see an rseq abort
marker without the new rseq entry marker first.  We decided to not
bump the trace version nor add a new filetype to indicate the presence
of rseq entry markers as we can support legacy traces this way (at
least as well as they were supported before).

We give up on identifying which of multiple indirect rseq side exits were taken.
We give up on distinguishing a side exit whose target is the end PC fall-through from a
completion.

For emitting the accumulated rseq buffer, the delayed branch code
added for #5941 will handle chunk boundaries.

A side improvements along the way: we add a branch target pc to
instr_summary_t and avoided increasing its space by removing the
previously-stored next_pc and computing it from the length.

Adds a number of unit tests for rseq rollback and side exits, along
with a more-real instance in the linux.rseq test.  This was also
tested on a large proprietary application with multiple real instances
of rseq aborts and side exits.

Issue: #5953, #5953, #4041
Fixes #5953
Fixes #5954
derekbruening added a commit that referenced this pull request Apr 12, 2023
With our run-twice rseq solution, we first run an instrumented path
through the rseq region which goes into the recorded drmemtrace trace.
But the native rseq execution might take a side exit or abort. We
already had code to try to roll back the committing store on an abort,
but it was fragile as a timestamp in between thwarted it. We had no code
to make a side exit match and as a result we'd see PC discontinuities.

Here, we solve both problems with the following approach:

First, we have DR add a new DR_NOTE_RSEQ_ENTRY label to the top of the
first block in an rseq region. This label's data stores the end PC and
the abort handler PC (we do not currently use the latter in drmemtrace:
once we revert #4041 we may need to do so).

Next, the tracer turns that label into a new
TRACE_MARKER_TYPE_RSEQ_ENTRY marker with the end PC as the marker value.

Finally, rawtrace uses the new label to fix up the trace. When it sees
the entry marker, it starts buffering instead of writing out records.
While buffering, it remembers the decode PC of each insruction (for
encoding entries), plus the PC + target of each branch. When it sees the
rseq end PC, it sets a flag, and on the next instruction it has the rseq
exit PC. If it sees an rseq abort marker or a signal marker with the
flag set, it uses the interrupted PC as the rseq exit PC (this avoids
having to wait for the other side of the signal handler).

With this rseq exit PC, raw2trace takes the following actions depending
on the PC value:
+ End PC fall-through: completed; emit entire buffer.
+ Committing store PC: roll back final instr in buffer. (Once #4041
reverts this to the handler PC we'll use the handler PC here and will
add a new marker to hold that data.)
+ Target of a branch in the buffer: roll back to that branch. If
multiple: take most recent.
+ None of the above: there cannot be an indirect branch we observed so
we assume this is an inverted cbr. If there is a forward direct cbr
which was taken and stayed in the region, we synthesize a direct jump in
the gap it jumped over. This is not perfect but is the best we can
readily do.

We leave the old rollback method (which didn't handle a timestamp in
between) in place but it is only enabled if we see an rseq abort marker
without the new rseq entry marker first. We decided to not bump the
trace version nor add a new filetype to indicate the presence of rseq
entry markers as we can support legacy traces this way (at least as well
as they were supported before).

We give up on identifying which of multiple indirect rseq side exits
were taken. We give up on distinguishing a side exit whose target is the
end PC fall-through from a completion.

For emitting the accumulated rseq buffer, the delayed branch code added
for #5941 will handle chunk boundaries.

A side improvements along the way: we add a branch target pc to
instr_summary_t and avoided increasing its space by removing the
previously-stored next_pc and computing it from the length.

Adds a number of unit tests for rseq rollback and side exits, along with
a more-real instance in the linux.rseq test. This was also tested on a
large proprietary application with multiple real instances of rseq
aborts and side exits.

Issue: #5953, #5953, #4041
Fixes #5953
Fixes #5954
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

invariant_checker failure on a64: "Chunk instruction counts are inconsistent"
2 participants