Skip to content

Commit

Permalink
i#5953 rseq: Fix rseq adjust filter and instru abort issues (#5971)
Browse files Browse the repository at this point in the history
Fixes problems with the new rseq adjustment code with filtered traces:
+ Load and update the buffer pointer when emitting the rseq entry label
+ Do not try to adjust for i-filtered

Adds 2 new tests for filtering the linux.rseq app.

The new tests revealed a problem with the instr counts in the schedule
file: raw2trace was counting the PC-only entries as instructions; we fix that
here.

Reverts the rseq abort event and marker value back to the interrupted
PC for a signal in the instrumented execution.  Leaves all other cases
(all aborts in the native execution) as the handler.  This lets us
properly place the abort+signal for the instrumented interruption
case.
    
Adjusts the invariant checker abort check to use the new end PC for a
more accurate test to avoid false positives and negatives.

Fixes #5953, #4041
  • Loading branch information
derekbruening authored Apr 13, 2023
1 parent fb7d4c9 commit 27f7579
Show file tree
Hide file tree
Showing 15 changed files with 142 additions and 63 deletions.
8 changes: 5 additions & 3 deletions api/docs/release.dox
Original file line number Diff line number Diff line change
Expand Up @@ -156,10 +156,12 @@ changes:
AArch64 instruction and "reta" entries in the AArch64 codec were being used to
decode "retaa" and "retab". These instructions will now encode and decode correctly
as "retaa" and "retab".
- Changed the interrupted PC for #DR_XFER_RSEQ_ABORT and a signal generated during
an rseq region to be the abort handler rather than the committing store,
matching the kernel behavior.
- Added a #DR_XFER_RSEQ_ABORT event for a signal generated during an rseq region.
- Changed the interrupted PC for #DR_XFER_RSEQ_ABORT for native execution aborts to be
the abort handler (a signal during the instrumented execution will continue to have
the actual interrupted PC); changed the interrupted PC for #DR_XFER_SIGNAL_DELIVERY
for a signal generated during an rseq region to be the abort handler, matching the
kernel behavior.

Further non-compatibility-affecting changes include:
- Added AArchXX support for attaching to a running process.
Expand Down
4 changes: 3 additions & 1 deletion clients/drcachesim/common/trace_entry.h
Original file line number Diff line number Diff line change
Expand Up @@ -348,7 +348,9 @@ typedef enum {
/**
* Serves to further identify #TRACE_MARKER_TYPE_KERNEL_EVENT as a
* restartable sequence abort handler. This will always be immediately followed
* by #TRACE_MARKER_TYPE_KERNEL_EVENT. The marker value holds the continuation
* by #TRACE_MARKER_TYPE_KERNEL_EVENT. The marker value for a signal that
* interrupted the instrumented execution is the precise interrupted PC, but
* for all other cases the value holds the continuation
* program counter, which is the restartable sequence abort handler. (The precise
* interrupted point inside the sequence is not provided by the kernel.)
*/
Expand Down
29 changes: 23 additions & 6 deletions clients/drcachesim/tests/invariant_checker_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -349,25 +349,42 @@ check_rseq()
// Roll back rseq final instr.
{
std::vector<memref_t> memrefs = {
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ENTRY, 3),
gen_instr(1, 1),
gen_instr(1, 2),
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ABORT, 1),
gen_marker(1, TRACE_MARKER_TYPE_KERNEL_EVENT, 1),
gen_instr(1, 1),
// Rolled back instr at pc=2 size=1.
// Point to the abort handler.
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ABORT, 4),
gen_marker(1, TRACE_MARKER_TYPE_KERNEL_EVENT, 4),
gen_instr(1, 4),
};
if (!run_checker(memrefs, false))
return false;
}
// Fail to roll back rseq final instr.
{
std::vector<memref_t> memrefs = {
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ENTRY, 3),
gen_instr(1, 1),
gen_instr(1, 2),
// A fault in the instrumented execution.
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ABORT, 2),
gen_marker(1, TRACE_MARKER_TYPE_KERNEL_EVENT, 2),
gen_marker(1, TRACE_MARKER_TYPE_KERNEL_EVENT, 4),
gen_instr(1, 4),
};
if (!run_checker(memrefs, false))
return false;
}
// Fail to roll back rseq final instr.
{
std::vector<memref_t> memrefs = {
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ENTRY, 3),
gen_instr(1, 1),
gen_instr(1, 2),
gen_marker(1, TRACE_MARKER_TYPE_RSEQ_ABORT, 4),
gen_marker(1, TRACE_MARKER_TYPE_KERNEL_EVENT, 4),
gen_instr(1, 4),
};
if (!run_checker(memrefs, true, 1, 3,
if (!run_checker(memrefs, true, 1, 4,
"Rseq post-abort instruction not rolled back",
"Failed to catch bad rseq abort"))
return false;
Expand Down
18 changes: 15 additions & 3 deletions clients/drcachesim/tools/invariant_checker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -205,12 +205,21 @@ invariant_checker_t::parallel_shard_memref(void *shard_data, const memref_t &mem
memref.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT,
"Rseq marker not immediately prior to kernel marker");
}
if (memref.marker.type == TRACE_TYPE_MARKER &&
memref.marker.marker_type == TRACE_MARKER_TYPE_RSEQ_ENTRY) {
shard->rseq_end_pc_ = memref.marker.marker_value;
}
if (memref.marker.type == TRACE_TYPE_MARKER &&
memref.marker.marker_type == TRACE_MARKER_TYPE_RSEQ_ABORT) {
// Check that the rseq final instruction was not executed: that raw2trace
// rolled it back.
// rolled it back, unless it was a fault in the instrumented execution in which
// case the marker value will point to it.
report_if_false(shard,
memref.marker.marker_value != shard->prev_instr_.instr.addr,
shard->rseq_end_pc_ == 0 ||
shard->prev_instr_.instr.addr +
shard->prev_instr_.instr.size !=
shard->rseq_end_pc_ ||
shard->prev_instr_.instr.addr == memref.marker.marker_value,
"Rseq post-abort instruction not rolled back");
}
#endif
Expand Down Expand Up @@ -409,7 +418,10 @@ invariant_checker_t::parallel_shard_memref(void *shard_data, const memref_t &mem
memref.instr.addr == shard->app_handler_pc_ ||
// Marker for rseq abort handler. Not as unique as a prefetch, but
// we need an instruction and not a data type.
memref.instr.type == TRACE_TYPE_INSTR_DIRECT_JUMP,
memref.instr.type == TRACE_TYPE_INSTR_DIRECT_JUMP ||
// Instruction-filtered can easily skip the return point.
TESTANY(OFFLINE_FILE_TYPE_FILTERED | OFFLINE_FILE_TYPE_IFILTERED,
shard->file_type_),
"Signal handler return point incorrect");
// We assume paired signal entry-exit (so no longjmp and no rseq
// inside signal handlers).
Expand Down
1 change: 1 addition & 0 deletions clients/drcachesim/tools/invariant_checker.h
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,7 @@ class invariant_checker_t : public analysis_tool_t {
// We could move this to per-worker data and still not need a lock
// (we don't currently have per-worker data though so leaving it as per-shard).
std::unordered_map<addr_t, addr_t> branch_target_cache;
addr_t rseq_end_pc_ = 0;
};

// We provide this for subclasses to run these invariants with custom
Expand Down
4 changes: 2 additions & 2 deletions clients/drcachesim/tools/view.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -307,8 +307,8 @@ view_t::parallel_shard_memref(void *shard_data, const memref_t &memref)
}
break;
case TRACE_MARKER_TYPE_RSEQ_ABORT:
std::cerr << "<marker: rseq abort xfer to handler 0x" << std::hex
<< memref.marker.marker_value << std::dec << "\n";
std::cerr << "<marker: rseq abort from 0x" << std::hex
<< memref.marker.marker_value << std::dec << " to handler>\n";
break;
case TRACE_MARKER_TYPE_RSEQ_ENTRY:
std::cerr << "<marker: rseq entry with end at 0x" << std::hex
Expand Down
53 changes: 28 additions & 25 deletions clients/drcachesim/tracer/raw2trace.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -583,11 +583,14 @@ raw2trace_t::process_offline_entry(raw2trace_thread_data_t *tdata,
if (!err.empty())
return err;
} else if (in_entry->extended.valueB == TRACE_MARKER_TYPE_RSEQ_ENTRY) {
log(4, "--- Reached rseq entry (end=0x%zx): buffering all output ---\n",
marker_val);
tdata->rseq_ever_saw_entry_ = true;
tdata->rseq_buffering_enabled_ = true;
tdata->rseq_end_pc_ = marker_val;
if (tdata->rseq_want_rollback_) {
log(4,
"--- Reached rseq entry (end=0x%zx): buffering all output ---\n",
marker_val);
tdata->rseq_ever_saw_entry_ = true;
tdata->rseq_buffering_enabled_ = true;
tdata->rseq_end_pc_ = marker_val;
}
}
// If there is currently a delayed branch that has not been emitted yet,
// delay most markers since intra-block markers can cause issues with
Expand Down Expand Up @@ -727,6 +730,10 @@ raw2trace_t::process_header(raw2trace_thread_data_t *tdata)
thread_id_t tid = header.tid;
tdata->tid = tid;
tdata->cache_line_size = header.cache_line_size;
// We can't adjust filtered instructions, so we disable buffering.
if (!TESTANY(OFFLINE_FILE_TYPE_FILTERED | OFFLINE_FILE_TYPE_IFILTERED,
get_file_type(tdata)))
tdata->rseq_want_rollback_ = true;
process_id_t pid = header.pid;
DR_ASSERT(tid != INVALID_THREAD_ID);
DR_ASSERT(pid != (process_id_t)INVALID_PROCESS_ID);
Expand Down Expand Up @@ -1468,17 +1475,9 @@ raw2trace_t::handle_kernel_interrupt_and_markers(
std::string err = get_marker_value(tdata, &in_entry, &marker_val);
if (!err.empty())
return err;
if (tdata->rseq_ever_saw_entry_ &&
// An abort always ends a block.
if (in_entry->extended.valueB == TRACE_MARKER_TYPE_KERNEL_EVENT ||
in_entry->extended.valueB == TRACE_MARKER_TYPE_RSEQ_ABORT) {
// An abort always ends a block.
} else if (in_entry->extended.valueB == TRACE_MARKER_TYPE_KERNEL_EVENT ||
// Support legacy traces before we added TRACE_MARKER_TYPE_RSEQ_ENTRY.
// We identify them by not seeing an entry before an abort.
// We won't fix up rseq aborts with timestamps (i#5954) nor rseq
// side exits (i#5953) for such traces but we can at least fix
// up typical aborts.
(!tdata->rseq_ever_saw_entry_ &&
in_entry->extended.valueB == TRACE_MARKER_TYPE_RSEQ_ABORT)) {
// A signal/exception marker in the next entry could be at any point
// among non-memref instrs, or it could be after this bb.
// We check the stored PC.
Expand All @@ -1491,13 +1490,13 @@ raw2trace_t::handle_kernel_interrupt_and_markers(
if (marker_val == cur_offs)
at_interrupted_pc = true;
} else {
if (marker_val == cur_pc ||
// Rseq abort events now give us the handler PC as the interrupted PC.
// We insert the signal at the commit PC as we have no further info.
(in_entry->extended.valueB == TRACE_MARKER_TYPE_RSEQ_ABORT &&
tdata->rseq_buffering_enabled_ && cur_pc == tdata->rseq_commit_pc_))
if (marker_val == cur_pc)
at_interrupted_pc = true;
}
// Support legacy traces before we added TRACE_MARKER_TYPE_RSEQ_ENTRY. We
// identify them by not seeing an entry before an abort. We won't fix up
// rseq aborts with timestamps (i#5954) nor rseq side exits (i#5953) for
// such traces but we can at least fix up typical aborts.
if (!tdata->rseq_ever_saw_entry_ &&
in_entry->extended.valueB == TRACE_MARKER_TYPE_RSEQ_ABORT) {
// For the older version, we will not get here for Windows
Expand All @@ -1513,12 +1512,12 @@ raw2trace_t::handle_kernel_interrupt_and_markers(
if (marker_val == 0 || at_interrupted_pc || legacy_rseq_rollback) {
log(4, "Signal/exception interrupted the bb @ %p\n", cur_pc);
if (tdata->rseq_past_end_) {
addr_t rseq_handler_pc =
addr_t rseq_abort_pc =
in_entry->extended.valueB == TRACE_MARKER_TYPE_RSEQ_ABORT
? marker_val
: 0;
err = adjust_and_emit_rseq_buffer(tdata, static_cast<addr_t>(cur_pc),
rseq_handler_pc);
rseq_abort_pc);
if (!err.empty())
return err;
}
Expand Down Expand Up @@ -1831,8 +1830,10 @@ raw2trace_t::rollback_rseq_buffer(raw2trace_thread_data_t *tdata,

std::string
raw2trace_t::adjust_and_emit_rseq_buffer(raw2trace_thread_data_t *tdata, addr_t next_pc,
addr_t abort_handler_pc)
addr_t abort_pc)
{
if (!tdata->rseq_want_rollback_)
return "";
log(4, "--- Rseq region exited at %p ---\n", next_pc);
if (verbosity_ >= 4) {
log(4, "Rseq buffer contents:\n");
Expand All @@ -1845,7 +1846,7 @@ raw2trace_t::adjust_and_emit_rseq_buffer(raw2trace_thread_data_t *tdata, addr_t
// We need this in the outer scope for use by write() below.
byte encoding[MAX_ENCODING_LENGTH];

if ((abort_handler_pc != 0 && next_pc == abort_handler_pc) ||
if ((abort_pc != 0 && next_pc == abort_pc) ||
// Old traces have the commit PC instead of the handler in the abort and
// signal events (so not really the "next_pc" but the "exit_pc") and
// don't have abort events on a signal aborting an rseq region.
Expand Down Expand Up @@ -2463,7 +2464,9 @@ raw2trace_t::write(raw2trace_thread_data_t *tdata, const trace_entry_t *start,
start = it;
DEBUG_ASSERT(tdata->cur_chunk_instr_count == 0);
}
if (type_is_instr(static_cast<trace_type_t>(it->type))) {
if (type_is_instr(static_cast<trace_type_t>(it->type)) &&
// Do not count PC-only i-filtered instrs.
it->size > 0) {
++tdata->cur_chunk_instr_count;
++instr_ordinal;
if (TESTANY(OFFLINE_FILE_TYPE_ENCODINGS, tdata->file_type) &&
Expand Down
3 changes: 2 additions & 1 deletion clients/drcachesim/tracer/raw2trace.h
Original file line number Diff line number Diff line change
Expand Up @@ -943,6 +943,7 @@ class raw2trace_t {
std::unordered_map<uint64_t, std::vector<schedule_entry_t>> cpu2sched;

// State for rolling back rseq aborts and side exits.
bool rseq_want_rollback_ = false;
bool rseq_ever_saw_entry_ = false;
bool rseq_buffering_enabled_ = false;
bool rseq_past_end_ = false;
Expand Down Expand Up @@ -1158,7 +1159,7 @@ class raw2trace_t {
// a side exit or abort if necessary.
std::string
adjust_and_emit_rseq_buffer(raw2trace_thread_data_t *tdata, addr_t next_pc,
addr_t abort_handler_pc = 0);
addr_t abort_pc = 0);

// Removes entries from tdata->rseq_buffer_ between and including the instructions
// starting at or after remove_start_rough_idx and before or equal to
Expand Down
24 changes: 16 additions & 8 deletions clients/drcachesim/tracer/tracer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1055,12 +1055,12 @@ event_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t *inst
FATAL("Fatal error: failed to reserve aflags\n");
}

bool needs_instru = false;
bool instr_is_rseq_entry_label =
instr_is_label(instr) && instr_get_note(instr) == (void *)DR_NOTE_RSEQ_ENTRY;
if (instr_is_rseq_entry_label) {
needs_instru = true;
}
bool need_rseq_instru = instr_is_label(instr) &&
instr_get_note(instr) == (void *)DR_NOTE_RSEQ_ENTRY &&
// For filtered instructions we can't adjust rseq regions as we do not have
// the fill instruction sequence so we reduce overhead by not outputting the
// entry markers.
!op_L0I_filter.get_value();

// Use emulation-aware queries to accurately trace rep string and
// scatter-gather expansions. Getting this wrong can result in significantly
Expand All @@ -1073,7 +1073,7 @@ event_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t *inst
// Ensure we reach the code below for post-strex instru.
ud->strex == NULL &&
// Do not skip misc cases that need instrumentation.
!needs_instru &&
!need_rseq_instru &&
// Avoid dropping trailing bundled instrs or missing the block-final clean call.
!is_last_instr(drcontext, instr))
return flags;
Expand Down Expand Up @@ -1189,9 +1189,17 @@ event_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t *inst
ud->strex = NULL;
}

if (instr_is_rseq_entry_label) {
if (need_rseq_instru) {
DR_ASSERT(!op_L0I_filter.get_value()); // Excluded from need_rseq_instru.
if (op_L0D_filter.get_value())
insert_load_buf_ptr(drcontext, bb, where, reg_ptr);
adjust =
instru->instrument_rseq_entry(drcontext, bb, where, instr, reg_ptr, adjust);
if (op_L0D_filter.get_value()) {
// When filtering we can't combine buf_ptr adjustments.
insert_update_buf_ptr(drcontext, bb, where, reg_ptr, DR_PRED_NONE, adjust);
adjust = 0;
}
}

/* Instruction entry for instr fetch trace. This does double-duty by
Expand Down
4 changes: 2 additions & 2 deletions core/globals.h
Original file line number Diff line number Diff line change
Expand Up @@ -391,8 +391,8 @@ typedef struct _client_data_t {
bool is_translating;
#endif
#ifdef LINUX
/* i#4041: Pass rseq events in addition to signal events. */
bool last_xl8_in_rseq;
/* i#4041: Pass the real translation for signals in rseq sequences. */
app_pc last_special_xl8;
#endif

/* flags for asserts on linux and for getting param base right on windows */
Expand Down
4 changes: 3 additions & 1 deletion core/lib/dr_events.h
Original file line number Diff line number Diff line change
Expand Up @@ -998,7 +998,9 @@ typedef enum {
DR_XFER_SET_CONTEXT_THREAD, /**< NtSetContextThread system call. */
DR_XFER_CLIENT_REDIRECT, /**< dr_redirect_execution() or #DR_SIGNAL_REDIRECT. */
/**
* A Linux restartable sequence was aborted. The interrupted PC always points
* A Linux restartable sequence was aborted. The interrupted PC for a signal in
* the execution instrumentation points to the precise interrupted
* instruction; but for an abort in the native exeuction, the PC always points
* to the abort handler, rather than the precise instruction that was aborted.
* This aligns with kernel behavior: the interrupted PC is not saved anywhere.
*/
Expand Down
19 changes: 12 additions & 7 deletions core/translate.c
Original file line number Diff line number Diff line change
Expand Up @@ -734,22 +734,27 @@ translate_restore_special_cases(dcontext_t *dcontext, app_pc pc)
LOG(THREAD_GET, LOG_INTERP, 2,
"recreate_app: moving " PFX " inside rseq region to handler " PFX "\n", pc,
handler);
/* Remember whether this was in an rseq region. */
dcontext->client_data->last_xl8_in_rseq = true;
/* Remember the original for translate_last_direct_translation. */
dcontext->client_data->last_special_xl8 = pc;
return handler;
}
dcontext->client_data->last_xl8_in_rseq = false;
dcontext->client_data->last_special_xl8 = NULL;
#endif
return pc;
}

bool
translate_last_in_rseq(dcontext_t *dcontext)
app_pc
translate_last_direct_translation(dcontext_t *dcontext, app_pc pc)
{
#ifdef LINUX
return dcontext->client_data->last_xl8_in_rseq;
app_pc handler;
if (dcontext->client_data->last_special_xl8 != NULL &&
rseq_get_region_info(dcontext->client_data->last_special_xl8, NULL, NULL,
&handler, NULL, NULL) &&
pc == handler)
return dcontext->client_data->last_special_xl8;
#endif
return false;
return pc;
}

/* Returns a success code, but makes a best effort regardless.
Expand Down
7 changes: 5 additions & 2 deletions core/translate.h
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,10 @@ stress_test_recreate_state(dcontext_t *dcontext, fragment_t *f, instrlist_t *ili
bool
at_syscall_translation(dcontext_t *dcontext, app_pc pc);

bool
translate_last_in_rseq(dcontext_t *dcontext);
/* Returns the direct translation when given the "official" translation.
* Some special cases like rseq sequences obfuscate the interrupted PC: i#4041.
*/
app_pc
translate_last_direct_translation(dcontext_t *dcontext, app_pc pc);

#endif /* _TRANSLATE_H_ */
Loading

0 comments on commit 27f7579

Please sign in to comment.