Skip to content

Commit

Permalink
i#4274 kernel pc: Record interrupted PC before drmemtrace kernel events
Browse files Browse the repository at this point in the history
For online traces, adds the interrupted PC to kernel event entries.

For offline traces, updates the existing module offset stored inside
kernel events (and previously used only for raw2trace) to become an
absolute PC to help in core simulators and other trace consumers who
want to know branch targets prior to kernel events.

This is a version change for offline traces, and the version is
updated, with a named constant for the old version.

The raw offline's module offset is upgraded to become an index;offset
pair, as that format is better suited for raw2trace and it avoids
needing an extra entry nearly all of the time.  The raw2trace
postprocessing converts it to an absolute PC for the final trace.
Since the 49 bits can take two raw entries, raw2trace is augmented to
handle unreading such a double entry.

Adds a new TRACE_MARKER_TYPE_RSEQ_ABORT marker to further identify an
rseq abort, in order to roll back the committing store in raw2trace
(previously it used the lack of interrupted-PC to identify an rseq
abort).

Adds support to the view tool for displaying the interrupted PC
depending on the version.

Updates the drcachesim documentation with the new output of the
interrupted PC from the view tool.

Fixes an issue in the reader where the first switch to a thread has
the prior thread's identity in the two leading marker entries for
version and filetype.

Adds tests of the interrupted PC to the trace_invariants test for
signals, as well as ensuring that raw2trace has rolled back an rseq
abort final instruction so that the abort appears to occur at a
legitimate place inside the region.

Adds a test of the legacy version with just offsets by checking in raw
files from a signal_invariants run, with fake libdrmemtrace.so and
lidynamorio.so to keep the size down.

Fixes #4274
  • Loading branch information
derekbruening committed Jul 2, 2021
1 parent 9970fd0 commit a3d1db7
Show file tree
Hide file tree
Showing 23 changed files with 403 additions and 126 deletions.
3 changes: 3 additions & 0 deletions api/docs/release.dox
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,9 @@ Further non-compatibility-affecting changes include:
- Added the source context for restartable sequence aborts (#DR_XFER_RSEQ_ABORT)
which was previously missing.
- Added a #TRACE_MARKER_TYPE_VERSION entry to drmemtrace traces.
- Augmented drmemtrace #TRACE_MARKER_TYPE_KERNEL_EVENT entries with the absolute
PC of the interruption point, including for restartable sequence aborts, which
now also have an additional #TRACE_MARKER_TYPE_RSEQ_ABORT identifier.

**************************************************
<hr>
Expand Down
47 changes: 41 additions & 6 deletions clients/drcachesim/common/trace_entry.h
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,14 @@ typedef uintptr_t addr_t; /**< The type of a memory address. */
* #TRACE_MARKER_TYPE_VERSION.
*/
typedef enum {
TRACE_ENTRY_VERSION = 2 /**< The latest version of the trace format. */
/**
* A prior version where #TRACE_MARKER_TYPE_KERNEL_EVENT provided the module
* offset (and nothing for restartable sequence aborts) rather than the absolute
* PC of the interruption point provided today.
*/
TRACE_ENTRY_VERSION_NO_KERNEL_PC = 2,
/** The latest version of the trace format. */
TRACE_ENTRY_VERSION,
} trace_version_t;

/** The type of a trace entry in a #memref_t structure. */
Expand Down Expand Up @@ -208,10 +215,21 @@ typedef enum {
* The subsequent instruction is the start of a handler for a kernel-initiated
* event: a signal handler or restartable sequence abort handler on UNIX, or an
* APC, exception, or callback dispatcher on Windows.
* The value holds the module offset of the interruption point PC,
* which is used in post-processing. The value is 0 for some types, namely
* Windows callbacks and Linux rseq aborts, but these can be assumed to target
* the start of a block and so there is no loss of accuracy when post-processing.
* The value of this marker contains the program counter at the kernel
* interruption point. If the interruption point is just after a branch, this
* value is the target of that branch.
* (For trace version #TRACE_ENTRY_VERSION_NO_KERNEL_PC or below, the value is
* the module offset rather than the absolute program counter.)
* The value is 0 for some types where this information is not available, namely
* Windows callbacks.
* A restartable sequence abort handler is further identified by a prior
* marker of type #TRACE_MARKER_TYPE_RSEQ_ABORT.
*/
/* Non-exported information since limited to raw offline traces:
* For raw offline traces, the value is in the form of the module index and offset
* (from the base, not the indexed segment) of type kernel_interrupted_raw_pc_t.
* For raw offline traces, a value of 0 can be assumed to target the start of a
* block and so there is no loss of accuracy when post-processing.
*/
TRACE_MARKER_TYPE_KERNEL_EVENT,
/**
Expand Down Expand Up @@ -301,6 +319,12 @@ typedef enum {
*/
TRACE_MARKER_TYPE_VERSION,

/**
* Serves to further identify #TRACE_MARKER_TYPE_KERNEL_EVENT as a
* restartable sequence abort handler.
*/
TRACE_MARKER_TYPE_RSEQ_ABORT,

// ...
// These values are reserved for future built-in marker types.
// ...
Expand Down Expand Up @@ -420,7 +444,8 @@ typedef enum {
#define OFFLINE_FILE_VERSION_NO_ELISION 2
#define OFFLINE_FILE_VERSION_OLDEST_SUPPORTED OFFLINE_FILE_VERSION_NO_ELISION
#define OFFLINE_FILE_VERSION_ELIDE_UNMOD_BASE 3
#define OFFLINE_FILE_VERSION OFFLINE_FILE_VERSION_ELIDE_UNMOD_BASE
#define OFFLINE_FILE_VERSION_KERNEL_INT_PC 4
#define OFFLINE_FILE_VERSION OFFLINE_FILE_VERSION_KERNEL_INT_PC

/**
* Bitfields used to describe the high-level characteristics of both an
Expand Down Expand Up @@ -513,6 +538,16 @@ struct _offline_entry_t {
} END_PACKED_STRUCTURE;
typedef struct _offline_entry_t offline_entry_t;

// This is the raw marker value for TRACE_MARKER_TYPE_KERNEL_*.
// It occupies 49 bits and so may require two raw entries.
typedef union {
struct {
uint64_t modoffs : PC_MODOFFS_BITS;
uint64_t modidx : PC_MODIDX_BITS;
} pc;
uint64_t combined_value;
} kernel_interrupted_raw_pc_t;

/**
* The name of the file in -offline mode where module data is written.
* Its creation can be customized using drmemtrace_custom_module_data()
Expand Down
99 changes: 51 additions & 48 deletions clients/drcachesim/drcachesim.dox.in
Original file line number Diff line number Diff line change
Expand Up @@ -598,49 +598,48 @@ T342626 write 8 byte(s) @ 0x7f899f928e70
\endcode


Here is an example of a signal handler interrupting the regular flow:
Here is an example of a signal handler interrupting the regular flow,
with metadata showing that the signal was delivered just after a
non-taken conditional branch:

\code
T342625 0x0000000000469957 49 8b 44 17 18 mov 0x18(%r15,%rdx), %rax
T342625 read 8 byte(s) @ 0x4e7848
T342625 0x000000000046995c 48 85 c0 test %rax, %rax
T342625 0x000000000046995f 0f 84 a5 00 00 00 jz $0x0000000000469a0a
T342625 <marker: kernel xfer to handler>
T342625 <marker: timestamp 13260900248211321>
T342625 <marker: tid 342625 on core 3>
T342625 0x000000000040257d 55 push %rbp
T342625 write 8 byte(s) @ 0x7ffe70dcda30
T342625 0x000000000040257e 48 89 e5 mov %rsp, %rbp
T342625 0x0000000000402581 89 7d fc mov %edi, -0x04(%rbp)
T342625 write 4 byte(s) @ 0x7ffe70dcda2c
T342625 0x0000000000402584 48 89 75 f0 mov %rsi, -0x10(%rbp)
T342625 write 8 byte(s) @ 0x7ffe70dcda20
T342625 0x0000000000402588 48 89 55 e8 mov %rdx, -0x18(%rbp)
T342625 write 8 byte(s) @ 0x7ffe70dcda18
T342625 0x000000000040258c 83 7d fc 1a cmp -0x04(%rbp), $0x1a
T342625 read 4 byte(s) @ 0x7ffe70dcda2c
T342625 0x0000000000402590 75 0f jnz $0x00000000004025a1
T342625 0x0000000000402592 8b 05 5c 0f 0e 00 mov <rel> 0x00000000004e34f4, %eax
T342625 read 4 byte(s) @ 0x4e34f4
T342625 0x0000000000402598 83 c0 01 add $0x01, %eax
T342625 0x000000000040259b 89 05 53 0f 0e 00 mov %eax, <rel> 0x00000000004e34f4
T342625 write 4 byte(s) @ 0x4e34f4
T342625 0x00000000004025a1 90 nop
T342625 0x00000000004025a2 5d pop %rbp
T342625 read 8 byte(s) @ 0x7ffe70dcda30
T342625 0x00000000004025a3 c3 ret
T342625 read 8 byte(s) @ 0x7ffe70dcda38
T342625 0x0000000000407bb0 48 c7 c0 0f 00 00 00 mov $0x0000000f, %rax
T342625 0x0000000000407bb7 0f 05 syscall
T342625 <marker: timestamp 13260900248211330>
T342625 <marker: tid 342625 on core 3>
T342625 <marker: syscall xfer>
T342625 <marker: timestamp 13260900248214640>
T342625 <marker: tid 342625 on core 3>
T342625 0x0000000000469965 49 8b 54 17 10 mov 0x10(%r15,%rdx), %rdx
T342625 read 8 byte(s) @ 0x4e7840
T342625 0x000000000046996a 48 3b 15 9f fa 07 00 cmp %rdx, <rel> 0x00000000004e9410
T342625 read 8 byte(s) @ 0x4e9410
T585061 0x00007fdb4e95128f 41 f6 44 24 08 08 test 0x08(%r12), $0x08
T585061 read 1 byte(s) @ 0x7ffd5af76b08
T585061 0x00007fdb4e951295 0f 85 28 04 00 00 jnz $0x00007fdb4e9516c3
T585061 <marker: kernel xfer from 0x7fdb4e95129b to handler>
T585061 <marker: timestamp 13269730052517230>
T585061 <marker: tid 585061 on core 3>
T585061 0x00007fdb4ace9dba 55 push %rbp
T585061 write 8 byte(s) @ 0x7ffd5af763d0
T585061 0x00007fdb4ace9dbb 48 89 e5 mov %rsp, %rbp
T585061 0x00007fdb4ace9dbe 89 7d fc mov %edi, -0x04(%rbp)
T585061 write 4 byte(s) @ 0x7ffd5af763cc
T585061 0x00007fdb4ace9dc1 48 89 75 f0 mov %rsi, -0x10(%rbp)
T585061 write 8 byte(s) @ 0x7ffd5af763c0
T585061 0x00007fdb4ace9dc5 48 89 55 e8 mov %rdx, -0x18(%rbp)
T585061 write 8 byte(s) @ 0x7ffd5af763b8
T585061 0x00007fdb4ace9dc9 83 7d fc 1a cmp -0x04(%rbp), $0x1a
T585061 read 4 byte(s) @ 0x7ffd5af763cc
T585061 0x00007fdb4ace9dcd 75 0f jnz $0x00007fdb4ace9dde
T585061 0x00007fdb4ace9dcf 8b 05 7f 23 20 00 mov <rel> 0x00007fdb4aeec154, %eax
T585061 read 4 byte(s) @ 0x7fdb4aeec154
T585061 0x00007fdb4ace9dd5 83 c0 01 add $0x01, %eax
T585061 0x00007fdb4ace9dd8 89 05 76 23 20 00 mov %eax, <rel> 0x00007fdb4aeec154
T585061 write 4 byte(s) @ 0x7fdb4aeec154
T585061 0x00007fdb4ace9dde 90 nop
T585061 0x00007fdb4ace9ddf 5d pop %rbp
T585061 read 8 byte(s) @ 0x7ffd5af763d0
T585061 0x00007fdb4ace9de0 c3 ret
T585061 read 8 byte(s) @ 0x7ffd5af763d8
T585061 0x00007fdb4e95c140 48 c7 c0 0f 00 00 00 mov $0x0000000f, %rax
T585061 0x00007fdb4e95c147 0f 05 syscall
T585061 <marker: timestamp 13269730052517239>
T585061 <marker: tid 585061 on core 3>
T585061 <marker: syscall xfer from 0x7fdb4e95c149>
T585061 <marker: timestamp 13269730052520271>
T585061 <marker: tid 585061 on core 3>
T585061 0x00007fdb4e95129b 48 8b 1d 8e 40 01 00 mov <rel> 0x00007fdb4e965330, %rbx
T585061 read 8 byte(s) @ 0x7fdb4e965330
\endcode

\section sec_tool_func_view View Function Calls
Expand Down Expand Up @@ -1050,20 +1049,21 @@ instruction information to go along with each load and store, while cache
simulators can ignore these "no-fetch" entries and avoid incorrectly
inflating instruction fetch statistics.

Offline traces guarantee that a branch target instruction entry in a
trace must immediately follow the branch instruction with no intervening
thread switch. This allows a core simulator to identify the target of a
branch by looking at the subsequent trace entry. However, this guarantee
does not hold when a kernel event such as a signal is delivered
immediately after a branch.

Traces include scheduling markers providing the timestamp and hardware
thread identifier on each thread transition, allowing a simulator to more
closely match the actual hardware if so desired.

Traces also include markers indicating disruptions in user mode control
flow such as signal handler entry and exit.

Offline traces guarantee that a branch target instruction entry in a
trace must immediately follow the branch instruction with no intervening
thread switch. This allows a core simulator to identify the target of a
branch by looking at the subsequent trace entry. This guarantee
does not hold when a kernel event such as a signal is delivered
immediately after a branch; however, each marker indicating such a kernel transfer
includes the interrupted PC, explicitly providing the branch target.

Filtered traces (filtered via -L0_filter) include the dynamic (pre-filtered)
per-thread instruction count in a #TRACE_MARKER_TYPE_INSTRUCTION_COUNT marker at
each thread buffer boundary and at thread exit.
Expand Down Expand Up @@ -1248,6 +1248,9 @@ these areas of missing functionality:
- Online traces may skip instructions immediately prior to
non-load-or-store-related kernel transfer events
(https://github.com/DynamoRIO/dynamorio/issues/3937).
- Online traces may include the committing store in the trace when a
restartable sequence abort happened prior to that store
(https://github.com/DynamoRIO/dynamorio/issues/4041).
- Application phase marking is not yet implemented
(https://github.com/DynamoRIO/dynamorio/issues/2478).

Expand Down
5 changes: 3 additions & 2 deletions clients/drcachesim/reader/file_reader.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2016-2020 Google, Inc. All rights reserved.
* Copyright (c) 2016-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
Expand Down Expand Up @@ -168,7 +168,8 @@ template <typename T> class file_reader_t : public reader_t {
return false;
}
// We can handle the older version 1 as well which simply omits the
// early marker with the arch tag.
// early marker with the arch tag, and version 2 which only differs wrt
// TRACE_MARKER_TYPE_KERNEL_EVENT..
if (header.addr > TRACE_ENTRY_VERSION) {
ERRMSG(
"Cannot handle version #%zu (expect version <= #%u) for input file "
Expand Down
12 changes: 9 additions & 3 deletions clients/drcachesim/reader/reader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -204,9 +204,15 @@ reader_t::operator++()
case TRACE_TYPE_MARKER:
have_memref = true;
cur_ref_.marker.type = (trace_type_t)input_entry_->type;
assert((cur_tid_ != 0 && cur_pid_ != 0) ||
input_entry_->size == TRACE_MARKER_TYPE_VERSION ||
input_entry_->size == TRACE_MARKER_TYPE_FILETYPE);
if (input_entry_->size == TRACE_MARKER_TYPE_VERSION ||
input_entry_->size == TRACE_MARKER_TYPE_FILETYPE) {
// Do not carry over a prior thread on a thread switch to a
// first-time-seen new thread, whose tid entry is *after* these
// 2 markers.
cur_pid_ = 0;
cur_tid_ = 0;
} else
assert(cur_tid_ != 0 && cur_pid_ != 0);
cur_ref_.marker.pid = cur_pid_;
cur_ref_.marker.tid = cur_tid_;
cur_ref_.marker.marker_type = (trace_marker_type_t)input_entry_->size;
Expand Down
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Empty file.
Binary file not shown.
66 changes: 66 additions & 0 deletions clients/drcachesim/tests/offline-legacy-int-offs.templatex
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
Basic counts tool results:
Total counts:
109205 total \(fetched\) instructions
6768 total unique \(fetched\) instructions
93 total non-fetched instructions
11 total prefetches
23974 total data loads
5544 total data stores
0 total icache flushes
0 total dcache flushes
3 total threads
136 total scheduling markers
11 total transfer markers
0 total function id markers
0 total function return address markers
0 total function argument markers
0 total function return value markers
9 total other markers
Thread 552306 counts:
101049 \(fetched\) instructions
6393 unique \(fetched\) instructions
93 non-fetched instructions
11 prefetches
21712 data loads
4442 data stores
0 icache flushes
0 dcache flushes
110 scheduling markers
11 transfer markers
0 function id markers
0 function return address markers
0 function argument markers
0 function return value markers
3 other markers
Thread 552323 counts:
4674 \(fetched\) instructions
1028 unique \(fetched\) instructions
0 non-fetched instructions
0 prefetches
1317 data loads
651 data stores
0 icache flushes
0 dcache flushes
14 scheduling markers
0 transfer markers
0 function id markers
0 function return address markers
0 function argument markers
0 function return value markers
3 other markers
Thread 552324 counts:
3482 \(fetched\) instructions
1008 unique \(fetched\) instructions
0 non-fetched instructions
0 prefetches
945 data loads
451 data stores
0 icache flushes
0 dcache flushes
12 scheduling markers
0 transfer markers
0 function id markers
0 function return address markers
0 function argument markers
0 function return value markers
3 other markers
48 changes: 34 additions & 14 deletions clients/drcachesim/tests/trace_invariants.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,18 @@ trace_invariants_t::process_memref(const memref_t &memref)
assert(memrefs_until_interrupt_[memref.data.tid] != 0);
--memrefs_until_interrupt_[memref.data.tid];
}
if (memref.marker.type == TRACE_TYPE_MARKER &&
prev_entry_[memref.data.tid].marker.type == TRACE_TYPE_MARKER &&
prev_entry_[memref.data.tid].marker.marker_type == TRACE_MARKER_TYPE_RSEQ_ABORT) {
// The rseq marker must be immediately prior to the kernel event marker.
assert(memref.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT);
}
if (memref.marker.type == TRACE_TYPE_MARKER &&
memref.marker.marker_type == TRACE_MARKER_TYPE_RSEQ_ABORT) {
// Check that the rseq final instruction was not executed: that raw2trace
// rolled it back.
assert(memref.marker.marker_value != prev_instr_[memref.data.tid].instr.addr);
}
// Check that the signal delivery marker is immediately followed by the
// app's signal handler, which we have annotated with "prefetcht0 [1]".
if (memref.data.type == TRACE_TYPE_PREFETCHT0 && memref.data.addr == 1) {
Expand Down Expand Up @@ -189,23 +201,28 @@ trace_invariants_t::process_memref(const memref_t &memref)
// Ensure signal handlers return to the interruption point.
if (prev_xfer_marker_[memref.data.tid].marker.marker_type ==
TRACE_MARKER_TYPE_KERNEL_XFER) {
assert(memref.instr.addr ==
pre_signal_instr_[memref.data.tid].top().instr.addr ||
// Asynch will go to the subsequent instr.
memref.instr.addr ==
pre_signal_instr_[memref.data.tid].top().instr.addr +
pre_signal_instr_[memref.data.tid].top().instr.size ||
assert((memref.instr.addr == prev_xfer_int_pc_[memref.data.tid].top() &&
(memref.instr.addr ==
pre_signal_instr_[memref.data.tid].top().instr.addr ||
// Asynch will go to the subsequent instr.
memref.instr.addr ==
pre_signal_instr_[memref.data.tid].top().instr.addr +
pre_signal_instr_[memref.data.tid].top().instr.size ||
// Too hard to figure out branch targets. We have the
// prev_xfer_int_pc_ though.
type_is_instr_branch(
pre_signal_instr_[memref.data.tid].top().instr.type) ||
pre_signal_instr_[memref.data.tid].top().instr.type ==
TRACE_TYPE_INSTR_SYSENTER)) ||
// Nested signal. XXX: This only works for our annotated test
// signal_invariants.
memref.instr.addr == app_handler_pc_ ||
// Marker for rseq abort handler. Not as unique as a prefetch, but
// we need an instruction and not a data type.
memref.instr.type == TRACE_TYPE_INSTR_DIRECT_JUMP ||
// Too hard to figure out branch targets.
type_is_instr_branch(
pre_signal_instr_[memref.data.tid].top().instr.type) ||
pre_signal_instr_[memref.data.tid].top().instr.type ==
TRACE_TYPE_INSTR_SYSENTER);
memref.instr.type == TRACE_TYPE_INSTR_DIRECT_JUMP);
// We assume paired signal entry-exit (so no longjmp and no rseq
// inside signal handlers).
prev_xfer_int_pc_[memref.data.tid].pop();
pre_signal_instr_[memref.data.tid].pop();
}
#endif
Expand All @@ -226,10 +243,13 @@ trace_invariants_t::process_memref(const memref_t &memref)
memref.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_XFER)) {
if (knob_verbose_ >= 3) {
std::cerr << "::" << memref.data.pid << ":" << memref.data.tid << ":: "
<< "marker type " << memref.marker.marker_type << " value "
<< memref.marker.marker_value << "\n";
<< "marker type " << memref.marker.marker_type << " value 0x"
<< std::hex << memref.marker.marker_value << std::dec << "\n";
}
#ifdef UNIX
if (memref.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT)
prev_xfer_int_pc_[memref.data.tid].push(memref.marker.marker_value);
assert(memref.marker.marker_value != 0);
if (memref.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT &&
// Give up on back-to-back signals.
prev_xfer_marker_[memref.data.tid].marker.marker_type !=
Expand Down
Loading

0 comments on commit a3d1db7

Please sign in to comment.