Skip to content

Commit

Permalink
i#5505 kernel tracing: Best effort pt2ir conversion (#6924)
Browse files Browse the repository at this point in the history
Relaxes raw2trace error reporting to ignore errors during conversion of
PT raw traces using libipt. This is enabled by -pt2ir_best_effort which
is a default-false flag. We still keep the strict error reporting for
the small unit tests.

We have observed sporadic decoding issues raised by libipt which are not
yet root-caused. These may be due to our own usage of perf_event_open to
record PT data, or our use of libipt may not be robust enough to handle
these errors.

We keep counts of instructions that failed to decode, which then causes
some 1-instr PC discontinuities in the decoded syscall trace. In these
cases we are still able to produce the converted syscall trace albeit
with the discontinuities.

We also keep count of errors that caused total failure in conversion of
some system call trace. In such cases we simply skip the trace for that
syscall and continue raw2trace (seen mostly on futex-like syscalls).

Overall, for a large app, we were still able to get >98% of the kernel
syscall traces converted successfully. We will continue to debug and
improve PT decoding stability to improve PT trace conversion issues in
drmemtrace.

Verified that the PT related tests continue to work on a system that
supports the Intel-PT hardware feature:

```
The following tests passed:
	code_api|tool.drcacheoff.kernel.simple_SUDO
	code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
	code_api|tool.drcacheoff.kernel.syscall-mix_SUDO
	code_api|tool.drcacheoff.kernel.invariant-checker_SUDO

The following tests passed:
	code_api|client.drpttracer_SUDO-test
```

Issue: #5505
  • Loading branch information
abhinav92003 authored Aug 27, 2024
1 parent 4dfe3d9 commit 009c27b
Show file tree
Hide file tree
Showing 7 changed files with 165 additions and 27 deletions.
3 changes: 2 additions & 1 deletion clients/drcachesim/analyzer_multi.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -450,7 +450,8 @@ analyzer_multi_tmpl_t<RecordType, ReaderType>::analyzer_multi_tmpl_t()
nullptr, op_verbose.get_value(), op_jobs.get_value(),
op_alt_module_dir.get_value(), op_chunk_instr_count.get_value(),
dir.in_kfiles_map_, dir.kcoredir_, dir.kallsymsdir_,
std::move(dir.syscall_template_file_reader_));
std::move(dir.syscall_template_file_reader_),
op_pt2ir_best_effort.get_value());
std::string error = raw2trace.do_conversion();
if (!error.empty()) {
this->success_ = false;
Expand Down
14 changes: 14 additions & 0 deletions clients/drcachesim/common/options.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1072,5 +1072,19 @@ droption_t<bool> op_abort_on_invariant_error(
"total invariant error count is printed at the end; a non-zero error count does not "
"affect the exit code of the analyzer.");

droption_t<bool> op_pt2ir_best_effort(
DROPTION_SCOPE_ALL, "pt2ir_best_effort", false,
"Whether errors encountered during PT trace conversion in pt2ir are ignored.",
"By default, errors in decoding the kernel syscall PT trace in pt2ir are fatal to "
"raw2trace. With -pt2ir_best_effort, those errors do not cause failures and their "
"counts are reported by raw2trace at the end. This may result in a trace where not "
"all syscalls have a trace, and the ones that do may have some PC discontinuities "
"due to non-fatal decoding errors (these discontinuities will still be reported by "
"the invariant checker). When using this option, it is prudent to check raw2trace "
"stats to confirm that the error counts are within expected bounds (total syscall "
"traces converted, syscall traces that were dropped from final trace because "
"conversion failed, syscall traces found to be empty, and non-fatal decode errors "
"seen in converted syscall traces).");

} // namespace drmemtrace
} // namespace dynamorio
1 change: 1 addition & 0 deletions clients/drcachesim/common/options.h
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,7 @@ extern dynamorio::droption::droption_t<std::string> op_filter_func_ids;
extern dynamorio::droption::droption_t<uint64_t> op_trim_before_timestamp;
extern dynamorio::droption::droption_t<uint64_t> op_trim_after_timestamp;
extern dynamorio::droption::droption_t<bool> op_abort_on_invariant_error;
extern dynamorio::droption::droption_t<bool> op_pt2ir_best_effort;

} // namespace drmemtrace
} // namespace dynamorio
Expand Down
38 changes: 35 additions & 3 deletions clients/drcachesim/drpt2trace/pt2ir.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2023 Google, Inc. All rights reserved.
* Copyright (c) 2023-2024 Google, Inc. All rights reserved.
* **********************************************************/

/*
Expand Down Expand Up @@ -102,13 +102,15 @@ pt2ir_t::~pt2ir_t()
}

bool
pt2ir_t::init(DR_PARAM_IN pt2ir_config_t &pt2ir_config, DR_PARAM_IN int verbosity)
pt2ir_t::init(DR_PARAM_IN pt2ir_config_t &pt2ir_config, DR_PARAM_IN int verbosity,
DR_PARAM_IN bool allow_non_fatal_decode_errors)
{
verbosity_ = verbosity;
if (pt2ir_initialized_) {
VPRINT(0, "pt2ir_t is already initialized.\n");
return false;
}
allow_non_fatal_decode_errors_ = allow_non_fatal_decode_errors;

/* Init the configuration for the libipt instruction decoder. */
struct pt_config pt_config;
Expand Down Expand Up @@ -257,7 +259,8 @@ pt2ir_t::init(DR_PARAM_IN pt2ir_config_t &pt2ir_config, DR_PARAM_IN int verbosit

pt2ir_convert_status_t
pt2ir_t::convert(DR_PARAM_IN const uint8_t *pt_data, DR_PARAM_IN size_t pt_data_size,
DR_PARAM_INOUT drir_t *drir)
DR_PARAM_INOUT drir_t *drir,
DR_PARAM_OUT uint64_t *non_fatal_decode_error_count_out)
{
if (!pt2ir_initialized_) {
return PT2IR_CONV_ERROR_NOT_INITIALIZED;
Expand Down Expand Up @@ -287,6 +290,15 @@ pt2ir_t::convert(DR_PARAM_IN const uint8_t *pt_data, DR_PARAM_IN size_t pt_data_
/* This flag indicates whether manual synchronization is required. */
bool manual_sync = true;

uint64_t decoded_instr_count = 0;
uint64_t non_fatal_decode_error_count = 0;
/* XXX: This is currently set based on empirical observations. We use this heuristic
* to detect errors where we can still produce a trace for this syscall albeit with
* some PC discontinuities. Specifically: we allow MAX_ERROR_COUNT non-consecutive
* errors of type pte_bad_query.
*/
constexpr int MAX_ERROR_COUNT = 100;

/* PT raw data consists of many packets. And PT trace data is surrounded by Packet
* Stream Boundary. So, in the outermost loop, this function first finds the PSB. Then
* it decodes the trace data.
Expand Down Expand Up @@ -373,8 +385,19 @@ pt2ir_t::convert(DR_PARAM_IN const uint8_t *pt_data, DR_PARAM_IN size_t pt_data_

/* Decode PT raw trace to pt_insn. */
status = pt_insn_next(pt_instr_decoder_, &insn, sizeof(insn));
if (allow_non_fatal_decode_errors_ && status == -pte_bad_query &&
non_fatal_decode_error_count <= MAX_ERROR_COUNT) {
++non_fatal_decode_error_count;
/* The error may be non-fatal to this syscall's PT trace
* conversion. Try to continue past it. We may lose an instruction
* entry which will show up as a 1-instr PC discontinuity in the
* kernel syscall trace.
*/
status = pt_insn_next(pt_instr_decoder_, &insn, sizeof(insn));
}
if (status < 0) {
dx_decoding_error(status, "get next instruction error", insn.ip);
drir->clear_ilist();
return PT2IR_CONV_ERROR_DECODE_NEXT_INSTR;
}

Expand All @@ -399,9 +422,18 @@ pt2ir_t::convert(DR_PARAM_IN const uint8_t *pt_data, DR_PARAM_IN size_t pt_data_
}
#endif
}
++decoded_instr_count;
drir->append(instr, instr_ip, insn.size, insn.raw);
}
}
// Note that not all non-fatal decode errors correspond to a PC discontinuity
// in the resulting trace.
VPRINT(1,
"libipt decoded " UINT64_FORMAT_STRING
" instructions with " UINT64_FORMAT_STRING " non-fatal decode errors.\n",
decoded_instr_count, non_fatal_decode_error_count);
if (non_fatal_decode_error_count_out != nullptr)
*non_fatal_decode_error_count_out = non_fatal_decode_error_count;
return PT2IR_CONV_SUCCESS;
}

Expand Down
18 changes: 15 additions & 3 deletions clients/drcachesim/drpt2trace/pt2ir.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2023 Google, Inc. All rights reserved.
* Copyright (c) 2023-2024 Google, Inc. All rights reserved.
* **********************************************************/

/*
Expand Down Expand Up @@ -347,10 +347,13 @@ class pt2ir_t {
* @param pt2ir_config The configuration of PT raw trace.
* @param verbosity The verbosity level for notifications. If set to 0, only error
* logs are printed. If set to 1, all logs are printed. Default value is 0.
* @param allow_non_fatal_decode_errors Whether PT decode errors that are not
* fatal to the syscall PT trace's conversion are simply skipped past.
* @return true if the instance is successfully initialized.
*/
bool
init(DR_PARAM_IN pt2ir_config_t &pt2ir_config, DR_PARAM_IN int verbosity = 0);
init(DR_PARAM_IN pt2ir_config_t &pt2ir_config, DR_PARAM_IN int verbosity = 0,
DR_PARAM_IN bool allow_non_fatal_decode_errors = false);

/**
* The convert function performs two processes: (1) decode the PT raw trace into
Expand All @@ -359,13 +362,19 @@ class pt2ir_t {
* @param pt_data The PT raw trace.
* @param pt_data_size The size of PT raw trace.
* @param drir The drir object.
* @param non_fatal_decode_error_count_out Pointer to the integer where the count
* of non-fatal decode errors seen during conversion will be stored. Used only if
* allow_non_fatal_decode_errors was set to true in the init call. This is set
* only if we were still able to generate a converted trace, albeit with some
* PC discontinuities.
* @return pt2ir_convert_status_t. If the conversion is successful, the function
* returns #PT2IR_CONV_SUCCESS. Otherwise, the function returns the corresponding
* error code.
*/
pt2ir_convert_status_t
convert(DR_PARAM_IN const uint8_t *pt_data, DR_PARAM_IN size_t pt_data_size,
DR_PARAM_INOUT drir_t *drir);
DR_PARAM_INOUT drir_t *drir,
DR_PARAM_OUT uint64_t *non_fatal_decode_error_count_out = nullptr);

private:
/* Diagnose converting errors and output diagnostic results.
Expand Down Expand Up @@ -405,6 +414,9 @@ class pt2ir_t {

/* Integer value representing the verbosity level for notifications. */
int verbosity_;

/* Whether non-fatal errors are allowed during PT trace decode. */
bool allow_non_fatal_decode_errors_;
};

} // namespace drmemtrace
Expand Down
84 changes: 68 additions & 16 deletions clients/drcachesim/tracer/raw2trace.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1154,7 +1154,7 @@ raw2trace_t::process_syscall_pt(raw2trace_thread_data_t *tdata, uint64_t syscall
# define RING_BUFFER_SIZE_SHIFT 8
config.pt_raw_buffer_size =
(1L << RING_BUFFER_SIZE_SHIFT) * sysconf(_SC_PAGESIZE);
if (!tdata->pt2ir.init(config, verbosity_)) {
if (!tdata->pt2ir.init(config, verbosity_, pt2ir_best_effort_)) {
tdata->error = "Unable to initialize PT2IR";
return false;
}
Expand Down Expand Up @@ -1192,13 +1192,29 @@ raw2trace_t::process_syscall_pt(raw2trace_thread_data_t *tdata, uint64_t syscall
tdata->pt_decode_state_ = std::unique_ptr<drir_t>(new drir_t(GLOBAL_DCONTEXT));
}
tdata->pt_decode_state_->clear_ilist();
uint64_t syscall_decode_non_fatal_error_count = 0;
pt2ir_convert_status_t pt2ir_convert_status = tdata->pt2ir.convert(
pt_data->data.get(), pt_data_size, tdata->pt_decode_state_.get());
pt_data->data.get(), pt_data_size, tdata->pt_decode_state_.get(),
&syscall_decode_non_fatal_error_count);
if (pt2ir_convert_status != PT2IR_CONV_SUCCESS) {
tdata->error = "Failed to convert PT raw trace to DR IR [error status: " +
std::to_string(pt2ir_convert_status) + "]";
return false;
if (!pt2ir_best_effort_ ||
pt2ir_convert_status != PT2IR_CONV_ERROR_DECODE_NEXT_INSTR) {
tdata->error = "Failed to convert PT raw trace to DR IR [error status: " +
std::to_string(pt2ir_convert_status) + "]";
return false;
}
/* When -pt2ir_best_effort is set, we do not fail raw2trace when pt2ir is
* unable to convert some PT syscall trace.
* TODO i#5505: Maybe the invariant checker should also report such missing
* syscall traces.
*/
accumulate_to_statistic(tdata, RAW2TRACE_STAT_SYSCALL_TRACES_CONVERSION_FAILED,
1);
return true;
}
accumulate_to_statistic(tdata,
RAW2TRACE_STAT_SYSCALL_TRACES_NON_FATAL_DECODING_ERROR_COUNT,
syscall_decode_non_fatal_error_count);

/* Convert the DR IR to trace entries. */
addr_t sysnum =
Expand All @@ -1224,11 +1240,13 @@ raw2trace_t::process_syscall_pt(raw2trace_thread_data_t *tdata, uint64_t syscall
.addr = sysnum };
entries.push_back(end_entry);
if (entries.size() == 2) {
tdata->error = "No trace entries generated from PT data";
return false;
// XXX: Is this simply because the syscall did not end up executing because of
// being interrupted?
accumulate_to_statistic(tdata, RAW2TRACE_STAT_SYSCALL_TRACES_CONVERSION_EMPTY, 1);
return true;
}

accumulate_to_statistic(tdata, RAW2TRACE_STAT_SYSCALL_TRACES_DECODED, 1);
accumulate_to_statistic(tdata, RAW2TRACE_STAT_SYSCALL_TRACES_CONVERTED, 1);
app_pc saved_decode_pc;
trace_entry_t entries_with_encodings[WRITE_BUFFER_SIZE];
trace_entry_t *buf = entries_with_encodings;
Expand Down Expand Up @@ -1563,7 +1581,13 @@ raw2trace_t::do_conversion()
thread_data_[i]->latest_trace_timestamp);
final_trace_instr_count_ += thread_data_[i]->final_trace_instr_count;
kernel_instr_count_ += thread_data_[i]->kernel_instr_count;
syscall_traces_decoded_ += thread_data_[i]->syscall_traces_decoded;
syscall_traces_converted_ += thread_data_[i]->syscall_traces_converted;
syscall_traces_conversion_failed_ +=
thread_data_[i]->syscall_traces_conversion_failed;
syscall_traces_non_fatal_decoding_error_count_ +=
thread_data_[i]->syscall_traces_non_fatal_decoding_error_count;
syscall_traces_conversion_empty_ +=
thread_data_[i]->syscall_traces_conversion_empty;
syscall_traces_injected_ += thread_data_[i]->syscall_traces_injected;
}
} else {
Expand Down Expand Up @@ -1591,7 +1615,11 @@ raw2trace_t::do_conversion()
std::max(latest_trace_timestamp_, tdata->latest_trace_timestamp);
final_trace_instr_count_ += tdata->final_trace_instr_count;
kernel_instr_count_ += tdata->kernel_instr_count;
syscall_traces_decoded_ += tdata->syscall_traces_decoded;
syscall_traces_converted_ += tdata->syscall_traces_converted;
syscall_traces_conversion_failed_ += tdata->syscall_traces_conversion_failed;
syscall_traces_non_fatal_decoding_error_count_ +=
tdata->syscall_traces_non_fatal_decoding_error_count;
syscall_traces_conversion_empty_ += tdata->syscall_traces_conversion_empty;
syscall_traces_injected_ += tdata->syscall_traces_injected;
}
}
Expand All @@ -1612,8 +1640,15 @@ raw2trace_t::do_conversion()
VPRINT(1, "Final trace instr count: " UINT64_FORMAT_STRING ".\n",
final_trace_instr_count_);
VPRINT(1, "Kernel instr count " UINT64_FORMAT_STRING "\n", kernel_instr_count_);
VPRINT(1, "System call PT traces decoded " UINT64_FORMAT_STRING "\n",
syscall_traces_decoded_);
VPRINT(1, "System call PT traces converted " UINT64_FORMAT_STRING "\n",
syscall_traces_converted_);
VPRINT(1, "System call PT traces conversion failed " UINT64_FORMAT_STRING "\n",
syscall_traces_conversion_failed_);
VPRINT(1, "System call PT trace conversion decode errors " UINT64_FORMAT_STRING "\n",
syscall_traces_non_fatal_decoding_error_count_);
VPRINT(1,
"System call PT trace conversion found empty trace " UINT64_FORMAT_STRING "\n",
syscall_traces_conversion_empty_);
VPRINT(1, "System call traces injected from template " UINT64_FORMAT_STRING "\n",
syscall_traces_injected_);
VPRINT(1, "Successfully converted %zu thread files\n", thread_data_.size());
Expand Down Expand Up @@ -3705,7 +3740,8 @@ raw2trace_t::raw2trace_t(
const std::string &alt_module_dir, uint64_t chunk_instr_count,
const std::unordered_map<thread_id_t, std::istream *> &kthread_files_map,
const std::string &kcore_path, const std::string &kallsyms_path,
std::unique_ptr<dynamorio::drmemtrace::record_reader_t> syscall_template_file_reader)
std::unique_ptr<dynamorio::drmemtrace::record_reader_t> syscall_template_file_reader,
bool pt2ir_best_effort)
: dcontext_(dcontext == nullptr ? dr_standalone_init() : dcontext)
, passed_dcontext_(dcontext != nullptr)
, worker_count_(worker_count)
Expand All @@ -3722,6 +3758,7 @@ raw2trace_t::raw2trace_t(
, kcore_path_(kcore_path)
, kallsyms_path_(kallsyms_path)
, syscall_template_file_reader_(std::move(syscall_template_file_reader))
, pt2ir_best_effort_(pt2ir_best_effort)
{
// Exactly one of out_files and out_archives should be non-empty.
// If thread_files is not empty it must match the input size.
Expand Down Expand Up @@ -3806,8 +3843,17 @@ raw2trace_t::accumulate_to_statistic(raw2trace_thread_data_t *tdata,
tdata->final_trace_instr_count += value;
break;
case RAW2TRACE_STAT_KERNEL_INSTR_COUNT: tdata->kernel_instr_count += value; break;
case RAW2TRACE_STAT_SYSCALL_TRACES_DECODED:
tdata->syscall_traces_decoded += value;
case RAW2TRACE_STAT_SYSCALL_TRACES_CONVERTED:
tdata->syscall_traces_converted += value;
break;
case RAW2TRACE_STAT_SYSCALL_TRACES_CONVERSION_FAILED:
tdata->syscall_traces_conversion_failed += value;
break;
case RAW2TRACE_STAT_SYSCALL_TRACES_NON_FATAL_DECODING_ERROR_COUNT:
tdata->syscall_traces_non_fatal_decoding_error_count += value;
break;
case RAW2TRACE_STAT_SYSCALL_TRACES_CONVERSION_EMPTY:
tdata->syscall_traces_conversion_empty += value;
break;
case RAW2TRACE_STAT_SYSCALL_TRACES_INJECTED:
tdata->syscall_traces_injected += value;
Expand All @@ -3830,7 +3876,13 @@ raw2trace_t::get_statistic(raw2trace_statistic_t stat)
case RAW2TRACE_STAT_LATEST_TRACE_TIMESTAMP: return latest_trace_timestamp_;
case RAW2TRACE_STAT_FINAL_TRACE_INSTRUCTION_COUNT: return final_trace_instr_count_;
case RAW2TRACE_STAT_KERNEL_INSTR_COUNT: return kernel_instr_count_;
case RAW2TRACE_STAT_SYSCALL_TRACES_DECODED: return syscall_traces_decoded_;
case RAW2TRACE_STAT_SYSCALL_TRACES_CONVERTED: return syscall_traces_converted_;
case RAW2TRACE_STAT_SYSCALL_TRACES_NON_FATAL_DECODING_ERROR_COUNT:
return syscall_traces_non_fatal_decoding_error_count_;
case RAW2TRACE_STAT_SYSCALL_TRACES_CONVERSION_FAILED:
return syscall_traces_conversion_failed_;
case RAW2TRACE_STAT_SYSCALL_TRACES_CONVERSION_EMPTY:
return syscall_traces_conversion_empty_;
case RAW2TRACE_STAT_SYSCALL_TRACES_INJECTED: return syscall_traces_injected_;
case RAW2TRACE_STAT_MAX:
default: DR_ASSERT(false); return 0;
Expand Down
Loading

0 comments on commit 009c27b

Please sign in to comment.