-
Notifications
You must be signed in to change notification settings - Fork 570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#3129 raw2trace perf: summarize instr_t during raw2trace conversion #3130
Conversation
Summarize instr_t to decouple trace conversion from DR internals (e.g. instru_t APIs) Issue: DynamoRIO#3129
The different module shouldn't matter if DynamoRIO_FAST_IR is turned on, is it turns the calls into macros. Was that tried? It applies to the common IR accessors only so maybe the issue here is that uncommon calls like instr_is_cti are being used which are not currently part of DynamoRIO_FAST_IR. Not sure it should be extended just for this use case. |
The main justification is actually the decoupling - when we templatize the parts of raw2trace_t we need, we'd end up with needing to pull instru_t into raw2trace.h. |
@@ -71,6 +71,128 @@ struct module_t { | |||
bool is_external; // If true, the data is embedded in drmodtrack custom fields. | |||
}; | |||
|
|||
/** | |||
* instr_summary is a compact encapsulation of the information needed by trace conversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
app_pc next_pc_ = 0; | ||
|
||
std::vector<opnd_t> srcs_and_dests_; | ||
uint8_t srcs_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_mem_srcs_ or count_mem_srcs_ is more descriptive: o/w this seems like the actual list of them. Ditto for dests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
instr_summary_t & | ||
operator=(instr_summary_t &&) = delete; | ||
|
||
uint16_t type_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that an _ suffix is not used anywhere else to mark class members: in fact for drcachesim _ is used in kind of the opposite way to distinguish param vars to constructors. I don't feel strongly, ok to leave.
return srcs_and_dests_[srcs_ + pos]; | ||
} | ||
size_t | ||
srcs() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_mem_srcs() or count_memory_srcs() or sthg is more descriptive: see below too. Ditto for dsts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
const opnd_t & | ||
dest_at(size_t pos) const | ||
{ | ||
return srcs_and_dests_[srcs_ + pos]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining srcs and dsts just seems to add complexity: is there something positive coming out of it? Maybe best to separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One less large object (another std::vector is larger than the 2 bytes). This helps in an implementation where we don't allocated individual instr_summary_t on the heap, but rather bulk-allocate entire pages of them.
In fact, we only need the nr of sources.
byte length_ = 0; | ||
app_pc next_pc_ = 0; | ||
|
||
std::vector<opnd_t> srcs_and_dests_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the implementation and use, these are actually only the memory operands, while the name here sounds like all operands. Better to use memory_srcs_ and memory_dsts_ or sthg to make this clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added mem_
} | ||
|
||
const opnd_t & | ||
src_at(size_t pos) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memory_src_at() or sthg -- see also other comments. Ditto for dest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
(instr->reads_memory() || instr->writes_memory())) { | ||
for (uint j = 0; j < instr->srcs(); j++) { | ||
std::string error = | ||
append_memref(&buf, tidx, instr, instr->src_at(j), false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the check for being a memory ref.
UPDATE: see below: srcs and src_at would be much clearer with names like num_mem_srcs and mem_src_at or sthg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, renamed
desc->srcs_and_dests_.resize(desc->srcs_ + desc->dests_); | ||
int pos = 0; | ||
TRAVERSE_SRCS(desc->srcs_and_dests_[pos++] = op) | ||
TRAVERSE_DSTS(desc->srcs_and_dests_[pos++] = op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler code that just does a loop with push_back and no macros will be more readable and maintainable, and I wouldn't think any slight overhead here would matter: unless there's profiling evidence I'd rather not have macros and less readable code on a non-critical-path like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, and we can always rework this if it pops on any perf radar. reworked.
FIELD(IsPrefetch, 2) | ||
FIELD(IsFlush, 3) | ||
FIELD(IsCti, 4) | ||
#undef FIELD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another one where the style used in all existing DR code of hex constants
...
kIsPrefetchMask = 0x0002;
kIsFlushMask = 0x0004;
kIsCtiMask = 0x0008;
and setting it via
if (instr_is_prefetch(instr))
 desc->packed_ |= kIsPrefetchMask;
seems simpler and more readable to me, avoids extra macros, doesn't rely on "bool" being 1 (such assumptions lead to bugs in C code), and matches the existing style. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworked.
@@ -878,8 +878,7 @@ raw2trace_t::get_instr_summary(uint64 modidx, uint64 modoffs, INOUT app_pc *pc, | |||
hashtable_add(&decode_cache, decode_pc, desc); | |||
ret = desc; | |||
} else { | |||
// TODO(mtrofin): log some rendering of the instruction summary that will be | |||
// returned | |||
// Log some rendering of the instruction summary that will be returned. i#3129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the comment: there is no log here. This was a "TODO" which as my prior comment said is "XXX" in the DR style (see the wiki) and I assume it is still a comment about future work, right? Repeating what that should look like in the DR style: XXX i#3129: Log ... .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh! I see... I kept looking in the wiki for "TODO" (assuming xxx referred to "substitute text here"). Didn't find anything about "todo", so assumed you were describing the style. Anyway. I think I understand now.
* orig_pc and verbosity) | ||
* Populates a pre-allocated instr_summary_t description, from the instruction found | ||
* at pc. Updates pc to the next instruction. Optionally logs translation details | ||
* (using orig_pc and verbosity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: missing trailing .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
std::vector<opnd_t> srcs_and_dests_; | ||
uint8_t srcs_ = 0; | ||
uint8_t dests_ = 0; | ||
std::vector<opnd_t> mem_srcs_and_dests_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining why this squishes the srcs and dests together. (Otherwise someone is going to split them up, seeing no reason to combine them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -72,7 +72,7 @@ | |||
|
|||
#define DO_VERBOSE(level, x) \ | |||
do { \ | |||
if (this->verbosity >= (level)) { \ | |||
if (verbosity >= (level)) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this assumed var verbosity
seems a little odd and fragile when it doesn't exist anywhere except in one function. Maybe verbosity
should be a parameter to the macro? (And then in ()'s when refed of course.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -878,7 +878,8 @@ raw2trace_t::get_instr_summary(uint64 modidx, uint64 modoffs, INOUT app_pc *pc, | |||
hashtable_add(&decode_cache, decode_pc, desc); | |||
ret = desc; | |||
} else { | |||
// Log some rendering of the instruction summary that will be returned. i#3129 | |||
/* XXX i#3129 Log some rendering of the instruction summary that will be returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing :
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
instr_set_translation(instr, orig_pc); | ||
dr_print_instr(dcontext, STDOUT, instr, ""); | ||
}, | ||
verbosity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems easier to have verbosity ordered before the statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro is used exactly in one place, so chose to delete the macro and write the explicit check - imho, easier to understand, and one less macro.
The commit title is not the right style -- please see https://github.com/DynamoRIO/dynamorio/wiki/Code-Reviews#commit-messages for future commits. |
tool.drcachesim.missfile-config-file failed on Appveyor. c3d5b47 was just committed which changes the missfile behavior -- yet it doesn't seem like it could cause this failure as it doesn't change the simulator output. Strange that it never failed before though. cc @kharbutli. Going to commit this and filed #3136 to track the test failure. |
Summarize instr_t to decouple trace conversion from DR internals (e.g. instru_t APIs)
As added benefit, the previous APIs were not inlined (at least by clang), being in a different module. That was resulting in noticeable (under perf profiling) function call overhead. This change alleviates that.
Issue: #3129