Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#3129 raw2trace perf: summarize instr_t during raw2trace conversion #3130

Merged
merged 10 commits into from
Aug 21, 2018

Conversation

mtrofin
Copy link
Contributor

@mtrofin mtrofin commented Aug 20, 2018

Summarize instr_t to decouple trace conversion from DR internals (e.g. instru_t APIs)

As added benefit, the previous APIs were not inlined (at least by clang), being in a different module. That was resulting in noticeable (under perf profiling) function call overhead. This change alleviates that.

Issue: #3129

Summarize instr_t to decouple trace conversion from DR internals (e.g.
instru_t APIs)

Issue: DynamoRIO#3129
@derekbruening
Copy link
Contributor

As added benefit, the previous APIs were not inlined (at least by clang), being in a different module. That was resulting in noticeable (under perf profiling) function call overhead. This change alleviates that.

The different module shouldn't matter if DynamoRIO_FAST_IR is turned on, is it turns the calls into macros. Was that tried? It applies to the common IR accessors only so maybe the issue here is that uncommon calls like instr_is_cti are being used which are not currently part of DynamoRIO_FAST_IR. Not sure it should be extended just for this use case.

@mtrofin
Copy link
Contributor Author

mtrofin commented Aug 20, 2018

The main justification is actually the decoupling - when we templatize the parts of raw2trace_t we need, we'd end up with needing to pull instru_t into raw2trace.h.

@@ -71,6 +71,128 @@ struct module_t {
bool is_external; // If true, the data is embedded in drmodtrack custom fields.
};

/**
* instr_summary is a compact encapsulation of the information needed by trace conversion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

app_pc next_pc_ = 0;

std::vector<opnd_t> srcs_and_dests_;
uint8_t srcs_ = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_mem_srcs_ or count_mem_srcs_ is more descriptive: o/w this seems like the actual list of them. Ditto for dests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

instr_summary_t &
operator=(instr_summary_t &&) = delete;

uint16_t type_ = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that an _ suffix is not used anywhere else to mark class members: in fact for drcachesim _ is used in kind of the opposite way to distinguish param vars to constructors. I don't feel strongly, ok to leave.

return srcs_and_dests_[srcs_ + pos];
}
size_t
srcs() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_mem_srcs() or count_memory_srcs() or sthg is more descriptive: see below too. Ditto for dsts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const opnd_t &
dest_at(size_t pos) const
{
return srcs_and_dests_[srcs_ + pos];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining srcs and dsts just seems to add complexity: is there something positive coming out of it? Maybe best to separate.

Copy link
Contributor Author

@mtrofin mtrofin Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One less large object (another std::vector is larger than the 2 bytes). This helps in an implementation where we don't allocated individual instr_summary_t on the heap, but rather bulk-allocate entire pages of them.

In fact, we only need the nr of sources.

byte length_ = 0;
app_pc next_pc_ = 0;

std::vector<opnd_t> srcs_and_dests_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation and use, these are actually only the memory operands, while the name here sounds like all operands. Better to use memory_srcs_ and memory_dsts_ or sthg to make this clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added mem_

}

const opnd_t &
src_at(size_t pos) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory_src_at() or sthg -- see also other comments. Ditto for dest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(instr->reads_memory() || instr->writes_memory())) {
for (uint j = 0; j < instr->srcs(); j++) {
std::string error =
append_memref(&buf, tidx, instr, instr->src_at(j), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the check for being a memory ref.
UPDATE: see below: srcs and src_at would be much clearer with names like num_mem_srcs and mem_src_at or sthg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, renamed

desc->srcs_and_dests_.resize(desc->srcs_ + desc->dests_);
int pos = 0;
TRAVERSE_SRCS(desc->srcs_and_dests_[pos++] = op)
TRAVERSE_DSTS(desc->srcs_and_dests_[pos++] = op)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler code that just does a loop with push_back and no macros will be more readable and maintainable, and I wouldn't think any slight overhead here would matter: unless there's profiling evidence I'd rather not have macros and less readable code on a non-critical-path like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, and we can always rework this if it pops on any perf radar. reworked.

FIELD(IsPrefetch, 2)
FIELD(IsFlush, 3)
FIELD(IsCti, 4)
#undef FIELD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another one where the style used in all existing DR code of hex constants

...
kIsPrefetchMask = 0x0002;
kIsFlushMask = 0x0004;
kIsCtiMask = 0x0008;

and setting it via

if (instr_is_prefetch(instr))
     desc->packed_ |= kIsPrefetchMask;

seems simpler and more readable to me, avoids extra macros, doesn't rely on "bool" being 1 (such assumptions lead to bugs in C code), and matches the existing style. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworked.

@@ -878,8 +878,7 @@ raw2trace_t::get_instr_summary(uint64 modidx, uint64 modoffs, INOUT app_pc *pc,
hashtable_add(&decode_cache, decode_pc, desc);
ret = desc;
} else {
// TODO(mtrofin): log some rendering of the instruction summary that will be
// returned
// Log some rendering of the instruction summary that will be returned. i#3129
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment: there is no log here. This was a "TODO" which as my prior comment said is "XXX" in the DR style (see the wiki) and I assume it is still a comment about future work, right? Repeating what that should look like in the DR style: XXX i#3129: Log ... .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh! I see... I kept looking in the wiki for "TODO" (assuming xxx referred to "substitute text here"). Didn't find anything about "todo", so assumed you were describing the style. Anyway. I think I understand now.

* orig_pc and verbosity)
* Populates a pre-allocated instr_summary_t description, from the instruction found
* at pc. Updates pc to the next instruction. Optionally logs translation details
* (using orig_pc and verbosity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: missing trailing .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::vector<opnd_t> srcs_and_dests_;
uint8_t srcs_ = 0;
uint8_t dests_ = 0;
std::vector<opnd_t> mem_srcs_and_dests_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining why this squishes the srcs and dests together. (Otherwise someone is going to split them up, seeing no reason to combine them.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -72,7 +72,7 @@

#define DO_VERBOSE(level, x) \
do { \
if (this->verbosity >= (level)) { \
if (verbosity >= (level)) { \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this assumed var verbosity seems a little odd and fragile when it doesn't exist anywhere except in one function. Maybe verbosity should be a parameter to the macro? (And then in ()'s when refed of course.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -878,7 +878,8 @@ raw2trace_t::get_instr_summary(uint64 modidx, uint64 modoffs, INOUT app_pc *pc,
hashtable_add(&decode_cache, decode_pc, desc);
ret = desc;
} else {
// Log some rendering of the instruction summary that will be returned. i#3129
/* XXX i#3129 Log some rendering of the instruction summary that will be returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing :

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

instr_set_translation(instr, orig_pc);
dr_print_instr(dcontext, STDOUT, instr, "");
},
verbosity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems easier to have verbosity ordered before the statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro is used exactly in one place, so chose to delete the macro and write the explicit check - imho, easier to understand, and one less macro.

@derekbruening
Copy link
Contributor

The commit title is not the right style -- please see https://github.com/DynamoRIO/dynamorio/wiki/Code-Reviews#commit-messages for future commits.

@mtrofin mtrofin changed the title i#3129 Summarize instr_t during raw2trace conversion i#3129 raw2trace perf: summarize instr_t during raw2trace conversion Aug 21, 2018
@derekbruening
Copy link
Contributor

tool.drcachesim.missfile-config-file failed on Appveyor. c3d5b47 was just committed which changes the missfile behavior -- yet it doesn't seem like it could cause this failure as it doesn't change the simulator output. Strange that it never failed before though. cc @kharbutli. Going to commit this and filed #3136 to track the test failure.

@derekbruening derekbruening merged commit 9d849e2 into DynamoRIO:master Aug 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants