Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#3048 drcachesim record funcs: infrastructure #3057

Merged
merged 25 commits into from
Jun 27, 2018

Conversation

Louis-Ye
Copy link
Contributor

@Louis-Ye Louis-Ye commented Jun 19, 2018

Add function traces as extended marker type in memtrace (drcachesim) client.
For each invocation of the specified functions in runtime options, the trace contains its return address, argument(s) value(s), and return value.
Currently we support recording pointer-size arguments and return value.

#3048

Louis-Ye added 3 commits June 18, 2018 17:49
Add function traces as extended marker type. This PR contains only hard-coded malloc for tracing,
but it is easily extended to trace other functions as well.

For each invocation of the pre-defined function, the trace contains its return address,
each argument values, and return value

Fixes DynamoRIO#3048
Add option to enable func trace.
Make drwrap init at beginning and exit at the end.

Fixes DynamoRIO#3048
const int num_args;
} func_metadata;

#define MALLOC_RETADDR TRACE_MARKER_TYPE_FUNC_MALLOC_RETADDR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure you gain much from these, other than saving a few keystrokes. I'd remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#define MALLOC_ARG TRACE_MARKER_TYPE_FUNC_MALLOC_ARG
#define MALLOC_RETVAL TRACE_MARKER_TYPE_FUNC_MALLOC_RETVAL

static const func_metadata instru_funcs[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll obviously need to expand the list (especially with all the operator new variants), but I think it's fine to leave that for later.

I'd remove do_malloc and malloc_impl because they sound like implementation details of particular allocators and it's not guaranteed that the arguments / return values are what we expect them to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ported the do_malloc and malloc_impl from drmemory, and they seem to have the same signature as malloc. Do you think it is safe enough to keep them here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea about malloc_impl (I wouldn't worry too much about windows outside of breaking DR there).

do_malloc() is from an old version of tcmalloc and doesn't exist any more even upstream (https://github.com/gperftools/gperftools/search?q=do_malloc&unscoped_q=do_malloc). Even if it did, I don't think we should've kept it. It was an internal helper routine called by malloc(), so if we trace both, we'd record the same call twice (unless we taught the tracer we're in the same context, but that sounds like unnecessary complexity).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lied, it still exists upstream [1]. Still, we should ignore it so we don't double record.

[1] https://github.com/gperftools/gperftools/blob/f47a52ce85c3d8d559aaae7b7a426c359fbca225/src/tcmalloc.cc#L1348

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will remove those. I guess it is also good to keep it simple as much as possible in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return; /* This thread was filtered out. */

func_metadata *e = (func_metadata *)user_data;
void * retval = drwrap_get_retval(wrapcxt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: void *retval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(uintptr_t)retaddr);

for (int i = 0; i < e->num_args; i++) {
void * arg_i = drwrap_get_arg(wrapcxt, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1006,6 +1016,7 @@ event_app_instruction(void *drcontext, void *tag, instrlist_t *bb,
!op_L0_filter.get_value() &&
// The delay instr buffer is not full.
ud->num_delay_instrs < MAX_NUM_DELAY_INSTRS) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -32,6 +32,7 @@ Total counts:
#else
13 total transfer markers
#endif
.* total function markers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly more specific test would be nice. Something like a simplified version of static_burst.cpp, where the app calls malloc, say, exactly 1000 times between dr_app_start() and dr_app_stop_and_cleanup(). Then we can run the basic_counts tool same way as here and check we've recorded them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Will add one in next commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -59,6 +59,13 @@ basic_counts_t::~basic_counts_t()
// Empty.
}

static bool is_func_marker(trace_marker_type_t marker_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style violation: newline after return type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


droption_t<bool> op_enable_func_trace
(DROPTION_SCOPE_FRONTEND, "enable_func_trace", true,
"enable traces for function invocations",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the conventions of the other options: the short description should be capitalized with ending punctuation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

droption_t<bool> op_enable_func_trace
(DROPTION_SCOPE_FRONTEND, "enable_func_trace", true,
"enable traces for function invocations",
"The function trace would contain information of arguments, return address, and "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how this is placed into the html docs: the short is not duplicated, so please include an option description at the start of the long that does not assume the short.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(DROPTION_SCOPE_FRONTEND, "enable_func_trace", true,
"enable traces for function invocations",
"The function trace would contain information of arguments, return address, and "
"return value for invocations of the pre-defined functions");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(DROPTION_SCOPE_FRONTEND, "enable_func_trace", true,
"enable traces for function invocations",
"The function trace would contain information of arguments, return address, and "
"return value for invocations of the pre-defined functions");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to explain which functions: how would a user have any idea what "the pre-defined functions" are?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1006,6 +1016,7 @@ event_app_instruction(void *drcontext, void *tag, instrlist_t *bb,
!op_L0_filter.get_value() &&
// The delay instr buffer is not full.
ud->num_delay_instrs < MAX_NUM_DELAY_INSTRS) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1398,6 +1409,100 @@ event_delay_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t
return DR_EMIT_DEFAULT;
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about putting this new function wrapping code in a separate file? The C style here would be a separate file with an init and exit routine. Looks like it needs to share per_thread_t and BUF_PTR -- though I would say just expose an append_marker routine and that can be the start of a future public interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

typedef struct _func_metadata {
const trace_marker_type_t marker_retaddr;
const trace_marker_type_t marker_arg;
const trace_marker_type_t marker_retval;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above: I think this can all be removed and all we have is an index identifier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


#ifdef UNIX
{MALLOC_RETADDR, MALLOC_ARG, MALLOC_RETVAL, "tc_malloc", 1 },
{MALLOC_RETADDR, MALLOC_ARG, MALLOC_RETVAL, "__libc_malloc", 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intercepting malloc & co. is a complex topic. Xref Dr. Memory's interception of all heap allocation routines (https://github.com/DynamoRIO/drmemory/blob/master/common/alloc.c#L317,
https://github.com/DynamoRIO/drmemory/blob/master/common/alloc.c#L433) and longstanding plans to turn that into a library (DynamoRIO/drmemory#824) though that has yet to happen.

Please include comments explaining that this is a first pass and is not complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


#ifdef UNIX
{MALLOC_RETADDR, MALLOC_ARG, MALLOC_RETVAL, "tc_malloc", 1 },
{MALLOC_RETADDR, MALLOC_ARG, MALLOC_RETVAL, "__libc_malloc", 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to be able to dynamically specify which functions to record: perhaps via a config file eventually, though we could start with a runtime option. We just need the name and number of args if all we support is recording pointer-sized arg values. You could imagine sthg like:

-record_function malloc 1 -record_function realloc 2

If we decide heap allocation is very common and we want to have built-in support, I would say it should be under a separate option like -record_function_malloc or -record_heap_allocs or sthg.

Alternatively, we could expose an interface to write metadata into the trace and structure all of this as a separate client from the memory address tracer. I'm fine bundling it in for now as it seems like a common thing to want to augment an address trace with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@derekbruening
Copy link
Contributor

Nits about the commit message:

The title: missing #, and as a multi-commit feature please follow the style guide for code reviews. So sthg like i#3048 record funcs: infrastructure and malloc if this retains the hardcoded malloc.

The body: include a ref to the issue: Issue: #3048

@derekbruening derekbruening reopened this Jun 19, 2018
@Louis-Ye Louis-Ye changed the title i3048 drcachesim func traces i#3048 drcachesim func traces Jun 19, 2018
@Louis-Ye Louis-Ye changed the title i#3048 drcachesim func traces i#3048 drcachesim record funcs: infrastructure and malloc Jun 19, 2018
…ns to record function

Separate out the module of function traces from tracer.cpp to func_trace.cpp.
Add burst_malloc test and count number of function markers.
Add option to specify record function.

Fixes DynamoRIO#3048
int arg_num;
} func_metadata_t;

/* expected pattern "<function_name>:<function_id>:<arguments_num>" */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update this comment with more details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"-client_lib ';;-offline'"))
std::cerr << "failed to set env var!\n";

/* We use an outer loop to test re-attaching (i#2157). */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover comment from copy/paste?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/* Test if the drmemtrace_client_main() in drmemtrace will be called. */
DR_EXPORT WEAK void
drmemtrace_client_main(client_id_t id, int argc, const char *argv[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to test the weak symbol in this test, you can remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

static int func_trace_init_count;

static func_trace_append_entry_t append_entry;
static std::vector<func_metadata_t> funcs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, about using std::vector (and the std::string above) in a DR client (instead of something like drvector_t). Because in our use-case we are statically linked with the application, you have to be very careful not to break isolation. For example, if func_pre_hook() was calling funcs.push_back(), that could call the same malloc we are currently tracing, which is definitely something we don't want to do, and will lead to all kinds of trouble.

I think you're fine in this case (but @derekbruening sure knows better) -- you're only indexing in the vector when we are called from tracing code. Probably still worth leaving a short comment for the next reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point, I think very likely it will happen. I will update to use drvector_t and avoid to use std::string in the function hook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

static std::vector<std::string> split_by(std::string s, std::string sep)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting static std::vector<> \n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#ifdef __cplusplus
extern "C" {
DR_EXPORT bool
func_trace_init(func_trace_append_entry_t append_entry_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicating the declarations this way is confusing. I think it's better to just add the #ifdef ... extern "C" for each of the two below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a chance to look at part of the diff

@@ -333,3 +333,17 @@ droption_t<bool> op_reuse_verify_skip
"Verifies every skip list-calculated reuse distance with a full list walk. "
"This incurs significant additional overhead. This option is only available "
"in debug builds.");

// XXX: replace function return address with function callstack
// XXX: default value should use the customized seperator when the relevant PR comes in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: separator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -333,3 +333,15 @@ droption_t<bool> op_reuse_verify_skip
"Verifies every skip list-calculated reuse distance with a full list walk. "
"This incurs significant additional overhead. This option is only available "
"in debug builds.");

// XXX: replace function return address with function callstack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the issue number (XXX i#3048:)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::string("malloc|0|1") + ";" + std::string("tc_malloc|1|1") + ";" +
std::string("__libc_malloc|2|1"),
"Record invocation traces for the specified functions",
"Option value should fit this format"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating comment from before: please repeat the description in the long (please see the prior comment on the prior review)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// XXX: replace function return address with function callstack
droption_t<std::string> op_record_function
(DROPTION_SCOPE_FRONTEND, "record_function", DROPTION_FLAG_ACCUMULATE, ";",
std::string("malloc|0|1") + ";" + std::string("tc_malloc|1|1") + ";" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so by default every trace will include the values for these 3 functions -- not sure about that for a couple of reasons:

  1. It seems incomplete: someone mapping addresses to data structures or sthg will want at least free() as well, if not calloc and new, etc. Yet there's no comment here or anything in the option doc saying these default values are in-progress and not final (yes, a pet peeve of mine: I like prototype or incremental code to clearly mark itself).

  2. Should this be in all address traces by default: I guess compared to the trace overhead the wrapping overhead should be minimal and the trace size expansion should be minimal (should we measure?) But how would we arrange to have a heap tracing option: some kind of alias convenience option -record_heap which turns into -record_function malloc...free...calloc (or it could set up the recording internally after the option parsing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think -record_heap sounds like a good idea, I will make such an option and make the default -record_function value as empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"function_name|function_id|function_args_num (example:"
" -record_function \"tc_malloc|2|1\"). The trace would contain information for"
" function return address, function argument value(s), and function return"
" value. We only record pointer-sized arguments and return value.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain that the information in the trace is labeled with the function_id via an ID entry prior to each set of value entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -191,6 +191,32 @@ typedef enum {
*/
TRACE_MARKER_TYPE_CPU_ID,

/**
* The marker value contains the function id defined by user in droption
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/user/the user/

Missing .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/**
* The marker value contains the return address of the function call,
* whose id is specified by the closest previous FUNC_ID marker entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to get a link by using the full name and #: #TRACE_MARKER_TYPE_FUNC_ID.

Missing . again (let's keep these user-facing doxygen comments professional-looking)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* The marker value contains the return address of the function call,
* whose id is specified by the closest previous FUNC_ID marker entry
* XXX: replace return address with callstack information in the future
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comment to us, not the user, so put it in a separate regular (non-doxygen) comment (add the i# too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* The marker value contains one argument value of the function call,
* whose id is specified by the closest previous FUNC_ID marker entry.
* The number of such entries for one function invocation is expected to
* equal to the number of function arguments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but the more important point is that it does equal the specified arguments in the -record_function option, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/**
* The marker value contains return value of the function call,
* whose id is specified by the closest previous FUNC_ID marker entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the return value, and turn FUNC_ID into a link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished rest of PR.

*/

/* This application links in drmemtrace_static and acquires a trace during
* a "burst" of execution in the middle of the application. It then detaches.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain that it invokes malloc to test function tracing

.*
all done
.*
.* 5000 function markers.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have an analysis tool test that examines the function arg, return value, and retaddr entries so we know they're correct. OK in a separate CL.

Copy link
Contributor Author

@Louis-Ye Louis-Ye Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since basic_counts.cpp is already modified in this CL for counting function markers, I will include the examines for different entries in this same CL. :)

@@ -86,6 +95,9 @@ basic_counts_t::process_memref(const memref_t &memref)
memref.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_XFER) {
++thread_xfer_markers[memref.data.tid];
++total_xfer_markers;
} else if (is_func_marker(memref.marker.marker_type)) {
++thread_func_markers[memref.data.tid];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, would total functions (just _RETVAL might be the way to count that: should cover tailcall too via drwrap) be more useful? Maybe not -- depends on what you want to count. Ignore this here. See prior comment about a function trace analysis tool.

@@ -56,6 +56,13 @@ typedef enum {
DRMEMTRACE_ERROR_NOT_IMPLEMENTED, /**< Operation failed: not implemented. */
} drmemtrace_status_t;


/**
* Name of drmgr instrumentation pass priorities for app2app, analysis, insert,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating prior comment (!): there is no app2app

Copy link
Contributor Author

@Louis-Ye Louis-Ye Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I saw in tracer.cpp that event_bb_app2app is registered as app2app event callback. Did I miss anything here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you're right, my bad: I searched for "app2app" in the github diff view and it wasn't there (b/c it's on a line outside the context diff range) -- used for rep string expansion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
func_metadata_t *f =
(func_metadata_t *)dr_global_alloc(sizeof(func_metadata_t));
strcpy(f->name, name.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never use strcpy (see https://github.com/DynamoRIO/dynamorio/wiki/Code-Content#security): use strncpy with BUFFER_SIZE_ELEMENTS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for (auto &single_op_value : split_by(value, sep)) {
auto items = split_by(single_op_value, PATTERN_SEPARATOR);
if (items.size() != 3) {
DR_ASSERT(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is user input so instead of an assert this should be a usage error with a message to the user: the FATAL() macro: FATAL("Usage error: -record_function was not passed a triplet") or sthg.

Copy link
Contributor Author

@Louis-Ye Louis-Ye Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used NOTIFY for detectiong format and id duplication problem.
Done

int id = atoi(items[1].c_str());
int arg_num = atoi(items[2].c_str());
if (id_existed(id)) {
DR_ASSERT(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FATAL again with a useful message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with using NOTIFY

#endif // __cplusplus

DR_EXPORT
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal function to the tracer, not exposed to the user: it should not have a doxygen comment, it does not need to be extern "C". Ditto for _exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -152,6 +154,16 @@ static volatile bool exited_process;
static bool have_phys;
static physaddr_t physaddr;

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a regular comment: it's not docs to some user interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* function pre/post callbacks of drwrap API happens before memtrace's
* meta instruction, so that function trace entries will not be appended to the middle
* of a BB's PC and Memory Access trace entries. (Assumption made here is that,
* the function pre/post callback always happended at the first instruction of a BB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for next round of review.

Louis-Ye added 4 commits June 22, 2018 14:22
Add -record_heap for convenient usage of recording heap functions.
Update comments to be more professional.
Count different function markers separately in basic_counts.cpp/h

Fixes DynamoRIO#3048
…ecord_heap shows

Add separator for when both -record_function and -record_heap shows

Fixes DynamoRIO#3048
Use &as item separator for func options.
Modify process_fatal to take only const char*.

Fixes DynamoRIO#3048
@Louis-Ye Louis-Ye changed the title i#3048 drcachesim record funcs: infrastructure and malloc i#3048 drcachesim record funcs: infrastructure Jun 23, 2018
Louis-Ye added 2 commits June 24, 2018 14:44
Notify user about the duplicated ID or invalid input for function trace.
Relayout code for ID detection.
Remove the passing of fatal_process function pointer.

Fixes DynamoRIO#3048
{
func_metadata_t *f =
(func_metadata_t *)dr_global_alloc(sizeof(func_metadata_t));
size_t safe_size = get_safe_size(sizeof(f->name), name.size());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update to use BUFFER_SIZE_ELEMENTS(f->name) in next commit.

Use BUFFER_SIZE_ELEMENTS for strncpy.

Fixes DynamoRIO#3048
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reply to all the review comments each time to ensure they are all either acted on or deliberately not acted on (with a reason given): I'm seeing several instances here of review requests that were skipped.
I'm assuming that was a mistake and not deliberate.

Mostly I have comments about the user-facing doxygen docs. The code is good.

{
func_metadata_t *f =
(func_metadata_t *)dr_global_alloc(sizeof(func_metadata_t));
size_t safe_size = get_safe_size(sizeof(f->name), name.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use BUFFER_SIZE_ELEMENTS (to avoid bugs in char vs wchar_t)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

" return value. The trace is labeled with the function_id via an ID entry"
" prior to each set of value entries."
" Note that the provided function id should be unique, and not collide with"
" existing heap functions (see -record_heap_value ) if record_heap"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent option reference: -record_heap
nit: space before )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

droption_t<bool> op_record_heap
(DROPTION_SCOPE_ALL, "record_heap", false,
"Enable recording trace for the defined heap functions.",
"It is a convenient option to enable recording trace for the defined heap"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/convenient/convenience/
grammar: s/trace/a trace/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(DROPTION_SCOPE_ALL, "record_heap", false,
"Enable recording trace for the defined heap functions.",
"It is a convenient option to enable recording trace for the defined heap"
" functions in -record_heap_value. Specify this option is equivalent to"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar: s/Specify/Specifying/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"malloc|0|1" OP_RECORD_FUNC_ITEM_SEP "free|1|1" OP_RECORD_FUNC_ITEM_SEP
"tc_malloc|2|1" OP_RECORD_FUNC_ITEM_SEP "tc_free|3|1" OP_RECORD_FUNC_ITEM_SEP
"__libc_malloc|4|1" OP_RECORD_FUNC_ITEM_SEP "__libc_free|5|1",
" The actual value of the defined heap functions.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead say "Functions recorded by -record_heap". No leading space. (To the reader it is confusing as to what "the defined heap functions" is referring to. I would update the start of the long description too.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* The marker value contains one argument value of the function call,
* whose id is specified by the closest previous #TRACE_MARKER_TYPE_FUNC_ID
* marker entry. The number of such entries for one function invocation is
* expected to be equal to the specified arugment in -record_function or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My prior review comment was trying to say, why say "expected to be"? That makes it sound like either it's not under our control or it varies somehow...just say "is equal".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* The marker value contains one argument value of the function call,
* whose id is specified by the closest previous #TRACE_MARKER_TYPE_FUNC_ID
* marker entry. The number of such entries for one function invocation is
* expected to be equal to the specified arugment in -record_function or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling (please write these user-facing docs carefully, just like you would with code): argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(void *drcontext, trace_marker_type_t marker, uintptr_t value);

DR_EXPORT
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating another prior comment (!): this should be a regular comment.
Also, no DR_EXPORT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

func_trace_init(func_trace_append_entry_t append_entry_);

DR_EXPORT
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto and ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return;

std::string funcs_str, sep;
get_funcs_str_and_sep(funcs_str, sep);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to cache the function string during init and avoid this parsing here: there are scenarios where using STL code at exit can cause problems (not in google3 usage, but in the dr_app_take_over() scenario where the app exits before DR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Ready for next round of review!

Louis-Ye added 3 commits June 26, 2018 11:23
Save input functions string to avoid using STL at func_trace_exit().
Update comments and docs, correct grammar and spelling issues.

Fixes DynamoRIO#3048
Get rid of DR_EXPORT in func_trace.cpp

Fixes DynamoRIO#3048
" Note that the provided function id should be unique, and not collide with"
" existing heap functions (see -record_heap_value ) if record_heap"
" existing heap functions (see -record_heap_value ) if -record_heap"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating (!):
nit: space before )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -192,29 +192,32 @@ typedef enum {
TRACE_MARKER_TYPE_CPU_ID,

/**
* The marker value contains function id defined by the user in droption.
* The marker value contains function id defined by the user in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating (!):
grammar: s/function/the function/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/
TRACE_MARKER_TYPE_FUNC_ARG,

/**
* The marker value contains return value of the function call,
* The marker value contains return value of the just-entered function,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating (!):
grammar: s/return/the return/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Louis-Ye added 2 commits June 26, 2018 12:42
Update comments for grammar and format issues.

Fixes DynamoRIO#3048
Update comments for consistent format.

Fixes DynamoRIO#3048
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for updating the coding approach, I like the split heap option and scalable approach.

Thank you for fixing all of the nits (though please carefully look at each review comment in the future: it seems when there were 2 requests in one comment one of them was sometimes overlooked...)

@derekbruening
Copy link
Contributor

Looks like it fails to build on Windows. Temporarily adding another -V to suite/runsuite_wrapper.pl will get more info.

Louis-Ye added 4 commits June 26, 2018 13:37
Add -VV to show more logs.

Fixes DynamoRIO#3048
Add explicit cast for fixing windows build.

Fixes DynamoRIO#3048
Change -VV back to -V in runsuite_wrapper.pl

Fixes DynamoRIO#3048
@Louis-Ye
Copy link
Contributor Author

Louis-Ye commented Jun 27, 2018

The Windows build passes now.

@derekbruening
Copy link
Contributor

run aarch tests

@derekbruening derekbruening merged commit 5492fa3 into DynamoRIO:master Jun 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants