add annotation support #572

Closed
derekbruening opened this issue Nov 28, 2014 · 3 comments

Comments

@derekbruening
Contributor

From [email protected] on August 29, 2011 16:08:53

there are several proposals for how to implement annotations, including empty function calls (as done in tsan) and special instr sequences (as in valgrind).

if we only need source compatibility for issue #283, we are free to use our own implementation.

I'm leaning toward using dead code to minimize the perf impact to the application: essentially "jmp foo; annotation code; foo:".
I have a bunch of design notes on aspects of this, on the complications caused by the lack of inline asm in cl64, and on alternatives involving intrinsics and tricks to avoid dead code removal.

at least one annotation, to flush code b/c it has changed, is relevant to DR: so perhaps DR (or an Extension) should provide an annotation infrastructure and expose it as an event-driven API.

whatever design is chosen, it should be able to support the annotations of other tools at the source level (issue #283). that isn't too much of a constraint, but it does rule out certain implementations I was considering.
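If only source compatibility is needed, one option is a header that maps another tool's macro names onto our own mechanism. A minimal sketch, assuming GCC or Clang: the macro names `RUNNING_ON_VALGRIND` and `VALGRIND_COUNT_ERRORS` are Valgrind's, but `annot_query`, the query IDs, and the call-based implementation are hypothetical:

```c
#include <stdint.h>

/* Hypothetical query IDs for our own annotation scheme (not Valgrind's). */
enum { ANNOT_Q_RUNNING = 1, ANNOT_Q_COUNT_ERRORS = 2 };

/* Hypothetical out-of-line hook that a tool would intercept. Run natively,
 * it answers 0 ("no tool", "no errors"). noinline keeps a real call site;
 * the volatile asm keeps the call from being deleted as side-effect free. */
__attribute__((noinline)) uintptr_t
annot_query(unsigned id)
{
    __asm__ volatile("" : : "r"(id) : "memory");
    return 0;
}

/* Source-compatible spellings of two Valgrind client-request macros,
 * expanded into our own mechanism instead of Valgrind's magic sequence. */
#define RUNNING_ON_VALGRIND   ((unsigned)annot_query(ANNOT_Q_RUNNING))
#define VALGRIND_COUNT_ERRORS ((unsigned)annot_query(ANNOT_Q_COUNT_ERRORS))
```

Code written as `if (RUNNING_ON_VALGRIND) ...` would compile unchanged against such a header. GCC's `noipa` attribute is a stronger alternative to `noinline` if interprocedural optimizations turn out to see through the call.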

Original issue: http://code.google.com/p/drmemory/issues/detail?id=572

@derekbruening
Contributor Author

From [email protected] on September 21, 2011 14:07:05

We talked briefly about seeing if there was some way to use DWARF to create annotations, but I think that generally in the compiler backend debug info is considered "nice to have" and is easily thrown away with optimizations. Therefore, I'd be worried about the effectiveness of any metadata or DWARF based annotation solution.

An alternative is the gcc labels-as-values extension (http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html), which is kind of interesting. It allows you to get the actual runtime address of the label as a value in the code, so you can say "goto *my_labels_array[i]" and it will emit an indirect jump within the function. Linux has this interesting BUG() macro that uses the address of the label to look up the current PC and then print the source line number. That was originally introduced to save space in the kernel image by not creating so many concatenated func:line strings. The reason I mention it is that getting labels-as-values right is a correctness issue for the compiler, and therefore it is more likely to be lowered properly than debug-information annotations.
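For reference, a minimal sketch of the extension itself (GCC and Clang only; it is not standard C and MSVC does not support it). `&&label` yields the label's address as a `void *` and `goto *expr` jumps to it; the function and variable names are illustrative:

```c
#include <stddef.h>

/* Dispatch through a static table of label addresses (GCC/Clang extension).
 * The indirect jump stays within this function. */
int
dispatch(int i)
{
    static void *const targets[] = { &&case_zero, &&case_one, &&case_other };
    if (i < 0 || i > 1)
        i = 2;
    goto *targets[i];
case_zero:
    return 0;
case_one:
    return 10;
case_other:
    return -1;
}

/* Take a label's runtime address as a plain data value (nothing jumps to
 * it), much as the BUG() trick uses a label to identify the current PC. */
const void *recorded_pc;

void
record_pc(void)
{
here:
    recorded_pc = &&here;
}
```

GCC's documentation also warns that `&&label` can take different values if the containing function is inlined or cloned, and suggests `__attribute__((__noinline__, __noclone__))` when that matters, which is one concrete way that taking a label's address interacts with the optimizers.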

Unfortunately, I can't think of a way to get the label value out of the function using a macro that compiles down to nothing at runtime. Furthermore, just introducing a label and taking its address may affect the optimizers in unanticipated ways, even if there is no indirect goto in the function. Finally, this doesn't work for Visual Studio, which is a big deal for us.

I think any reasonable solution will have to introduce some kind of code that gets executed at runtime, because otherwise the optimizers have no notion of a "program point" and they will want to schedule loads and stores across the annotation. We need some kind of barrier (function call or inline asm) to prevent that.

I think a call to a function passing the annotation parameters as args is probably the cleanest, most portable, and most likely to work over time as compilers change. If we want to have binary compatibility, we can insert the magic Valgrind asm in those functions.
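A minimal sketch of the call-based approach, assuming GCC or Clang; `annot_ignore_leak` and `ANNOTATE_IGNORE_LEAK` are hypothetical names. The out-of-line function gives a tool a fixed address to intercept, and the empty `volatile` asm with a `"memory"` clobber supplies the barrier:

```c
#include <stddef.h>

/* Hypothetical annotation entry point. A tool would find it by symbol and
 * intercept calls to it; run natively it just returns 0. */
__attribute__((noinline)) int
annot_ignore_leak(const void *ptr, size_t size)
{
    /* Because the asm is volatile and clobbers memory, the compiler cannot
     * treat this function as pure and delete the call, and cannot move
     * memory accesses the callee might observe across the call. */
    __asm__ volatile("" : : "r"(ptr), "r"(size) : "memory");
    return 0;
}

/* Hypothetical source-level annotation macro. */
#define ANNOTATE_IGNORE_LEAK(ptr, size) \
    ((void)annot_ignore_leak((ptr), (size)))
```

Natively the cost is one call and return per annotation site.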

@derekbruening
Contributor Author

From [email protected] on December 06, 2011 12:10:53

pasting in my notes about dead code. the most promising is the conditional branch idea: while it has a memref, at least it's to the same location across all annotations.

** TODO issue #572: faster annotations: jmp over data

not sure annotations are common enough to need optimizing, even if we put full
suppression files into annotations; plus, we can't use a production build w/
tools anyway. but then why is the compiler team so interested in putting this
into debug info?

my idea: insert into the app a jmp over, say, 5 bytes: a 4-byte magic word
and a 1-byte annotation identifier.

  • low overhead w/o tool: lower than empty function call or valgrind instrs
  • requires inline asm to insert, though, so how do we accomplish this on
    cl64? goto + carefully selected intrinsic?
  • to avoid needing too much emulation to determine params, we should
    require single-value args to annotations; but we need to support
    valgrind + tsan annotations, which can have complex exprs.
    maybe we could actually execute the param code,
    and have drmem save regs around it?
  • some annotations return values: RUNNING_ON_VALGRIND,
    VALGRIND_COUNT_ERRORS, VALGRIND_PRINTF, VALGRIND_STACK_REGISTER.
    for these we can still use jmp-over-data, with a return value set after
    the jmp target, to minimize overhead w/o the tool.
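On the tool side, recognizing the proposed layout is a fixed-size byte match: a 2-byte short `jmp` (opcode `0xEB`, displacement 5) followed by the 4-byte magic word and the 1-byte ID. A minimal sketch, assuming the magic value `0x244f4952` used in the experiments below (stored little-endian, so the bytes are `52 49 4F 24`); `match_jmp_annotation` is a hypothetical helper, not a DR API:

```c
#include <stddef.h>
#include <stdint.h>

#define ANNOT_MAGIC 0x244f4952u  /* assumption: same constant as the experiments */
#define ANNOT_LEN 7              /* 2-byte jmp + 4-byte magic + 1-byte ID */

/* Hypothetical matcher: if pc points at "jmp +5; <magic>; <id>", store the ID
 * and return the jmp target (the first app instruction after the data).
 * Otherwise return NULL. */
static const uint8_t *
match_jmp_annotation(const uint8_t *pc, size_t avail, uint8_t *id_out)
{
    uint32_t magic;
    if (avail < ANNOT_LEN || pc[0] != 0xEB || pc[1] != 0x05)
        return NULL;
    /* Assemble the little-endian word byte by byte so host endianness
     * doesn't matter. */
    magic = (uint32_t)pc[2] | (uint32_t)pc[3] << 8 |
            (uint32_t)pc[4] << 16 | (uint32_t)pc[5] << 24;
    if (magic != ANNOT_MAGIC)
        return NULL;
    *id_out = pc[6];
    return pc + ANNOT_LEN;
}
```

A short jmp's displacement is relative to the end of the 2-byte instruction, which is why the target is `pc + 7`.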

*** TODO optimized away as dead code
**** TODO /O2 removes dead code

goto skip_writecr; 
__writecr0(0x244f4952);

skip_writecr:
=>
VS2005 x86:
00401028: EB 08 jmp 00401032
0040102A: B8 52 49 4F 24 mov eax,244F4952h
0040102F: 0F 22 C0 mov cr0,eax
00401032:

goes away w/ /O2, and the optimize pragma must be outside the function, but I
don't want to force the whole function to be unoptimized.

if I put that inside its own function, I can't get it inlined w/o also
getting optimized away (tried __inline and __forceinline) regardless of
whether marked as do-not-optimize: it's all b/c of the goto skipping it.
need a conditional? starting to turn into valgrind-ish.

docs for __writecr0 say "only available in kernel mode" (same with
__writemsr), but the compiler lets me use it.

using rdtscp (though I didn't try optimization here, and rdtscp is not available on VS2005):
VS2008 x86:
00401028: EB 0A jmp 00401034
0040102A: 0F 01 F9 rdtscp
0040102D: B8 52 49 4F 24 mov eax,244F4952h
00401032: 89 08 mov dword ptr [eax],ecx
00401034:
VS2008 x64:
0000000140001018: EB 0C jmp 0000000140001026
000000014000101A: 0F 01 F9 rdtscp
000000014000101D: 41 B8 52 49 4F 24 mov r8d,244F4952h
0000000140001023: 41 89 08 mov dword ptr [r8],ecx
0000000140001026:

**** TODO inline asm is not optimized away

one solution where we have inline asm: create the jmp w/ raw bytes. most
likely no pass will decode the raw bytes, so dead code removal won't see it.

gcc even at -O3 leaves asm dead code alone:
char *p = malloc(4);
80483fd: c7 04 24 04 00 00 00 movl $0x4,(%esp)
8048404: e8 0b ff ff ff call 8048314 <malloc@plt>
8048409: 89 44 24 1c mov %eax,0x1c(%esp)
__asm("jmp skipdata; .int 0xabababab; mov %0,%%eax; skipdata:" : : "m"(p));
804840d: eb 08 jmp 8048417
804840f: ab stos %eax,%es:(%edi)
8048410: ab stos %eax,%es:(%edi)
8048411: ab stos %eax,%es:(%edi)
8048412: ab stos %eax,%es:(%edi)
8048413: 8b 44 24 1c mov 0x1c(%esp),%eax
08048417 <skipdata>:

the tool has to record the jmp's target when it sees the jmp, since it's
easier to evaluate the address at that point
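A variation on the gcc experiment, as a hedged sketch for GCC or Clang on x86: it uses the numeric local label `1:` instead of `skipdata`, so the asm can be expanded more than once per file without a duplicate-symbol error, and it forces the parameter into `eax`/`rax` so a tool could read it at the jmp target without decoding the dead bytes (in line with the register-passing idea in these notes). `ANNOTATE_VALUE`, `annotated_add`, and the ID value are hypothetical; other architectures fall back to a no-op:

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical annotation macro: jmp over 5 data bytes, a 4-byte magic word
 * (0x244f4952, little-endian) and a 1-byte ID. "1:"/"1f" is a local label,
 * so multiple expansions don't clash. %c1 prints the ID without '$'.
 * "a" puts the value in eax/rax for a tool to read at the jmp target;
 * "memory" keeps loads/stores from being moved across the annotation. */
#define ANNOTATE_VALUE(id, val)                                 \
    __asm__ volatile("jmp 1f\n\t"                               \
                     ".byte 0x52, 0x49, 0x4f, 0x24, %c1\n"      \
                     "1:"                                       \
                     :                                          \
                     : "a"((uintptr_t)(val)), "i"(id)           \
                     : "memory")
#else
/* Other architectures: compile the annotation away. */
#define ANNOTATE_VALUE(id, val) ((void)(val))
#endif

/* Example use: natively the annotation costs one taken jmp. */
int
annotated_add(int a, int b)
{
    int sum = a + b;
    ANNOTATE_VALUE(3, sum);
    return sum;
}
```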

**** TODO conditional branch works

this survives cl64 /O2, but it has a memory reference and a conditional branch:
static volatile int zero;

if (!zero)
    goto skip_writecr; 

__writefsbyte(0x244f4952, ptr);
__writefsbyte(0x244f4952, sz); /* annotation params */

skip_writecr:
=>
00401017: A1 60 23 42 00 mov eax,dword ptr ds:[00422360h]
0040101C: 83 C4 04 add esp,4 (from malloc call)
0040101F: 85 C0 test eax,eax
00401021: 74 08 je 0040102B
00401023: 64 C6 05 52 49 4F mov byte ptr fs:[244F4952h],2
24 02
0040102B:
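In portable C, the same shape might look like the sketch below: every site tests one shared `volatile` flag, so the memory reference always targets the same address, and natively the body is skipped just as with `if (!zero) goto skip_writecr;`. `annot_tool_present`, `annot_last_arg`, and `ANNOTATE_ARG` are hypothetical; in a real design the tool would recognize the load of the flag rather than anything setting it:

```c
#include <stdint.h>

/* Hypothetical flag shared by every annotation site. volatile forces a real
 * load each time, so the compiler can neither fold the branch nor drop the
 * body below. It stays 0 when running natively. */
volatile int annot_tool_present;

/* Hypothetical sink standing in for the annotation payload. */
volatile uintptr_t annot_last_arg;

#define ANNOTATE_ARG(arg)                          \
    do {                                           \
        if (annot_tool_present)                    \
            annot_last_arg = (uintptr_t)(arg);     \
    } while (0)
```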

but for the vassert usage, why would anyone bother w/ this when they can
implement a conditional based on a global themselves?

**** TODO could use syzygy to change to direct jmp
**** TODO measure cost of call vs cbr vs jmp in microbenchmark
*** TODO compiler-specific annotations?

if we design them right so the annotation interface won't change, we could
do inline asm now and put in a function-call implementation for cl64 once we
port to 64-bit

but what about cross-tool-compatible annotations? if that's only important
at the source level, we can most likely make it work

where are the params? we can do the push idea if we wrap malloc, but
wrapping malloc means we would want context-specific info. better to just
take the address.

*** TODO how get parameters

it's easier to get params w/ jmp-over-dead-code if we execute the dead code;
otherwise we have to emulate it, and a complex expression may take multiple
instrs.

might there in fact be branches if there's a ?: operator or sthg?
that should be ok: don't assume a single bb. just add the address of the jmp
target to some table, and when we hit a bb w/ a continuation pc equal to
that, insert a clean call.
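The bookkeeping described here might look like the following sketch: when decoding the jmp, record its target and the annotation ID; later, when a basic block's continuation pc matches a recorded target, the lookup says where the clean call belongs. The fixed-size table and the names `annot_note_target` and `annot_match_continuation` are hypothetical, not DR API:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SITES 64  /* assumption: tiny fixed table for illustration */

/* Hypothetical record of one annotation site. */
typedef struct {
    uintptr_t target;  /* pc just past the annotation's dead code */
    unsigned id;       /* annotation identifier */
} annot_site_t;

static annot_site_t sites[MAX_SITES];
static size_t num_sites;

/* Call when the annotation jmp is decoded: remember where it lands.
 * Returns 0 if the table is full. */
int
annot_note_target(uintptr_t target, unsigned id)
{
    if (num_sites == MAX_SITES)
        return 0;
    sites[num_sites].target = target;
    sites[num_sites].id = id;
    num_sites++;
    return 1;
}

/* Call for each bb: if the bb's continuation pc is a recorded target, report
 * the ID; that is where the clean call would be inserted. Entries are kept,
 * since the same code may be translated again later. Linear scan for
 * brevity; a hashtable would be the obvious choice in practice. */
int
annot_match_continuation(uintptr_t next_pc, unsigned *id_out)
{
    for (size_t i = 0; i < num_sites; i++) {
        if (sites[i].target == next_pc) {
            *id_out = sites[i].id;
            return 1;
        }
    }
    return 0;
}
```

Because the match is keyed on the continuation pc rather than on the jmp's own block, any branches inside the parameter code (e.g. from `?:`) do not matter.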

the inline asm puts the params into registers, and then the DR extension
adds a clean call using those registers (including an annotation ID) to the
client annotation event callback.
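On the client side, the event-driven API could be as small as a per-ID handler table that the clean call dispatches into. A minimal sketch; the names, the two-argument handler signature, and the ID range are all assumptions, not an existing DR or Extension interface:

```c
#include <stddef.h>
#include <stdint.h>

#define ANNOT_MAX_ID 256  /* assumption: 1-byte annotation IDs */

/* Hypothetical client callback: annotation ID plus one register-passed arg. */
typedef void (*annot_handler_t)(unsigned id, uintptr_t arg);

static annot_handler_t handlers[ANNOT_MAX_ID];

/* Client registers for one annotation ID; fails if taken or out of range. */
int
annot_register(unsigned id, annot_handler_t fn)
{
    if (id >= ANNOT_MAX_ID || handlers[id] != NULL)
        return 0;
    handlers[id] = fn;
    return 1;
}

/* What the inserted clean call would invoke with the values it read from
 * the registers. Unregistered IDs are ignored. */
void
annot_dispatch(unsigned id, uintptr_t arg)
{
    if (id < ANNOT_MAX_ID && handlers[id] != NULL)
        handlers[id](id, arg);
}

/* Example handler, e.g. for a code-flush annotation: records its argument. */
static uintptr_t last_flush_arg;

static void
on_flush(unsigned id, uintptr_t arg)
{
    (void)id;
    last_flush_arg = arg;
}
```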

we can use locals: yes, that will cost extra stack slots, but that's not a big deal.

any chance executing the dead code will mess up the app stack?
the gcc inline asm should mark which regs are touched.
but what if the compiler concludes the dead code can't affect the
subsequent code?

*** TODO alternate idea: store address into global array

have a global array of ignore-if-leaked pointers,
but the code to store into it and incr the index is probably slower than an
empty func call
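For comparison with the empty-call cost, the global-array alternative amounts to a bounds check, a store, and an index update at every site. A sketch with hypothetical names (`annot_ignore_if_leaked`, `ANNOTATE_IGNORE_IF_LEAKED`), ignoring thread safety and using a fixed capacity:

```c
#include <stddef.h>

#define IGNORE_LEAK_CAP 1024  /* assumption: fixed capacity */

/* Hypothetical table a tool would consult before reporting leaks. */
void *annot_ignore_if_leaked[IGNORE_LEAK_CAP];
size_t annot_ignore_count;

/* Per site: a bounds check, a pointer store, and an index update. Not
 * thread-safe; a real version would need an atomic increment. */
#define ANNOTATE_IGNORE_IF_LEAKED(ptr)                                    \
    do {                                                                  \
        if (annot_ignore_count < IGNORE_LEAK_CAP)                         \
            annot_ignore_if_leaked[annot_ignore_count++] = (void *)(ptr); \
    } while (0)
```

Whether this is actually slower than an empty call is the kind of question the call vs cbr vs jmp microbenchmark TODO would settle.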

*** TODO alternate idea: extra push or sthg to malloc call

if the annotation macro for leaks wraps the malloc call, we could have it
change the call sequence and have drmem recognize that pattern: an extra
push or sthg

@derekbruening
Contributor Author

From [email protected] on June 17, 2014 11:22:41

should we add "configure_DynamoRIO_annotations(target)"?
it would add include_dir and add the .c file as a source
(and avoid all the global flag whacking of client config)
