drbbdup fails to interoperate with drmgr emulation API #5390

Closed
derekbruening opened this issue Mar 3, 2022 · 11 comments · Fixed by #5392 or #5407

Comments

@derekbruening
Contributor

For #3995 I'm integrating drbbdup with drmemtrace, the tracer for drcachesim.
But drmemtrace uses the drmgr emulation support:

instr_t *instr_fetch = drmgr_orig_app_instr_for_fetch(drcontext);
instr_t *instr_operands = drmgr_orig_app_instr_for_operands(drcontext);

Those routines are not supported by drbbdup, which separates the insertion point (where) from the app instruction being instrumented (instr) because it cannot clone a block-final branch or syscall.
This causes drmemtrace to instrument the wrong instruction.

Xref past discussions on possibly integrating drbbdup with drmgr.
A possible simpler solution is to add a drmgr API to set the current app instruction.
Or, could we re-implement the drbbdup where vs instr split to instead use the emulation API itself?
That is one of the intended uses of the emulation API, for app instr rewrites that ease instrumentation (such as rep string or scatter-gather expansion): it is not only for pure emulation.

@derekbruening
Contributor Author

Or, could we re-implement the drbbdup where vs instr split to instead use the emulation API itself?

This is appealing. The downside is that all drbbdup users must use the drmgr_orig_app_instr_for_fetch(), etc. operations whenever they examine the app instruction being instrumented or else they'll get the wrong instruction quite frequently. Previously, only tool code that explicitly used rep-string or scatter-gather expansions or that explicitly wanted to be emulation-aware needed to do this. It's yet another layer on top of the base interfaces...but it still feels like the cleanest approach.

@johnfxgalea any opinions here?

@johnfxgalea
Contributor

johnfxgalea commented Mar 3, 2022

So I am actually interested in integrating drbbdup in drmgr as I think we would be able to reduce its api complexity. If this had to happen, how would it be a solution for this issue?

@derekbruening
Contributor Author

So I am actually interested in integrating drbbdup in drmgr as I think we would be able to reduce its api complexity. If this had to happen, how would it be a solution for this issue?

drmgr_orig_app_instr_for_fetch and its sibling use the stored pt->insertion_instr, which for drbbdup is where instead of instr for the final non-cloned instr. If drbbdup were integrated, it would ensure that pt->insertion_instr held instr instead of where.
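
To make the mismatch concrete, here is a minimal toy model in C. It is not the drmgr source: the types and the function `toy_orig_app_instr_for_fetch` are hypothetical stand-ins, and the only behavior modeled is what this thread describes, namely that the query answers from the stored `insertion_instr` and returns NULL when that instruction is not an app instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for instr_t: just enough to tell app code from labels. */
typedef struct {
    const char *text;
    bool is_app;
} toy_instr_t;

/* Hypothetical stand-in for drmgr's per-thread insertion-phase state. */
typedef struct {
    const toy_instr_t *insertion_instr;
} toy_per_thread_t;

/* Models only the non-emulated path: the answer comes from the stored
 * insertion_instr, and a non-app instruction (e.g. a label) yields NULL. */
const toy_instr_t *
toy_orig_app_instr_for_fetch(const toy_per_thread_t *pt)
{
    assert(pt != NULL && pt->insertion_instr != NULL);
    return pt->insertion_instr->is_app ? pt->insertion_instr : NULL;
}
```

In this model, storing drbbdup's "where" (a label) makes the query return NULL even though the callback's "instr" parameter is the final app instruction. Storing "instr" instead, as integration into drmgr would allow, makes the query return that instruction.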

@johnfxgalea
Contributor

johnfxgalea commented Mar 3, 2022

So from the perspective of clients using drbbdup, instrumentation routines would no longer have instr passed via a parameter but only where, and users always rely on drmgr_orig_app_instr_for_fetch() to get instr? Are we on the same page?

@derekbruening
Contributor Author

So from the perspective of clients using drbbdup, instrumentation routines would no longer have instr passed via a parameter but only where, and users always rely on drmgr_orig_app_instr_for_fetch() to get instr? Are we on the same page?

Yes, but one option would be to keep the separate instr and where for compatibility and clients who don't need to be expansion- or emulation-aware could continue using instr.

Long-term maybe everything should take where, instr_for_fetch, and instr_for_operands and we deprecate all the other interfaces...?

@johnfxgalea
Contributor

johnfxgalea commented Mar 3, 2022

And at the moment, what is stopping you from using instr (passed as param) instead of drmgr_orig_app_instr_for_fetch()?

@derekbruening
Contributor Author

And at the moment, what is stopping you from using instr (passed as param) instead of drmgr_orig_app_instr_for_fetch()?

That is the wrong instruction for rep-string or scatter-gather expansions. It results in significant inaccuracies in the trace as those expansions can account for a large number of dynamic instructions, so getting it wrong matters.

So right now this is a blocker to making the switch to use drbbdup in drmemtrace. Since we want to use this ASAP...I might put in a short-term workaround that can be undone if a longer-term change like integration into drmgr happens.

derekbruening added a commit that referenced this issue Mar 4, 2022
Adds emulation markers for each non-cloned final drbbdup instruction
to ensure that drmgr_orig_app_instr_for_fetch() and
drmgr_orig_app_instr_for_operands() work properly with drbbdup.

Adds a sanity test that those emulation queries never see drbbdup's
jump to a label.  The test fails without the fix here.

Fixes #5390
@derekbruening
Contributor Author

Another issue: will DR_EMULATE_REST_OF_BLOCK in 1st case then apply to every full block dup after that?
Soln: drbbdup should insert an emulation end label at the end of each case.
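
The concern can be sketched with a toy walker in C. This is an illustration under stated assumptions, not drmgr code: the enum values and `rb_mark_emulated` are hypothetical, and the model assumes (as the proposed solution does) that an emulation end label terminates a rest-of-block region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item kinds for a flattened drbbdup block. */
typedef enum {
    RB_APP,             /* an application instruction */
    RB_EMUL_START_REST, /* emulation start label with a rest-of-block flag */
    RB_EMUL_END         /* emulation end label */
} rb_kind_t;

/* Sets in_emul[i] to true for each app instruction that the walker
 * considers to be inside an emulation region. */
void
rb_mark_emulated(const rb_kind_t *items, size_t n, bool *in_emul)
{
    bool active = false;
    assert(items != NULL && in_emul != NULL);
    for (size_t i = 0; i < n; i++) {
        if (items[i] == RB_EMUL_START_REST)
            active = true;  /* stays active until an end label is seen */
        else if (items[i] == RB_EMUL_END)
            active = false; /* assumption: an end label closes the region */
        in_emul[i] = active && items[i] == RB_APP;
    }
}
```

With the sequence start, app (case 1), app (case 2), both app instructions are marked as emulated. Adding an end label between the two cases leaves only the first case marked.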

derekbruening added a commit that referenced this issue Mar 5, 2022
Adds emulation markers for each non-cloned final drbbdup instruction
to ensure that drmgr_orig_app_instr_for_fetch() and
drmgr_orig_app_instr_for_operands() work properly with drbbdup.
These marker labels are hidden from drbbdup callbacks.

Adds a new drbbdup-emul test which employs drutil_expand_rep_str
as well as its own emulation markers.  It ensures that emulation queries
never see drbbdup's jump to a label.  The test fails without the fix here.

Fixes #5390
@derekbruening derekbruening reopened this Mar 8, 2022
@derekbruening
Contributor Author

Re-opening as this was not fully fixed: my code only marked the non-final drbbdup cases with emulation markers. The final case doesn't have a jump to the exit: it just ends in a label. drmemtrace ends up not seeing any block-final branch as a result.

I found this via the invariant_checker on Windows where there happened to be a block with nothing but a jump, which was omitted and thus there was a gap:

Hit retrace threshold: enabling tracing.
<Stopping application D:\derek\dr\git\build_x64_dbg_tests\suite\tests\bin\simple_app.exe (14236)>
Trace invariant failure in T8428 at ref # 10873: Non-explicit control flow has no marker
abort() has been called
08:58 PM ~/dr/git/build_x64_dbg_tests
% bin64/drrun -t drcachesim -indir drmemtrace.*.dir -simulator_type view -sim_refs 35 -skip_refs 10850
    10851: T8428     read  1 byte(s) @ 0x7ff72190b342
    10852: T8428     write 1 byte(s) @ 0x638071f95a
    10853: T8428   0x00007ff7218bcc6f  f3 a4                rep movsb
    10854: T8428     read  1 byte(s) @ 0x7ff72190b343
    10855: T8428     write 1 byte(s) @ 0x638071f95b
    10856: T8428   0x00007ff7218bcc6f  f3 a4                rep movsb
    10857: T8428     read  1 byte(s) @ 0x7ff72190b344
    10858: T8428     write 1 byte(s) @ 0x638071f95c
    10859: T8428   0x00007ff7218bcc6f  f3 a4                rep movsb
    10860: T8428     read  1 byte(s) @ 0x7ff72190b345
    10861: T8428     write 1 byte(s) @ 0x638071f95d
    10862: T8428   0x00007ff7218bcc6f  f3 a4                rep movsb
    10863: T8428     read  1 byte(s) @ 0x7ff72190b346
    10864: T8428     write 1 byte(s) @ 0x638071f95e
    10865: T8428   0x00007ff7218bcc6f  f3 a4                rep movsb
    10866: T8428     read  1 byte(s) @ 0x7ff72190b347
    10867: T8428     write 1 byte(s) @ 0x638071f95f
    10868: T8428   0x00007ff7218bcd22  48 8b 44 24 40       mov    0x40(%rsp), %rax
    10869: T8428     read  8 byte(s) @ 0x638071f910
    10870: T8428   0x00007ff7218bcd27  48 83 c4 28          add    $0x28, %rsp
    10871: T8428   0x00007ff7218bcd2b  5f                   pop    %rdi
    10872: T8428     read  8 byte(s) @ 0x638071f8f8
    10873: T8428   0x00007ff7218bcd2c  5e                   pop    %rsi
    10874: T8428     read  8 byte(s) @ 0x638071f900
    10875: T8428   0x00007ff7218bcd2d  c3                   ret
    10876: T8428   0x00007ff7218e0de0  c7 05 46 c5 02 00 00 movl   $0x00000000, <rel> 0x00007ff72190d330
    10876: T8428                       00 00 00
    10877: T8428     write 4 byte(s) @ 0x7ff72190d330
    10878: T8428   0x00007ff7218e0dea  83 7c 24 70 fe       cmp    0x70(%rsp), $0xfe
    10879: T8428     read  4 byte(s) @ 0x638071f980
    10880: T8428   0x00007ff7218e0def  75 26                jnz    $0x00007ff7218e0e17
    10881: T8428   0x00007ff7218e0e17  83 7c 24 70 fd       cmp    0x70(%rsp), $0xfd
    10882: T8428     read  4 byte(s) @ 0x638071f980
    10883: T8428   0x00007ff7218e0e1c  75 26                jnz    $0x00007ff7218e0e44
    10884: T8428   0x00007ff7218e0e1e  c7 05 08 c5 02 00 01 movl   $0x00000001, <rel> 0x00007ff72190d330
    10884: T8428                       00 00 00
    10885: T8428     write 4 byte(s) @ 0x7ff72190d330
View tool results:
             16 : total disassembled instructions


interp: start_pc = 0x00007ff7218bcc6f
  0x00007ff7218bcc6f  f3 a4                rep movs %ds:(%rsi)[1byte] %rsi %rdi %rcx -> %es:(%rdi)[1byte] %rsi %rdi %rcx
  0x00007ff7218bcc71  e9 ac 00 00 00       jmp    $0x00007ff7218bcd22
end_pc = 0x00007ff7218bcc76


interp: start_pc = 0x00007ff7218bcc71
  0x00007ff7218bcc71  e9 ac 00 00 00       jmp    $0x00007ff7218bcd22
end_pc = 0x00007ff7218bcc76

interp: start_pc = 0x00007ff7218bcd22
  0x00007ff7218bcd22  48 8b 44 24 40       mov    0x40(%rsp)[8byte] -> %rax
  0x00007ff7218bcd27  48 83 c4 28          add    $0x0000000000000028 %rsp -> %rsp
  0x00007ff7218bcd2b  5f                   pop    %rsp (%rsp)[8byte] -> %rdi %rsp
  0x00007ff7218bcd2c  5e                   pop    %rsp (%rsp)[8byte] -> %rsi %rsp
  0x00007ff7218bcd2d  c3                   ret    %rsp (%rsp)[8byte] -> %rsp
end_pc = 0x00007ff7218bcd2e



after instrumentation:
TAG  0x00007ff7218bcc71
 +0    m4 @0x000001f0800703c0  65 48 89 0c 25 e8 15 mov    %rcx -> %gs:0x000015e8[8byte]
                               00 00
 +9    m4 @0x000001f080074a30  48 8b 0d d1 47 ad 41 mov    <rel> 0x00007ff6c19d28e8[8byte] -> %rcx
 +16   m4 @0x000001f080073438                       <label>
 +16   m4 @0x000001f08006ebe0  e3 fe                jrcxz  @0x000001f08007f0f0[8byte] %rcx
 +18   m4 @0x000001f0800708f0  e9 fb ff ff ff       jmp    @0x000001f0800743a8[8byte]
 +23   m4 @0x000001f08007f0f0                       <label>
 +23   m4 @0x000001f0800717e8  e9 49 00 00 00       jmp    @0x000001f0800705d8[8byte]

==================================
 +28   m4 @0x000001f0800743a8                       <label>
 +28   m4 @0x000001f080075bc0  65 48 8b 0c 25 e8 15 mov    %gs:0x000015e8[8byte] -> %rcx
                               00 00
 +37   m4 @0x000001f08007b088                       <label>
 +37   m4 @0x000001f0800774d8  65 48 a3 00 16 00 00 mov    %rax -> %gs:0x00001600[8byte]
                               00 00 00 00
 +48   m4 @0x000001f08006faf8  9f                   lahf    -> %ah
 +49   m4 @0x000001f080074528  0f 90 c0             seto    -> %al
 +52   m4 @0x000001f08007f290  48 83 05 d8 47 ad 41 add    $0x0000000000000001 <rel> 0x00007ff6c19d28f0[8byte] -> <rel> 0x00007ff6c19d28f0[8byte]
                               01
 +60   m4 @0x000001f080073ed8  48 81 3d d5 47 ad 41 cmp    <rel> 0x00007ff6c19d28f0[8byte] $0x0000000000008c00
                               00 8c 00 00
 +71   m4 @0x000001f080071178  0f 8c fa ff ff ff    jl     @0x000001f08007ae18[8byte]
 +77   m4 @0x000001f080073a88  3c 81                cmp    %al $0x81
 +79   m4 @0x000001f08006f990  9e                   sahf   %ah
 +80   m4 @0x000001f08007d368  65 48 a3 08 16 00 00 mov    %rax -> %gs:0x00001608[8byte]
                               00 00 00 00
 +91   m4 @0x000001f08006ef18  65 48 a1 00 16 00 00 mov    %gs:0x00001600[8byte] -> %rax
                               00 00 00 00
 +102  m4 @0x000001f080072b18  65 48 a3 20 16 00 00 mov    %rax -> %gs:0x00001620[8byte]
                               00 00 00 00
 +113  m4 @0x000001f080071548  65 48 a1 40 16 00 00 mov    %gs:0x00001640[8byte] -> %rax
                               00 00 00 00
 +124  m4 @0x000001f080075c28  48 89 60 18          mov    %rsp -> 0x18(%rax)[8byte]
 +128  m4 @0x000001f080070540  48 8b a0 c8 02 00 00 mov    0x000002c8(%rax)[8byte] -> %rsp
 +135  m4 @0x000001f080070758  65 48 a1 20 16 00 00 mov    %gs:0x00001620[8byte] -> %rax
                               00 00 00 00
 +146  m4 @0x000001f08006ed98  48 8d a4 24 78 fd ff lea    0xfffffd78(%rsp) -> %rsp
                               ff
 +154  m4 @0x000001f080077ed8  e8 ab 1f a3 21       call   $0x00007ff6a19300c0 %rsp -> %rsp 0xfffffff8(%rsp)[8byte]
 +159  m4 @0x000001f0800712e0  48 8d 64 24 e0       lea    0xffffffe0(%rsp) -> %rsp
 +164  m4 @0x000001f08007f158                       <label>
 +164  m4 @0x000001f080077b58  48 b9 71 cc 8b 21 f7 mov    $0x00007ff7218bcc71 -> %rcx
                               7f 00 00
 +174  m4 @0x000001f080077c58  e8 db 03 a3 41       call   $0x00007ff6c192e4f0 %rsp -> %rsp 0xfffffff8(%rsp)[8byte]
 +179  m4 @0x000001f080074058  48 8d 64 24 20       lea    0x20(%rsp) -> %rsp
 +184  m4 @0x000001f08006f180  e8 eb 20 a3 21       call   $0x00007ff6a1930200 %rsp -> %rsp 0xfffffff8(%rsp)[8byte]
 +189  m4 @0x000001f080073b08  65 48 a3 20 16 00 00 mov    %rax -> %gs:0x00001620[8byte]
                               00 00 00 00
 +200  m4 @0x000001f080077958  65 48 a1 40 16 00 00 mov    %gs:0x00001640[8byte] -> %rax
                               00 00 00 00
 +211  m4 @0x000001f080075a70  48 8b 60 18          mov    0x18(%rax)[8byte] -> %rsp
 +215  m4 @0x000001f08006f840  65 48 a1 20 16 00 00 mov    %gs:0x00001620[8byte] -> %rax
                               00 00 00 00
 +226  m4 @0x000001f0800741d8                       <label>
 +226  m4 @0x000001f080071448                       <label>
 +226  m4 @0x000001f080072998  65 48 a1 08 16 00 00 mov    %gs:0x00001608[8byte] -> %rax
                               00 00 00 00
 +237  m4 @0x000001f08007ae18                       <label>
 +237  m4 @0x000001f080074158  3c 81                cmp    %al $0x81
 +239  m4 @0x000001f080073fd8  9e                   sahf   %ah
 +240  m4 @0x000001f080072b98  65 48 a1 00 16 00 00 mov    %gs:0x00001600[8byte] -> %rax
                               00 00 00 00
 +251  m4 @0x000001f08006fc78  e9 4a 00 00 00       jmp    @0x000001f08007a110[8byte]
 +256  m4 @0x000001f080074678                       <label>

==================================================
 +256  m4 @0x000001f0800705d8                       <label>
 +256  m4 @0x000001f080073988  65 48 8b 0c 25 e8 15 mov    %gs:0x000015e8[8byte] -> %rcx
                               00 00
 +265  m4 @0x000001f080076900  65 48 89 0c 25 00 16 mov    %rcx -> %gs:0x00001600[8byte]
                               00 00
 +274  m4 @0x000001f0800702a8  65 48 8b 0c 25 b0 15 mov    %gs:0x000015b0[8byte] -> %rcx
                               00 00
 +283  m4 @0x000001f080070340  65 48 a3 08 16 00 00 mov    %rax -> %gs:0x00001608[8byte]
                               00 00 00 00
 +294  m4 @0x000001f080079c30  65 48 89 14 25 10 16 mov    %rdx -> %gs:0x00001610[8byte]
                               00 00
 +303  m4 @0x000001f0800706d8  48 8b 15 b1 45 ad 41 mov    <rel> 0x00007ff6c19d26c8[8byte] -> %rdx
 +310  m4 @0x000001f08006f740  65 48 a1 d8 15 00 00 mov    %gs:0x000015d8[8byte] -> %rax
                               00 00 00 00
 +321  m4 @0x000001f080074f80  48 f7 d2             not    %rdx -> %rdx
 +324  m4 @0x000001f080077cd8  48 8d 52 01          lea    0x01(%rdx) -> %rdx
 +328  m4 @0x000001f08006f6c0  48 8d 04 10          lea    (%rax,%rdx) -> %rax
 +332  m4 @0x000001f080074340  48 8b 11             mov    (%rcx)[8byte] -> %rdx
 +335  m4 @0x000001f08007f2f8  48 8d 04 10          lea    (%rax,%rdx) -> %rax
 +339  m4 @0x000001f0800727e0  48 89 01             mov    %rax -> (%rcx)[8byte]
 +342  m4 @0x000001f080070010  48 8b 09             mov    (%rcx)[8byte] -> %rcx
 +345  m4 @0x000001f080070870  e3 fe                jrcxz  @0x000001f080072848[8byte] %rcx
 +347  m4 @0x000001f0800704c0  65 48 a3 20 16 00 00 mov    %rax -> %gs:0x00001620[8byte]
                               00 00 00 00
 +358  m4 @0x000001f080070970  65 48 a1 40 16 00 00 mov    %gs:0x00001640[8byte] -> %rax
                               00 00 00 00
 +369  m4 @0x000001f08006fbf8  48 89 60 18          mov    %rsp -> 0x18(%rax)[8byte]
 +373  m4 @0x000001f08006ee00  48 8b a0 c8 02 00 00 mov    0x000002c8(%rax)[8byte] -> %rsp
 +380  m4 @0x000001f08006f578  65 48 a1 20 16 00 00 mov    %gs:0x00001620[8byte] -> %rax
                               00 00 00 00
 +391  m4 @0x000001f08006fd78  48 8d a4 24 78 fd ff lea    0xfffffd78(%rsp) -> %rsp
                               ff
 +399  m4 @0x000001f08007b228  e8 ab 1f a3 21       call   $0x00007ff6a19300c0 %rsp -> %rsp 0xfffffff8(%rsp)[8byte]
 +404  m4 @0x000001f080072a98  48 8d 64 24 e0       lea    0xffffffe0(%rsp) -> %rsp
 +409  m4 @0x000001f08006fa78                       <label>
 +409  m4 @0x000001f080075708  e8 7b a5 a2 41       call   $0x00007ff6c1928690 %rsp -> %rsp 0xfffffff8(%rsp)[8byte]
 +414  m4 @0x000001f0800709f0  48 8d 64 24 20       lea    0x20(%rsp) -> %rsp
 +419  m4 @0x000001f080075100  e8 eb 20 a3 21       call   $0x00007ff6a1930200 %rsp -> %rsp 0xfffffff8(%rsp)[8byte]
 +424  m4 @0x000001f080079d68  65 48 a3 20 16 00 00 mov    %rax -> %gs:0x00001620[8byte]
                               00 00 00 00
 +435  m4 @0x000001f080070090  65 48 a1 40 16 00 00 mov    %gs:0x00001640[8byte] -> %rax
                               00 00 00 00
 +446  m4 @0x000001f080070128  48 8b 60 18          mov    0x18(%rax)[8byte] -> %rsp
 +450  m4 @0x000001f08006fe78  65 48 a1 20 16 00 00 mov    %gs:0x00001620[8byte] -> %rax
                               00 00 00 00
 +461  m4 @0x000001f0800714c8                       <label>
 +461  m4 @0x000001f08006ec60                       <label>
 +461  m4 @0x000001f080072848                       <label>
 +461  m4 @0x000001f080072a18                       <label>
 +461  m4 @0x000001f080071630                       <label>
 +461  m4 @0x000001f080074240  65 48 a1 08 16 00 00 mov    %gs:0x00001608[8byte] -> %rax
                               00 00 00 00
 +472  m4 @0x000001f08007f360  65 48 8b 0c 25 00 16 mov    %gs:0x00001600[8byte] -> %rcx
                               00 00
 +481  m4 @0x000001f08006ecc8  65 48 8b 14 25 10 16 mov    %gs:0x00001610[8byte] -> %rdx
                               00 00
 +490  m4 @0x000001f08007a110                       <label>
 +490  L3 @0x000001f08006eaf8  e9 ac 00 00 00       jmp    $0x00007ff7218bcd22
END 0x00007ff7218bcc71

I can repro in an asm app on linux.

Using "instr" instead of drmgr_orig_app_instr_for_*: bug goes away.

TAG  0x0000000000000000
 +0    m4 @0x00007f4b3e2c9c20                       <label note=0x000000000000004e>
 +0    L3 @0x00007f4b3e2c99a0  48 83 e4 f0          and    $0xfffffffffffffff0 %rsp -> %rsp
 +4    m4 @0x00007f4b3e2ca4b0                       <label note=0x0000000000000001>
 +4    L4 @0x00007f4b3e2c9ba0  e9 4a 00 00 00       jmp    @0x00007f4b3e2c9b20[8byte]
 +9    m4 @0x00007f4b3e2ca600                       <label note=0x0000000000000002>
 +9    m4 @0x00007f4b3e2c9f68                       <label note=0x000000000000004e>
 +9    L3 @0x00007f4b3e2caca0  48 83 e4 f0          and    $0xfffffffffffffff0 %rsp -> %rsp
 +13   m4 @0x00007f4b3e2c9b20                       <label note=0x000000000000004f>
 +13   L3 @0x00007f4b3e2c9920  67 e3 01             addr32 jecxz  $0x0000000000401008 %ecx
END 0x0000000000000000
event_app_instruction: emul fetch=and    $0xfffffffffffffff0 %rsp -> %rsp  op=and    $0xfffffffffffffff0 %rsp -> %rsp  instr=and    $0xfffffffffffffff0 %rsp -> %rsp  where=and    $0xfffffffffffffff0 %rsp -> %rsp
event_app_instruction: emul fetch=<null>  op=<null>  instr=addr32 jecxz  $0x0000000000401008 %ecx  where=addr32 jecxz  $0x0000000000401008 %ecx

TAG  0x0000000000000000
 +0    m4 @0x00007f4b3e2ccdf0                       <label note=0x000000000000004e>
 +0    m4 @0x00007f4b3e2cd7e8                       <label note=0x0000000000000001>
 +0    L4 @0x00007f4b3e2cd8e8  e9 4a 00 00 00       jmp    @0x00007f4b3e2c9920[8byte]
 +5    m4 @0x00007f4b3e2cc7a0                       <label note=0x0000000000000002>
 +5    m4 @0x00007f4b3e2c9b20                       <label note=0x000000000000004e>
 +5    m4 @0x00007f4b3e2c9920                       <label note=0x000000000000004f>
 +5    L3 @0x00007f4b3e2cd968  eb 00                jmp    $0x000000000040100a
END 0x0000000000000000
event_app_instruction: emul fetch=<null>  op=<null>  instr=jmp    $0x000000000040100a  where=jmp    $0x000000000040100a

TAG  0x0000000000000000
 +0    m4 @0x00007f4b3e2c9d98                       <label note=0x000000000000004e>
 +0    L3 @0x00007f4b3e2ca3c8  31 c9                xor    %ecx %ecx -> %ecx
 +2    L3 @0x00007f4b3e2c9fd0  45 31 c0             xor    %r8d %r8d -> %r8d
 +5    L3 @0x00007f4b3e2ca900  66 f2 44 0f 38 f1 c1 data16 crc32  %cx %r8d -> %r8d
 +12   L3 @0x00007f4b3e2ca0d0  48 83 f8 04          cmp    %rax $0x0000000000000004
 +16   m4 @0x00007f4b3e2ca750                       <label note=0x0000000000000001>
 +16   L4 @0x00007f4b3e2cac20  e9 4a 00 00 00       jmp    @0x00007f4b3e2c9a20[8byte]
 +21   m4 @0x00007f4b3e2c9ee8                       <label note=0x0000000000000002>
 +21   m4 @0x00007f4b3e2ca6d0                       <label note=0x000000000000004e>
 +21   L3 @0x00007f4b3e2cd968  31 c9                xor    %ecx %ecx -> %ecx
 +23   L3 @0x00007f4b3e2c9920  45 31 c0             xor    %r8d %r8d -> %r8d
 +26   L3 @0x00007f4b3e2c98b8  66 f2 44 0f 38 f1 c1 data16 crc32  %cx %r8d -> %r8d
 +33   L3 @0x00007f4b3e2c9e00  48 83 f8 04          cmp    %rax $0x0000000000000004
 +37   m4 @0x00007f4b3e2c9a20                       <label note=0x000000000000004f>
 +37   L3 @0x00007f4b3e2c9aa0  eb 00                jmp    $0x000000000040101c
END 0x0000000000000000
event_app_instruction: emul fetch=xor    %ecx %ecx -> %ecx  op=xor    %ecx %ecx -> %ecx  instr=xor    %ecx %ecx -> %ecx  where=xor    %ecx %ecx -> %ecx
event_app_instruction: emul fetch=xor    %r8d %r8d -> %r8d  op=xor    %r8d %r8d -> %r8d  instr=xor    %r8d %r8d -> %r8d  where=xor    %r8d %r8d -> %r8d
event_app_instruction: emul fetch=data16 crc32  %cx %r8d -> %r8d  op=data16 crc32  %cx %r8d -> %r8d  instr=data16 crc32  %cx %r8d -> %r8d  where=data16 crc32  %cx %r8d -> %r8d
event_app_instruction: emul fetch=cmp    %rax $0x0000000000000004  op=cmp    %rax $0x0000000000000004  instr=cmp    %rax $0x0000000000000004  where=cmp    %rax $0x0000000000000004
event_app_instruction: emul fetch=<null>  op=<null>  instr=jmp    $0x000000000040101c  where=jmp    $0x000000000040101c

Looks like drbbdup calls the insert cb while at the DRBBDUP_LABEL_EXIT
(can't insert instru after that b/c the final special instr is shared
among all cases):

TAG  0x0000000000000000
 +0    m4 @0x00007f1d2d6b0c20                       <label note=0x000000000000004e>
 +0    L3 @0x00007f1d2d6b09a0  48 83 e4 f0          and    $0xfffffffffffffff0 %rsp -> %rsp
 +4    m4 @0x00007f1d2d6b14b0                       <label note=0x0000000000000001>
 +4    L4 @0x00007f1d2d6b0ba0  e9 4a 00 00 00       jmp    @0x00007f1d2d6b0b20[8byte]
 +9    m4 @0x00007f1d2d6b1600                       <label note=0x0000000000000002>
 +9    m4 @0x00007f1d2d6b0f68                       <label note=0x000000000000004e>
 +9    L3 @0x00007f1d2d6b1ca0  48 83 e4 f0          and    $0xfffffffffffffff0 %rsp -> %rsp
 +13   m4 @0x00007f1d2d6b0b20                       <label note=0x000000000000004f>
 +13   L3 @0x00007f1d2d6b0920  67 e3 01             addr32 jecxz  $0x0000000000401008 %ecx
END 0x0000000000000000

drmgr_bb_event_do_instrum_phases cur=<label note=0x000000000000004e>
drmgr_bb_event_do_instrum_phases cur=and    $0xfffffffffffffff0 %rsp -> %rsp
drmgr_bb_event_do_instrum_phases cur=<label note=0x0000000000000001>
drmgr_bb_event_do_instrum_phases cur=jmp    @0x00007f1d2d6b0b20[8byte]
drmgr_bb_event_do_instrum_phases cur=<label note=0x0000000000000002>
drmgr_bb_event_do_instrum_phases cur=<label note=0x000000000000004e>
drmgr_bb_event_do_instrum_phases cur=and    $0xfffffffffffffff0 %rsp -> %rsp
(pt->insertion_instr)event_app_instruction: emul fetch=and    $0xfffffffffffffff0 %rsp -> %rsp  op=and    $0xfffffffffffffff0 %rsp -> %rsp  instr=and    $0xfffffffffffffff0 %rsp -> %rsp  where=and    $0xfffffffffffffff0 %rsp -> %rsp
drmgr_bb_event_do_instrum_phases cur=<label note=0x000000000000004f>
  not app=<label note=0x000000000000004f>event_app_instruction: emul fetch=<null>  op=<null>  instr=addr32 jecxz  $0x0000000000401008 %ecx  where=addr32 jecxz  $0x0000000000401008 %ecx
drmgr_bb_event_do_instrum_phases cur=addr32 jecxz  $0x0000000000401008 %ecx

But, putting emul start+end labels around the exit label doesn't work b/c
drmgr is at the exit label when drbbdup calls the insert cb: and that's one
past the start label which is the only one for which drmgr returns the
emulated instr (b/c of the DR_EMULATE_IS_FIRST_INSTR flag).

So we either need drbbdup_is_at_end() to return true for the emul start
label before the exit label (and fix up all the logic depending on
drbbdup_is_at_end) or find some other solution.
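
The first-instruction behavior described above can be sketched as a toy model in C. The names here (`fi_item_t`, `fi_fetch_query_at`) are hypothetical, and the rule that the emulated instruction is visible only while the walker sits on the start label is taken from the description in this comment rather than from the drmgr source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FI_EMUL_START, FI_EMUL_END, FI_LABEL, FI_APP } fi_kind_t;

typedef struct {
    fi_kind_t kind;
    /* FI_EMUL_START: the emulated instruction; FI_APP: the instruction itself. */
    const char *text;
} fi_item_t;

/* Returns what a fetch query would report when callbacks run at index cur. */
const char *
fi_fetch_query_at(const fi_item_t *items, size_t n, size_t cur)
{
    bool in_region = false;
    bool is_first = false; /* models DR_EMULATE_IS_FIRST_INSTR */
    const char *emulated = NULL;
    assert(items != NULL && cur < n);
    for (size_t i = 0; i <= cur; i++) {
        if (items[i].kind == FI_EMUL_START) {
            in_region = true;
            is_first = true;
            emulated = items[i].text;
        } else {
            is_first = false; /* any later position is no longer "first" */
            if (items[i].kind == FI_EMUL_END)
                in_region = false;
        }
    }
    if (in_region)
        return is_first ? emulated : NULL;
    return items[cur].kind == FI_APP ? items[cur].text : NULL;
}
```

For the sequence start label, exit label, end label, final instruction, the query yields the emulated instruction at index 0 but NULL at index 1, which is where drbbdup invokes the insert callback. That is why wrapping the exit label in markers does not help on its own.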

derekbruening added a commit that referenced this issue Mar 8, 2022
Adds new options -trace_for_instrs and -retrace_every_instrs to
drcachesim for periodic trace bursts of an unmodified application.
Implements them by adapting the existing drbbdup cases for switching
between -trace_after_instrs and full tracing.

Adds documentation on the new options.

Adds instru_t::get_instr_count to count instructions while tracing, to
know when a tracing burst window is finished.  Uses a local counter
only added to the global every 10K instructions to avoid
synchronization costs.

Adds a new marker with the ordinal of the trace window.  This marker
is added to each buffer header.  This, combined with a new check for
the window having changed to ensure a buffer dump at the end of each
block, limits the possible window drift to one block's worth of data.

Augments raw2trace to avoid delaying a branch across a window change.

Augments the view tool to mark window changes and delay timestamp
output to group with the proper window (it is difficult to actually
reorder timestamp and window entries).

Augments the basic_counts tool to track and display per-window global
statistics.

Augments the invariant_checker tool to not complain on a control-flow
gap across a window.  Adds a test of this: but disables it for Windows
temporarily due to more emulation interoperability issues which #5390
covers.

Adds a simple online test and a simple offline test that just confirm
multiple windows are hit on simple_app.  Adds an assembly test with
precise values for the windows.

Issue: #3995, #5390
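
The batched counter mentioned in the commit message above can be sketched as follows. This is a minimal illustration using C11 atomics, not the drmemtrace implementation: `thread_counter_t`, `count_instrs`, and `read_global_count` are hypothetical names, and only the 10K threshold comes from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FLUSH_THRESHOLD 10000 /* "every 10K instructions" per the commit message */

/* Shared across threads; touched only when a local batch is flushed. */
static _Atomic uint64_t global_instr_count;

/* Private to one thread, so updates need no synchronization. */
typedef struct {
    uint64_t local_count;
} thread_counter_t;

/* Adds n instructions to the thread's private count and flushes the whole
 * batch to the global counter once the threshold is reached. */
void
count_instrs(thread_counter_t *tc, uint64_t n)
{
    assert(tc != NULL);
    tc->local_count += n;
    if (tc->local_count >= FLUSH_THRESHOLD) {
        atomic_fetch_add(&global_instr_count, tc->local_count);
        tc->local_count = 0;
    }
}

uint64_t
read_global_count(void)
{
    return atomic_load(&global_instr_count);
}
```

The trade-off in this sketch is that the global value can lag each thread by up to one threshold's worth of instructions, in exchange for roughly one atomic operation per 10K counted instructions.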
derekbruening added a commit that referenced this issue Mar 8, 2022
…st missing marker after timestamp; disable invar test for this PR b/c even on linux it sometimes hits the #5390 missing jump bug
derekbruening added a commit that referenced this issue Mar 9, 2022
Fixes several shortcomings in the initial attempt to use emulation
markers to support the drmgr emulation API in drbbdup, and fixes
related issues in drmemtrace when it uses drbbdup.

Adds missing emulation markers for a special instr for the last bbdup case (previously only the earlier cases were marked).

Removes emulation markers from the analysis copy in
drbbdup_extract_bb_copy() (otherwise drmemtrace sees them and
incorrectly disables elision).

Fixes the drmemtrace check for elision labels to use "where" except
when "app" is actually an exclusive store, to properly find the labels
and elide.

Enables the tools.drcacheoff.windows-invar test which now passes on
all platforms.

Updates the drbbdup-emul-test to cover the drbbdup changes.

Fixes #5390
derekbruening added a commit that referenced this issue Mar 9, 2022
Adds new options -trace_for_instrs and -retrace_every_instrs to
drcachesim for periodic trace bursts of an unmodified application.
Implements them by adapting the existing drbbdup cases for switching
between -trace_after_instrs and full tracing.

Adds documentation on the new options.

Adds instru_t::get_instr_count to count instuctions while tracing, to
know when a tracing burst window is finished.  Uses a local counter
only added to the global every 10K instructions to avoid
synchronization costs.

Adds a new marker with the ordinal of the trace window.  This marker
is added to each buffer header.  This, combined with a new check for
the window having changed to ensure a buffer dump at the end of each
block, limits the possible window drift to one block's worth of data.

Augments raw2trace to avoid delaying a branch across a window change.

Augments the view tool to mark window changes and delay timestamp
output to group with the proper window (it is difficult to actually
reorder timestamp and window entries).

Augments the basic_counts tool to track and display per-window global
statistics.

Augments the invariant_checker tool to not complain on a control-flow
gap across a window.  Adds a test of this, but disables it for Windows
temporarily due to more emulation interoperability issues, which #5390
covers.

Adds a simple online test and a simple offline test that just confirm
multiple windows are hit on simple_app.  Adds an assembly test with
precise values for the windows.

Issue: #3995, #5390
derekbruening added a commit that referenced this issue Mar 9, 2022
…st missing marker after timestamp; disable invar test for this PR b/c even on linux it sometimes hits the #5390 missing jump bug
@derekbruening
Contributor Author

I keep hitting more cases where drbbdup does not interoperate with drmemtrace, and they are not easy to debug. Adding the emulation markers for the final case caused a number of problems with drmemtrace elision. Finally tracked down all the problems. This use of drbbdup by drmemtrace has been a lot more work than anticipated.

derekbruening added a commit that referenced this issue Mar 9, 2022
Adds new options -trace_for_instrs and -retrace_every_instrs to
drcachesim for periodic trace bursts of an unmodified application.
Implements them by adapting the existing drbbdup cases for switching
between -trace_after_instrs and full tracing.

Adds documentation on the new options.

Adds instru_t::get_instr_count to count instructions while tracing, to
know when a tracing burst window is finished.  Uses a local counter
that is only added to the global count every 10K instructions to avoid
synchronization costs.

Adds a new marker with the ordinal of the trace window.  This marker
is added to each buffer header.  This, combined with a new check for
the window having changed to ensure a buffer dump at the end of each
block, limits the possible window drift to one block's worth of data.

Augments raw2trace to avoid delaying a branch across a window change.

Augments the view tool to mark window changes and delay timestamp
output to group with the proper window (it is difficult to actually
reorder timestamp and window entries).

Augments the basic_counts tool to track and display per-window global
statistics.

Augments the invariant_checker tool to not complain on a control-flow
gap across a window.  Adds a test of this, but disables it
temporarily due to more emulation interoperability issues, which #5390
covers.

Adds a simple online test and a simple offline test that just confirm
multiple windows are hit on simple_app.  Adds an assembly test with
precise values for the windows.

Issue: #3995, #5390
derekbruening added a commit that referenced this issue Mar 10, 2022
Fixes several shortcomings in the initial attempt to use emulation
markers to support the drmgr emulation API in drbbdup, and fixes
related issues in drmemtrace when it uses drbbdup.

Adds missing emulation markers for a special instr for the last bbdup case (previously only the earlier cases were marked).

Removes emulation markers from the analysis copy in
drbbdup_extract_bb_copy() (otherwise drmemtrace sees them and
incorrectly disables elision).

Fixes the drmemtrace check for elision labels to use "where" except
when "app" is actually an exclusive store, to properly find the labels
and elide.

Enables the tools.drcacheoff.windows-invar test which now passes on
all platforms.

Updates the drbbdup-emul-test to cover the drbbdup changes.

Fixes #5390