Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drcachesim's ifetch results are inflated from rep string expansion #2011

Closed
derekbruening opened this issue Oct 5, 2016 · 3 comments
Closed

Comments

@derekbruening
Copy link
Contributor

Working on #1729 I saw that offline had fewer instruction fetches than online:

# bin64/drrun -t drcachesim -- suite/tests/bin/simple_app
---- <application exited with code 0> ----
Core #0 (1 thread(s))
  L1I stats:
    Hits:                          163,521
    Misses:                            793
    Miss rate:                        0.48%
  L1D stats:
    Hits:                           55,281
    Misses:                          2,098
    Miss rate:                        3.66%
Core #1 (0 thread(s))
Core #2 (0 thread(s))
Core #3 (0 thread(s))
LL stats:
    Hits:                              334
    Misses:                          2,557
    Local miss rate:                 88.45%
    Child hits:                    218,802
    Total miss rate:                  1.15%
# bin64/drrun -t drcachesim -offline -- suite/tests/bin/simple_app
# clients/bin64/drcachesim -indir drmemtrace.simple_app.32731.0000.dir/
Core #0 (1 thread(s))
  L1I stats:
    Hits:                          146,282
    Misses:                            793
    Miss rate:                        0.54%
  L1D stats:
    Hits:                           55,282
    Misses:                          2,095
    Miss rate:                        3.65%
Core #1 (0 thread(s))
Core #2 (0 thread(s))
Core #3 (0 thread(s))
LL stats:
    Hits:                              329
    Misses:                          2,559
    Local miss rate:                 88.61%
    Child hits:                    201,564
    Total miss rate:                  1.25%

That's a significant difference.

Online, disable bundles:
 L1I stats:
    Hits:                          160,887
    Misses:                            793

Tiny app is the same for both:

# bin64/drrun -t drcachesim -- /work/dr/test/allasm_x64
---- <application exited with code 0> ----
Core #0 (1 thread(s))
  L1I stats:
    Hits:                               39
    Misses:                              2
    Miss rate:                        4.88%
# bin64/drrun -t drcachesim -offline -- /work/dr/test/allasm_x64
# bin64/drrun -t drcachesim -indir drmemtrace.allasm_x64.10981.0000.dir/
Core #0 (1 thread(s))
  L1I stats:
    Hits:                               39
    Misses:                              2
    Miss rate:                        4.88%

Diffing simple_app, online has many more *4bb is the difference.

offline:
Instr 0x00007f82984464b0
Instr 0x00007f82984464b3
Instr 0x00007f82984464b7
Instr 0x00007f82984464ba
Instr 0x00007f82984464ba
Instr 0x00007f82984464ba

online:
Instr 0x00007fd6ea0fb4b0
Instr 0x00007fd6ea0fb4b3
Instr 0x00007fd6ea0fb4b7
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4ba
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4ba
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4bb
Instr 0x00007fd6ea0fb4ba

Online has 4 or 5 *4bb before every *4ba.

It's rep stos:

interp: start_pc = 0x00007fd6ea0fb4ba
  0x00007fd6ea0fb4ba  f3 aa                rep stos %al %rdi %rcx -> %es:(%rdi)[1byte] %rdi %rcx
  0x00007fd6ea0fb4bc  48 89 d0             mov    %rdx -> %rax

drutil uses 2 different translations: everything's xl8 is +1 except the stos or movs itself:

         *    rep movs
         * =>
         *    jecxz  zero
         *    jmp    iter
         *  zero:
         *    mov    $0x00000001 -> %ecx
         *    jmp    pre_loop
         *  iter:
         *    movs   %ds:(%esi) %esi %edi -> %es:(%edi) %esi %edi
         *  pre_loop:
         *    loop

Online is counting all those expanded instrs. I explicitly have code to not count them for offline. I'll address this by trying to do something similar for online.

@derekbruening
Copy link
Contributor Author

There's a bug here: a zero-iter rep loop ends up without any instruction fetch at all.

@derekbruening derekbruening reopened this Dec 5, 2017
@derekbruening
Copy link
Contributor Author

Here's an example, found by a new trace invariants test for #2638 and #2708:

::13402.13402::  @0xf773a6b5 instr x1
::13402.13402::  @0xf773a6b6 instr x2
::13402.13402::  @0xf773a6b8 instr x2
::13402.13402::  @0xf773a6ba instr x2
::13402.13402::  @0xf773a6be instr x1
drmemtrace_histogram: /home/bruening/dr/git/src/clients/drcachesim/tests/trace_invariants.cpp:77: virtual bool trace_invariants_t::process_memref(const memref_t&): Assertion `(prev_instr.instr.addr + prev_instr.instr.size == memref.instr.addr) || (prev_instr.instr.addr == memref.instr.addr && memref.instr.type == TRACE_TYPE_INSTR_NO_FETCH) || (prev_marker.instr.tid == memref.instr.tid && (prev_marker.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT || prev_marker.marker.marker_type == TRACE_MARKER_TYPE_KERNEL_XFER))' failed.
interp: start_pc = 0xf774c6ba
  0xf774c6ba  66 a5                data16 movs   %ds:(%esi)[2byte] %esi %edi -> %es:(%edi)[2byte] %esi %edi
  0xf774c6bc  f3 a5                rep movs %ds:(%esi)[4byte] %esi %edi %ecx -> %es:(%edi)[4byte] %esi %edi %ecx
  0xf774c6be  97                   xchg   %edi %eax -> %edi %eax
  0xf774c6bf  89 d6                mov    %edx -> %esi
  0xf774c6c1  c3                   ret    %esp (%esp)[4byte] -> %esp
end_pc = 0xf774c6c2
[drmemtrace]: Appending 1 instrs in bb 0xf70146ba in mod 4 +0x186ba = /lib/i386-linux-gnu/ld-2.19.so
 0xf774c6ba data16 movs   %ds:(%esi)[2byte] %esi %edi -> %es:(%edi)[2byte] %esi %edi
[drmemtrace]: Appended memref to 0xf774e018
[drmemtrace]: Appended memref to 0xf7755f29
[drmemtrace]: About to read thread 13701 at pos 28080
[drmemtrace]: Appending 3 instrs in bb 0xf70146be in mod 4 +0x186be = /lib/i386-linux-gnu/ld-2.19.so
 0xf774c6be xchg   %edi %eax -> %edi %eax
 0xf774c6bf mov    %edx -> %esi
 0xf774c6c1 ret    %esp (%esp)[4byte] -> %esp

<...>

[drmemtrace]: Appending 1 instrs in bb 0xf70146ba in mod 4 +0x186ba = /lib/i386-linux-gnu/ld-2.19.so
 0xf774c6ba data16 movs   %ds:(%esi)[2byte] %esi %edi -> %es:(%edi)[2byte] %esi %edi
[drmemtrace]: Appended memref to 0xef3adb08
[drmemtrace]: Appended memref to 0xef3adda0
[drmemtrace]: About to read thread 13701 at pos 100752
[drmemtrace]: Appending 1 instrs in bb 0xf70146bc in mod 4 +0x186bc = /lib/i386-linux-gnu/ld-2.19.so
 0xf774c6bc rep movs %ds:(%esi)[4byte] %esi %edi %ecx -> %es:(%edi)[4byte] %esi %edi %ecx
[drmemtrace]: Appended memref to 0xef3adb0a
[drmemtrace]: Appended memref to 0xef3adda2
[drmemtrace]: About to read thread 13701 at pos 100776
[drmemtrace]: Appending 1 instrs in bb 0xf70146bc in mod 4 +0x186bc = /lib/i386-linux-gnu/ld-2.19.so
 0xf774c6bc rep movs %ds:(%esi)[4byte] %esi %edi %ecx -> %es:(%edi)[4byte] %esi %edi %ecx
<...>
 0xf774c6bc rep movs %ds:(%esi)[4byte] %esi %edi %ecx -> %es:(%edi)[4byte] %esi %edi %ecx
[drmemtrace]: Skipping instr fetch for 0xf70146bc
[drmemtrace]: Appended memref to 0xef3adb22
[drmemtrace]: Appended memref to 0xef3addba
[drmemtrace]: About to read thread 13701 at pos 100920
[drmemtrace]: Appending 3 instrs in bb 0xf70146be in mod 4 +0x186be = /lib/i386-linux-gnu/ld-2.19.so
 0xf774c6be xchg   %edi %eax -> %edi %eax
 0xf774c6bf mov    %edx -> %esi
 0xf774c6c1 ret    %esp (%esp)[4byte] -> %esp

#2011 is what removed the ifetches. Should we add an instr entry after "zero:"?

         *    rep movs
         * =>
         *    jecxz  zero
         *    jmp    iter
         *  zero:
         *    mov    $0x00000001 -> %ecx
         *    jmp    pre_loop
         *  iter:
         *    movs   %ds:(%esi) %esi %edi -> %es:(%edi) %esi %edi
         *  pre_loop:
         *    loop

derekbruening added a commit that referenced this issue Dec 6, 2017
Adds a missing instruction fetch for zero-iteration rep string loops by
moving the instruction instrumentation to the top of a rep string bb rather
than prior to the single memory reference.

The forthcoming trace_invariants continuous PC checks for #2708 will test
this.

Fixes #2011
@derekbruening
Copy link
Contributor Author

Better to just move the current instru to the top of the bb.

derekbruening added a commit that referenced this issue Dec 6, 2017
Adds a missing instruction fetch for zero-iteration rep string loops by
moving the instruction instrumentation to the top of a rep string bb rather
than prior to the single memory reference.

The forthcoming trace_invariants continuous PC checks for #2708 will add
an automated test for this.

Fixes #2011
derekbruening added a commit that referenced this issue Dec 7, 2017
Extends the trace_invariants test from #2638 to ensure there's no
discontinuity in control flow not indicated by a branch or a kernel xfer
marker.

Adds an online trace_invariants test.

Switches the trace_invariants test to run pthreads.ptsig on UNIX, which is
marked un-FLAKY, to test both threads and signals, and on winxfer on
Windows.

Fixes several issues found by this new test:
+ Adds TRACE_TYPE_INSTR_SYSENTER to mark the PC discontinuity from
  OP_sysenter.
+ Adds proper handling of a mid-bb fault in offline trace conversion by
  looking ahead after each memref to see whether there's a marker.
+ #2011 follow-up: fixes the zero-iter code from 2772b0b which it turns out
  only worked for offline traces.  For online, the top-of-bb instr is
  jecxz, so the instr type and size were wrong.
+ Fixes pipe write splitting to avoid separating an instr from its bundle
  entries.
+ Avoids a marker for a thread init kernel xfer event on Windows.

Fixes #2708
derekbruening added a commit that referenced this issue Dec 8, 2017
Extends the trace_invariants test from #2638 to ensure there's no
discontinuity in control flow not indicated by a branch or a kernel xfer
marker.

Adds an online trace_invariants test.

Switches the trace_invariants test to run pthreads.ptsig on UNIX, which is
marked un-FLAKY, to test both threads and signals, and on winxfer on
Windows.

Fixes several issues found by this new test:
+ Adds TRACE_TYPE_INSTR_SYSENTER to mark the PC discontinuity from
  OP_sysenter.
+ Adds proper handling of a mid-bb fault in offline trace conversion by
  looking ahead after each memref to see whether there's a marker.
+ #2011 follow-up: fixes the zero-iter code from 2772b0b which it turns out
  only worked for offline traces.  For online, the top-of-bb instr is
  jecxz, so the instr type and size were wrong.
+ Fixes pipe write splitting to avoid separating an instr from its bundle
  entries.
+ Avoids a marker for a thread init kernel xfer event on Windows.

Fixes #2708
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant