Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Add a count for total number of instruction for filtered trace #4857

Closed
vicwlee opened this issue Apr 16, 2021 · 3 comments · Fixed by #4864 or #4961
Closed

Feature Request: Add a count for total number of instruction for filtered trace #4857

vicwlee opened this issue Apr 16, 2021 · 3 comments · Fixed by #4864 or #4961

Comments

@vicwlee
Copy link

vicwlee commented Apr 16, 2021

When traced in filtered mode, the total number of records does not equal to the number of instructions executed during tracing. It would be very useful to have accurate instruction count for filtered trace. At the minimum, it should provide a total number of instructions executed during the tracing period. Additional details such as # of instructions per thread, or # of instructions since last record (i.e. for every filtered miss) would great with the understanding that these will add additional overhead in speed and size.

This feature would be extremely useful to try to correlate filtered trace with unfiltered trace and would enable filtered trace be more useful for cache studies.

@derekbruening
Copy link
Contributor

To add context, this is about drcachesim/drmemtrace traces, not core traces.

@derekbruening
Copy link
Contributor

A prototype (after hitting and fixing #4861) implementing per-miss count markers
and measured on bzip2 with local SSD storage (which with -L0_filter is cpu-bound):

  • Incrementing a counter per instruction is too expensive (1.5x to 1.9x w/o
    any extra i/o), but per block (fixup should be possible in raw2trace) is
    not bad (1.0x to 1.07x).
  • Per-miss i/o is 50% more output and pushes this to be no longer purely
    cpu-bound. i+d is a 1.26x time hit.
  • Jumping to 512K filter caches reduces per-miss i/o to 1.06x time and
    1.08x space: so it does not completely outweigh the overhead of per-miss
    but does shrink it.
  • Squishing the bottom 12 bits of the counter into the existing i/o
    (relying on BMI2) shrinks the 1.26x to 1.05x. But that's w/o overflow
    checks. Not sure this is worth the complexity.

The decision is to abandon per-miss entries and just have a single count marker on thread exit.
The counter is incremented once per block.

derekbruening added a commit that referenced this issue Apr 21, 2021
… or 2 misses; REMINDER: update 1st commit to Fixes #4857
derekbruening added a commit that referenced this issue Apr 21, 2021
Adds tracking of the dynamic (pre-filtered) instruction count for each thread.
Stores the count in a marker at thread exit (storing prior to each miss entry
might be useful but in experiments it adds too much overhead to include
without a clear mandate and possibly under an off-by-default runtime option).

Adds a check to the filter tests confirming that the instruction count
marker is present at thread exit for filtered traces.  This is done by
invoked test_mode for these traces.  To simplify testing for online
and offline, the trace type marker is added to online traces.

Adds a new tool.drcachesim.filter-asm test which ensures the instr
count is the precise count expected for that fully-deterministic
tiny static binary.

A larger-than-static-asm application was manually tested, confirming
the need for the rep string expansion special case in the code to
limit to 1 instruction for that block.

Fixes #4857
@derekbruening
Copy link
Contributor

More frequent count entries in the traces is desirable. Re-opening to add the count to each buffer, which should be very low overhead.

@derekbruening derekbruening reopened this Jun 18, 2021
derekbruening added a commit that referenced this issue Jun 22, 2021
Adds the instruction count to the buffer header for drcachesim
filtered traces, instead of just at thread exit, to aid in correlating
filtered and unfiltered traces.

Updates the documentation and the trace_invariants test.

Augments the view tool to print out these markers.
Further verified manually on a larger trace with the view tool:

    <marker: filetype 65>
    <marker: cache line size 64>
    <marker: timestamp 13268850698443940>
    <marker: tid 2539348 on core 1>
    <marker: instruction count 33326>
      0x00007f6f64314090  48 89 e7             mov    %rsp, %rdi
      0x00007f6f6432dce0  b8 0c 00 00 00       mov    $0x0000000c, %eax
    <marker: timestamp 13268850698727951>
    <marker: tid 2539348 on core 1>
    <marker: instruction count 40130>
      0x00007f6f6432c906  c7 44 24 3c 00 00 00 movl   $0x00000000, 0x3c(%rsp)
    <...>
      0x00007f6f63cc3670  4c 8b 05 f9 27 0f 00 mov    <rel> 0x00007f6f63db5e70, %r8
    <marker: timestamp 13268850699555438>
    <marker: tid 2539348 on core 1>
    <marker: instruction count 126115>

Fixes #4857
derekbruening added a commit that referenced this issue Jun 22, 2021
Adds the instruction count to the buffer header for drcachesim
filtered traces, instead of just at thread exit, to aid in correlating
filtered and unfiltered traces.

Updates the documentation and the trace_invariants test.

Augments the view tool to print out these markers.
Further verified manually on a larger trace with the view tool:

    <marker: filetype 65>
    <marker: cache line size 64>
    <marker: timestamp 13268850698443940>
    <marker: tid 2539348 on core 1>
    <marker: instruction count 33326>
      0x00007f6f64314090  48 89 e7             mov    %rsp, %rdi
      0x00007f6f6432dce0  b8 0c 00 00 00       mov    $0x0000000c, %eax
    <marker: timestamp 13268850698727951>
    <marker: tid 2539348 on core 1>
    <marker: instruction count 40130>
      0x00007f6f6432c906  c7 44 24 3c 00 00 00 movl   $0x00000000, 0x3c(%rsp)
    <...>
      0x00007f6f63cc3670  4c 8b 05 f9 27 0f 00 mov    <rel> 0x00007f6f63db5e70, %r8
    <marker: timestamp 13268850699555438>
    <marker: tid 2539348 on core 1>
    <marker: instruction count 126115>

Fixes #4857
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment