drcachesim: add offline trace file support #1729

Closed
derekbruening opened this issue Jul 8, 2015 · 7 comments


@derekbruening
Contributor

Split from #1703

The drcachesim design isolates the source of memory references, allowing us to add an offline trace file feature to complement the live, online default operation. This issue covers adding both tracer output to a file and simulator input from a file.

Once we have file support, we can also add more reliable tests of the simulator that will not vary based on the compiler or machine used.
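
A minimal sketch of the idea, not the drcachesim code: the simulator consumes an iterable of references, so the same references can come from a live source or be replayed from a file. The record layout (`<HHQ`: 2-byte type, 2-byte size, 8-byte address) and the names `write_trace`, `read_trace`, and `count_refs` are assumptions made for illustration.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical packed record: 2-byte type, 2-byte size, 8-byte address.
ENTRY = struct.Struct("<HHQ")
Ref = Tuple[int, int, int]


def write_trace(out: BinaryIO, refs: Iterable[Ref]) -> None:
    """Tracer side: append one fixed-size record per reference."""
    for ref in refs:
        out.write(ENTRY.pack(*ref))


def read_trace(src: BinaryIO) -> Iterator[Ref]:
    """Simulator side: yield references until the input is exhausted."""
    while True:
        chunk = src.read(ENTRY.size)
        if len(chunk) < ENTRY.size:
            return
        yield ENTRY.unpack(chunk)


def count_refs(refs: Iterable[Ref]) -> int:
    """Stand-in for the simulator: it only sees an iterable of references."""
    return sum(1 for _ in refs)


# The same references, once "live" and once replayed from a byte stream.
live = [(0, 4, 0x1000), (1, 8, 0x2000), (0, 1, 0x1004)]
buf = io.BytesIO()
write_trace(buf, live)
buf.seek(0)
replayed = list(read_trace(buf))
```

Because the replayed stream is byte-for-byte fixed, a test built on it sees the same input regardless of compiler or machine.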

@derekbruening
Contributor Author

Initial support was added without yet changing the tracing format:

d4b3e61 i#1729 offline traces: generate unoptimized traces
2eb7494 i#1729 offline traces: refactor trace reader to add file support
357da25 i#1729 offline traces: refactor drcachesim to simulate from a trace file
a71bf92 i#1729 offline traces: add a generate-and-read multi-step test

Unfortunately packing the trace_entry_t struct broke the AArch64 instrumentation.

@derekbruening
Contributor Author

Further work:

d868d95 i#1729 offline traces: add static bursty trace support
5b0281b i#1729 offline traces: refactor tracer for multiple trace formats
3e1090e i#1729 offline traces: create module tracking library
97abbd4 i#1729 offline traces: add drx_open_unique_appid_dir()
8c0ebec i#1729 offline traces: use a new subdir to group offline files
4a2e614 i#1729 offline traces: fix clang warnings

@derekbruening
Contributor Author

derekbruening commented Sep 29, 2016

The optimized offline trace file stores just a module + offset for instruction fetch entries.
To disassemble each mod+offs, we need to load the target module's segments.

winsyms uses imagehlp, but on Linux there's no equivalent: dlopen and dlmopen will execute some code so we can't easily use them.

Choices:

  1. Add dr_load_library_but_do_not_execute() (basically
    elf_loader_map_phdrs(), so no constructors called or TLS set up or
    imports processed) to regular DR API and then use it from standalone.
    This is the easiest to implement but not sure it belongs in the client
    API. I guess you could imagine a tool wanting to examine some lib
    that's not currently loaded.

  2. Split the various module_elf routines into their own static lib that the
    core, frontendlib, and now postproc can use;
    Xref #1409: create syscall and module libc-independent utils lib for proper NOT_CORE_PROPER sharing, including with tests.

  3. Write some code from scratch to compute offs within file that
    corresponds to load offs, and then mmap the file. Could add this to
    drfrontendlib? Adding "backend" == postprocess to it?

    Possible advantage: could mmap just .text (need more header parsing
    there), or seek in file and avoid mmap, if binary is enormous?
    Xref #270: add library section iterator to API.

  4. drsyms also has some ELF code: it takes in offs from base for its
    API, has code to compute base, and has ELF section iteration code.
    Would it make sense to add "disassemble from base+offs" to drsyms?

If postproc were already going to use drsyms, then option 4 (adding
disassembly from base+offs to drsyms) seems best, but it's not clear
that's the case.
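
For option 3, the core computation could look like the sketch below. It assumes the PT_LOAD program headers have already been parsed into (p_vaddr, p_offset, p_filesz) tuples and that the lowest p_vaddr is page-aligned and corresponds to the load base. `Segment` and `load_offs_to_file_offs` are hypothetical names, not existing DR routines.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    vaddr: int   # p_vaddr of a PT_LOAD program header
    offset: int  # p_offset: where the segment's bytes start in the file
    filesz: int  # p_filesz: number of bytes backed by the file


def load_offs_to_file_offs(segments: List[Segment], load_offs: int) -> Optional[int]:
    """Translate an offset from the module's load base into a file offset.

    Returns None when the offset is not backed by file contents, such as
    zero-filled data past p_filesz or a gap between segments.
    """
    if not segments:
        return None
    # Assumption: the load base corresponds to the lowest segment address,
    # which is taken to be page-aligned already.
    base_vaddr = min(seg.vaddr for seg in segments)
    target = base_vaddr + load_offs
    for seg in segments:
        if seg.vaddr <= target < seg.vaddr + seg.filesz:
            return seg.offset + (target - seg.vaddr)
    return None


# Illustrative layout: a text segment at 0 and a data segment at 0x201000.
example = [Segment(0x0, 0x0, 0x1000), Segment(0x201000, 0x1000, 0x800)]
text_offs = load_offs_to_file_offs(example, 0x500)
data_offs = load_offs_to_file_offs(example, 0x201010)
unbacked = load_offs_to_file_offs(example, 0x201900)
```

With a file offset in hand, postprocessing could either mmap the file or seek and read just the bytes needed for disassembly.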

@derekbruening
Contributor Author

derekbruening commented Oct 20, 2016

Further work:

  • 18f0be5 i#1729 offline traces: implement initial optimized tracing
  • 3fc22d4 i#1729 offline traces: add module information
  • 36df866 i#1143: add support for %[] to dr_sscanf
  • 8b573d4 i#1729 offline traces: add dr_map_executable_file()
  • c2463c9 i#1729 offline traces: add additional info to drmodtrack module file
  • 7ca9291 i#1729 offline traces: add start+count to offline ifetch information
  • 906cb56 i#1729 offline traces: prep for raw2trace by sharing code
  • a3675b2 i#1729 offline traces: add converter from raw offline to full trace
  • 3a688f1 i#1729 offline traces: embed raw2trace into drcachesim
  • 8839eb5 Support "drrun -t" with native tool offline actions
  • d73bef5 i#2011: eliminate incorrect ifetches from repstr expansion
  • 0934f54 Include extension paths in module info
  • 30a3847 i#1729 offline traces: fix clang warnings
  • ee525ca i#1729 offline traces: add diagnostic for missing entries
  • 53468ef i#1729 offline traces: fix use-after-free in drmodtrack
  • 5380417 Add dr_get_microseconds()
  • 0d332a9 i#1729 offline traces: add option to cap raw trace file size
  • 354960a i#1729 offline traces: add timestamps and thread interleaving
  • aec18b9 i#1729 offline traces: relax bursty thread template

@derekbruening
Contributor Author

I forgot to put these perf numbers here from 2016-09-26:

# cd /extsw/spec2006/bzip2-test
# /usr/bin/time /work/dr/git/build_x64_rel/bin64/drrun -t drcachesim -offline -- ./bzip2_base.gcc-64bit dryer.jpg 2

native:
1.95user 0.00system 0:01.96elapsed 100%CPU (0avgtext+0avgdata 17176maxresident)k

No disk writes at all, online traces (so not all that optimized as cache simulator is bottleneck):
30.23user 0.01system 0:30.49elapsed 99%CPU (0avgtext+0avgdata 24300maxresident)k
That's 15.6x.

Offline trace format, optimized to have just a single 8-byte store.
With no disk writes, recording just data references is 10.5x:
20.51user 0.01system 0:20.52elapsed 100%CPU (0avgtext+0avgdata 24448maxresident)k

Adding one PC entry per bb is 13.7x:
26.77user 0.01system 0:26.79elapsed 100%CPU (0avgtext+0avgdata 24460maxresident)k
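
The slowdown factors quoted above are elapsed-time ratios against the native run. A small sketch that recomputes them from the GNU time summary lines; `elapsed_seconds` is a hypothetical helper, and the run labels are only descriptive.

```python
import re


def elapsed_seconds(time_line: str) -> float:
    """Extract the elapsed wall-clock time from a GNU time summary line."""
    match = re.search(r"(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)elapsed", time_line)
    if match is None:
        raise ValueError("no elapsed field found")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


native = elapsed_seconds("1.95user 0.00system 0:01.96elapsed 100%CPU")
runs = {
    "online, no disk writes": "30.23user 0.01system 0:30.49elapsed 99%CPU",
    "offline format, data only": "20.51user 0.01system 0:20.52elapsed 100%CPU",
    "offline format, one PC per bb": "26.77user 0.01system 0:26.79elapsed 100%CPU",
}
# Slowdown = instrumented elapsed time divided by native elapsed time.
slowdowns = {name: elapsed_seconds(line) / native for name, line in runs.items()}
for name, ratio in slowdowns.items():
    print(f"{name}: {ratio:.1f}x")
```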

@derekbruening
Contributor Author

And with disk:

With disk writes, we're talking about a >64GB file:
drcachesim num refs seen: 8,981,405,191
HDD is 169x and makes a 67GB file
# /usr/bin/time /work/dr/git/build_x64_rel/bin64/drrun -t drcachesim -offline -outdir /vms -- ./bzip2_base.gcc-64bit dryer.jpg 2
33.38user 24.44system 5:32.33elapsed 17%CPU (0avgtext+0avgdata 24252maxresident)k

Removing PC entries results in a 45GB file and 104x slowdown:
26.01user 15.13system 3:24.29elapsed 20%CPU (0avgtext+0avgdata 24356maxresident)k

SSD:
# /usr/bin/time ./bzip2_base.gcc-64bit dryer.jpg 2
2.89user 0.00system 0:02.90elapsed 99%CPU (0avgtext+0avgdata 17960maxresident)k

67x:
# /usr/bin/time ~/dr/git/build_x64_rel/bin64/drrun -t drcachesim -offline -- ./bzip2_base.gcc-64bit dryer.jpg 2
43.51user 35.81system 3:13.93elapsed 40%CPU (0avgtext+0avgdata 24696maxresident)k

Removed PC entries is 42x:
33.08user 24.05system 2:01.95elapsed 46%CPU (0avgtext+0avgdata 24888maxresident)k
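
As a rough sanity check on the file size, the arithmetic below assumes each of the reported references costs exactly one 8-byte entry. That is an assumption, not something the numbers above state, and the real file also holds headers and other records.

```python
NUM_REFS = 8_981_405_191        # "num refs seen" reported above
ENTRY_BYTES = 8                 # assumption: one 8-byte entry per reference
HDD_ELAPSED_S = 5 * 60 + 32.33  # 5:32.33 elapsed for the HDD run

trace_bytes = NUM_REFS * ENTRY_BYTES
trace_gib = trace_bytes / 2**30
# Average rate if the whole file is written over the run's elapsed time.
hdd_write_mib_per_s = trace_bytes / 2**20 / HDD_ELAPSED_S

print(f"estimated trace size: {trace_gib:.1f} GiB")
print(f"implied HDD write rate: {hdd_write_mib_per_s:.0f} MiB/s")
```

Under that assumption the estimate lands close to the 67GB file reported for the HDD run.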

derekbruening added a commit that referenced this issue Jun 1, 2017
Adds missing support for an ARM instruction cache flush in offline trace
postprocessing (#1729).
derekbruening added a commit that referenced this issue Jun 1, 2017
Fixes two issues with offline traces (#1729): first, changes the reader to
not assume a separate tid header right after the initial tid,pid header, as
offline traces (as opposed to online) do not contain it.  Second, changes
raw2trace to not emit a superfluous tid entry of 0 just prior to the real
initial tid entry for each thread.

Adds asserts to catch related errors in the future in the test suite.
derekbruening added a commit that referenced this issue Jun 2, 2017
Fixes two issues with offline traces (#1729): first, changes the reader to
not assume a separate tid header right after the initial tid,pid header, as
offline traces (as opposed to online) do not contain it.  Second, changes
raw2trace to not emit a superfluous tid entry of 0 just prior to the real
initial tid entry for each thread.

Adds asserts to catch related errors in the future in the test suite.
The extra checks failed on the multiproc test so we fix that here for online
by adding a fork event handler that sends a new header to the simulator.
We partially solve #2384 by creating a new offline dir and files but more
work is needed to merge the final traces in the added test here: that's
left for #2384.
@derekbruening
Contributor Author

This was considered complete a while back: closing.
