Add support for -use_physical with drcachesim -offline #4014

Open
derekbruening opened this issue Jan 13, 2020 · 1 comment

@derekbruening
Contributor

-use_physical is only supported for online drcachesim today, but neither the docs nor the tool's behavior makes that clear. The tool lets you run -offline -use_physical and, by accident, produces a post-processed trace with physical data addresses but virtual PC fetches. instr_offline_t::get_entry_addr blindly treats every entry as a data ref, so it tries to translate bogus addresses composed of module id + offset encodings.
And indeed we get warnings at -verbose 1+:

virtual2physical translation failure for <10,  0, 0x0004000c00000f30>
virtual 140727803080432 => physical 3824
virtual2physical translation failure for <10,  0, 0x002c000c00001d30>
virtual2physical translation failure for <10,  0, 0x0016000c00001d85>
virtual2physical translation failure for <10,  0, 0x0004000c00001de3>
virtual2physical translation failure for <10,  0, 0x000a000c00001dd3>
virtual 140346642070176 => physical 2720
virtual 140346642067080 => physical 3720
virtual2physical translation failure for <10,  0, 0x0004000c00001de3>
...

Those are the instruction entries:

$ od -t x8 -A x `ls -1td drmemtrace.threadsig.*.dir | head -1`/raw/*.raw | head
000000 c000000000000003 4000000000030385
000010 6000000000030385 802efaa01299a30c
000020 c203000000000003 2004000c00000f30
000030 0000000000000ef0 202c000c00001d30
000040 0000000000000ee8 0000000000000ee0
000050 0000000000000ed8 0000000000000ed0
000060 0000000000000ec8 0000000000000ec0
000070 2016000c00001d85 2004000c00001de3
000080 200a000c00001dd3 0000000000000aa0
000090 0000000000000e88 2004000c00001de3

So we have two action items. First, we should have the front-end refuse to combine -use_physical and -offline for now, and update the docs to reflect this.

Next, we need to decide whether to try to support this combination. Is it possible that future Linux distros will all shut down access to pagemap? Although for research purposes running as sudo or something may still be feasible.

How would we support this? We'd either have to store extra info for each data ref plus
an entry for each instr like for DGC (#2062), or have a custom solution just for offline
where each block PC has 2 entries: one virtual and one physical. The former will not work
well w/ tools that want operands (like opcode_mix or micro-arch simulators): but that would be the same for DGC, so we might need an option to store the full instr bytes.

Another limitation today (which perhaps should have its own issue) with -use_physical relates to static linking which we often use for offline tracing:

        /* Unfortunately the use of std::unordered_map in physaddr_t calls malloc
         * and thus we cannot support it for static linking, so we override the
         * DR_DISALLOW_UNSAFE_STATIC declaration.
         */
        dr_allow_unsafe_static_behavior();
#ifdef DRMEMTRACE_STATIC
        NOTIFY(0, "-use_physical is unsafe with statically linked clients\n");
#endif

Xref #2912 but that issue has many confusing entries so it seemed better to start clean here.

derekbruening added a commit that referenced this issue Jan 13, 2020
In drcachesim, the combination of -offline and -use_physical is not
supported at this time.  We make that clear in the option docs and
with an up-front exit when the two are requested at once.

Issue: #4014
derekbruening added a commit that referenced this issue May 27, 2022
Switches from std::unordered_map in physaddr_t to a drcontainers
hashtable to avoid malloc and make things safe for statically-linked
drmemtrace.

Similarly, switches from std::ostringstream to dr_snprintf in
physaddr_t::init() to avoid malloc.

Tested on a multi-threaded app which hits the post-init malloc warning
without both fixes (test will be added in a forthcoming PR: it cannot
be added now as physaddr_t is not thread-safe yet).

Issue: #4014
derekbruening added a commit that referenced this issue May 27, 2022
The physaddr_t class is not thread-safe and was previously used racily
in the drmemtrace code.  We fix that by creating a separate instance
per thread.  A test with multiple threads is added.

Issue: #4014
@derekbruening
Contributor Author

Previously, physaddr_t was just used racily.
I made it per-thread all at once.
This does result in a per-thread file descriptor being opened, which
may not scale well: it will exhaust DR's private file-descriptor space
and could possibly hit rlimits. Improving scaling is a still-open action item.

derekbruening added a commit that referenced this issue May 27, 2022
The physaddr_t class is not thread-safe and was previously used racily
in the drmemtrace code.  We fix that by creating a separate instance
per thread.  A test with multiple threads is added.

This does result in a per-thread file descriptor being opened, which
may not scale well: it will exhaust DR's private file-descriptor space
and could possibly hit rlimits.  Improving scaling is left as future work.

Issue: #4014
derekbruening added a commit that referenced this issue May 27, 2022
Adds a new reverse-lookup routine drmodtrack_lookup_pc_from_index()
which is needed to implement physical address support for offline
dr$sim traces.

Adds a simple test.

Issue: #4014
derekbruening added a commit that referenced this issue Jun 2, 2022
Adds support for different page sizes from 4K for drmemtrace's
-use_physical.

Adds hugepage support, which does not require anything special: just
updated the comment.

Improves the error reporting for physical translation.

Tested on an AArch64 machine with 64K pages.

Issue: #4014
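
For readers unfamiliar with how the page size enters the translation, here is a minimal sketch of a Linux pagemap lookup. It relies on the documented pagemap format (one 64-bit entry per virtual page, page frame number in bits 0-54, present flag in bit 63). The function names are made up for illustration and this is not the physaddr_t code.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1   # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63      # bit 63 is set when the page is present in RAM


def pagemap_offset(vaddr, page_size):
    """File offset of the pagemap entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def decode_pagemap_entry(entry):
    """Return the page frame number, or None if the page is not present."""
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def physical_address(vaddr, pfn, page_size):
    """Combine a page frame number with the in-page offset of vaddr."""
    return pfn * page_size + (vaddr % page_size)


def virt_to_phys(vaddr, page_size, pagemap_path="/proc/self/pagemap"):
    """Translate one address of the current process; needs privileges for a real PFN."""
    with open(pagemap_path, "rb") as pagemap:
        pagemap.seek(pagemap_offset(vaddr, page_size))
        raw = pagemap.read(PAGEMAP_ENTRY_BYTES)
    if len(raw) != PAGEMAP_ENTRY_BYTES:
        return None
    (entry,) = struct.unpack("=Q", raw)
    pfn = decode_pagemap_entry(entry)
    if pfn is None:
        return None
    return physical_address(vaddr, pfn, page_size)
```

The page size would normally come from the system (for example os.sysconf("SC_PAGE_SIZE")) rather than being hardcoded to 4096, which is the point of this change.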
derekbruening added a commit that referenced this issue Jun 3, 2022
Adds support for physical addresses to offline dr$sim traces.  To
support simulators wanting both virtual and physical addresses, and to
simplify post-processing where the virtual PC values are needed, the
regular trace entries remain all virtual.  A new marker type
TRACE_MARKER_TYPE_PHYSICAL_ADDRESS listing the corresponding physical
address is added.  The mappings are assumed to not change, allowing
just one marker for each newly-observed page.  This is done
per-thread.

An explicit TRACE_MARKER_TYPE_PHYSICAL_ADDRESS_NOT_AVAILABLE marker is
inserted on failure to translate, to prevent analyzers from having to
infer this due to the lack of the already-sparse markers.

Separately emitted pairs of virtual and physical address markers were
considered, with raw2trace inserting the physical at the right place,
but that presents complexities with buffer handoff and with the first
buffer.  Instead, the physical are inserted via memmove directly into
the buffer.  This does not seem to be a performance concern: the
translation lookup is the bottleneck.

Adds support for the new markers to the view tool.

Adds a Linux x86_64 test that runs a tiny asm app and ensures a
physical address marker is inserted.  The test needs to run as sudo,
along with its pre- and post- commands.  Currently it is enabled
everywhere, so a user running interactive tests will have it pause
while it waits for input.  This might cause issues with manually
running the test suite.

A number of items remain for further work:
+ Performance is poor: the hashtable and caching need improvement.
+ There is a hardcoded limit on how many markers can be added
  per buffer.  Once this is exceeded, further markers are dropped.
  We should split the buffer to handle this.
+ We may want to add a mode that checks for mapping changes.
+ Missing privileges results in every physical address being 0 instead
  of showing the failure.  We need to check the capabilities to distinguish.
+ Better testing that we're actually getting physical addresses for online
  tests.
+ Better offline testing with larger apps.
+ Basic blocks that cross a page have only the first one translated.
+ A file descriptor per thread is used, which will not scale well with
  DR's descriptor protection and might hit rlimits.

Issue: #4014
derekbruening added a commit that referenced this issue Jun 7, 2022
For virtual-to-physical translation, we cannot use a 3rd-party library
such as STL due to static linking constraints.  Yet the drcontainers
hashtable performs poorly; we need an open-address hashtable.  Since
DR has one we export it here in a new interface.

Adds a simple test and documentation.

Issue: #4014
derekbruening added a commit that referenced this issue Jun 7, 2022
Adds support for physical addresses to offline dr$sim traces.  To
support simulators wanting both virtual and physical addresses, and to
simplify post-processing where the virtual PC values are needed, the
regular trace entries remain all virtual.  A new marker type
TRACE_MARKER_TYPE_PHYSICAL_ADDRESS listing the corresponding physical
address is added.  The mappings are assumed to not change, allowing
just one marker for each newly-observed page.  This is done
per-thread.

An explicit TRACE_MARKER_TYPE_PHYSICAL_ADDRESS_NOT_AVAILABLE marker is
inserted on failure to translate, to prevent analyzers from having to
infer this due to the lack of the already-sparse markers.

Separately emitted pairs of virtual and physical address markers were
considered, with raw2trace inserting the physical at the right place,
but that presents complexities with buffer handoff and with the first
buffer.  Instead, the physical are inserted via memmove directly into
the buffer.  This does not seem to be a performance concern: the
translation lookup is the bottleneck.  Since the memmoves occur only on the
first instance of each page, they are much rarer than all the virtual-to-physical
translations.

Adds support for the new markers to the view tool.

Adds a Linux x86_64 test that runs a tiny asm app and ensures a
physical address marker is inserted.  The test needs to run as sudo,
along with its pre- and post- commands.  To avoid a confusing blocking
password query in local runs, a new set of tests controlled by a new
CMake option RUN_SUDO_TESTS is added.  It is set only for automated_ci,
where we assume a passwordless sudo.

A number of items remain for further work:
+ Performance is poor: the hashtable and caching need improvement.
+ There is a hardcoded limit on how many markers can be added
  per buffer.  Once this is exceeded, further markers are dropped.
  We should split the buffer to handle this.
+ We may want to add a mode that checks for mapping changes.
+ Missing privileges results in every physical address being 0 instead
  of showing the failure.  We need to check the capabilities to distinguish.
+ Better testing that we're actually getting physical addresses for online
  tests.
+ Better offline testing with larger apps.
+ Basic blocks that cross a page have only the first one translated.
+ A file descriptor per thread is used, which will not scale well with
  DR's descriptor protection and might hit rlimits.
+ Online traces still replace all virtual addresses with physical.
  We should break compatibility and transition them to use these markers,
  with dr$sim computing the physical addresses from the markers.

Issue: #4014
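
The "one marker for each newly-observed page" idea can be sketched as follows. This is a simplified model and not the tracer code: the record tuples, the translate callback, and the choice to carry the virtual address in the not-available record are all hypothetical.

```python
def add_physical_markers(addresses, translate, page_size=4096):
    """Return a record list where each page's first use is preceded by a marker.

    translate(vaddr) returns a physical address, or None on failure.
    The regular entries keep their virtual addresses.
    """
    seen_pages = set()  # in the real tracer this state would be per-thread
    records = []
    for vaddr in addresses:
        vpage = vaddr - (vaddr % page_size)
        if vpage not in seen_pages:
            seen_pages.add(vpage)
            paddr = translate(vaddr)
            if paddr is None:
                # Explicit failure marker so analyzers need not infer it.
                records.append(("physical_address_not_available", vaddr))
            else:
                records.append(("physical_address", paddr))
        records.append(("entry", vaddr))
    return records


if __name__ == "__main__":
    table = {0x1000: 0x9000}  # toy page mapping

    def toy_translate(vaddr):
        base = table.get(vaddr & ~0xFFF)
        return None if base is None else base + (vaddr & 0xFFF)

    for record in add_physical_markers([0x1000, 0x1008, 0x2000], toy_translate):
        print(record)
```

Because mappings are assumed not to change, later accesses to an already-seen page add no marker, which keeps the markers sparse.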
derekbruening added a commit that referenced this issue Jun 8, 2022
Switches from the drcontainers hashtable to the new open-address
hashtable provided by DR.  This is 2x to 3x faster due to the reduced
dereferences from the inlined data.

Increases the last-value cache from one entry to an array of 8
entries.  This was found to provide improved performance on small
benchmarks.

Measured on bzip2 local and SPEC2006 runs as they are short enough to
allow interactive experimentation.  -use_physical still incurs a ~2.5x
slowdown, but it was 9x before these changes.  The bottleneck is no
longer the hashtable but is now spread across all the address
iteration and querying code.

Issue: #4014
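
As a rough picture of the caching structure described above, the sketch below puts a small array of recent translations in front of a larger table. A Python dict stands in for the open-address hashtable, and the round-robin replacement policy is an assumption; the commit message does not say which policy is used.

```python
class PageTranslationCache:
    """Small recent-translation array in front of a full page table lookup."""

    def __init__(self, slow_lookup, num_entries=8):
        self._slow_lookup = slow_lookup       # e.g. a pagemap read
        self._recent = [None] * num_entries   # (vpage, ppage) tuples
        self._next_slot = 0
        self._table = {}                      # stand-in for the hashtable
        self.slow_lookups = 0

    def lookup(self, vpage):
        # First check the small array: no hashing, few dereferences.
        for entry in self._recent:
            if entry is not None and entry[0] == vpage:
                return entry[1]
        # Then the table, and only then the expensive lookup.
        if vpage in self._table:
            ppage = self._table[vpage]
        else:
            ppage = self._slow_lookup(vpage)
            self.slow_lookups += 1
            self._table[vpage] = ppage
        # Round-robin replacement (assumed policy).
        self._recent[self._next_slot] = (vpage, ppage)
        self._next_slot = (self._next_slot + 1) % len(self._recent)
        return ppage
```

A trace that alternates between a handful of code and data pages would be served almost entirely from the small array in this model.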
derekbruening added a commit that referenced this issue Jun 14, 2022
Solves issues with requiring precise placement of physical markers
prior to their corresponding regular virtual address entry (ifetch
entries being inserted between the marker and memref by raw2trace;
delayed branches needing to carry their marker; page-spanning extra
markers needing to land in the right place) by switching to a scheme
where markers come in pairs with a physical and a virtual.  Such a
pair has flexibility on where it can appear in the trace so long as it
is prior to the regular entry with that virtual address.  This
eliminates all work and complexity for raw2trace to move the markers
around.

Solves issues with capacity where the tracer runs out of pre-allocated
space inside the main buffer for marker-heavy output by switching to
use a separately-allocated-and-emitted "v2p" buffer for the
physical,virtual pairs.  This is made possible by the flexible
location.  Two complexities of this scheme are solved: the buffer is
page-aligned and heap-allocated for buffer handoff; the initial thread
header is added to the v2p buffer and skipped in the regular buffer to
avoid the pairs appearing before the top-level metadata.

Refactors header insertion and buffer output into separate functions
out of the large memtrace() function.

Multi-output was tested on SPEC2K6 gzip which hits the 1-page limit 4
times during its run with a scaled-down test input set workload.

The view tool, invariant checker, release notes, and regression test
are updated.  The invariant checker now also checks that the physical
and virtual pair members share their bottom bits.

Includes a significant fix for physical offline tracing:
Elision and address displacement optimizations are disabled for
physical addresses as the final address is required at tracing time.

Issue: #4014
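
The two properties described above (a pair's members share their bottom bits, and a pair appears somewhere before the regular entry that uses that virtual page) can be checked with something like the sketch below. The record format is hypothetical, pages whose translation was unavailable are not modeled, and this is not the invariant checker's actual code.

```python
def check_v2p_stream(records, page_size):
    """Return a list of (index, message) problems found in a record stream.

    records holds ("v2p", physical, virtual) pairs and ("entry", vaddr) items.
    page_size must be a power of two.
    """
    mask = page_size - 1
    announced_pages = set()
    errors = []
    for index, record in enumerate(records):
        if record[0] == "v2p":
            _, physical, virtual = record
            # Both addresses must have the same offset within the page.
            if (physical & mask) != (virtual & mask):
                errors.append((index, "pair offsets differ"))
            announced_pages.add(virtual & ~mask)
        elif record[0] == "entry":
            # The pair may be anywhere earlier, not necessarily adjacent.
            if (record[1] & ~mask) not in announced_pages:
                errors.append((index, "no prior pair for page"))
    return errors
```

Since only ordering matters, a pair emitted from a separate buffer well before its first use passes the same check as one placed immediately in front of the entry.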
derekbruening added a commit that referenced this issue Jun 15, 2022
To use the new physical address markers in a trace analysis tool, the
page size must be known.  Here we add a new marker to the thread
headers holding the page size.  Sanity tests are added.

Issue: #4014
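
A trace analysis tool might combine the page-size marker with the physical,virtual pairs roughly as follows. The class and method names are invented for illustration and do not correspond to any drmemtrace API.

```python
class PhysicalTranslator:
    """Tracks marker pairs and translates later virtual addresses."""

    def __init__(self):
        self.page_size = None
        self._v2p_pages = {}  # virtual page number -> physical page number

    def on_page_size_marker(self, page_size):
        self.page_size = page_size

    def on_pair(self, physical, virtual):
        if self.page_size is None:
            raise ValueError("page size marker must be seen before any pair")
        self._v2p_pages[virtual // self.page_size] = physical // self.page_size

    def translate(self, vaddr):
        """Return the physical address, or None if the page was never announced."""
        ppage = self._v2p_pages.get(vaddr // self.page_size)
        if ppage is None:
            return None
        return ppage * self.page_size + (vaddr % self.page_size)


if __name__ == "__main__":
    translator = PhysicalTranslator()
    translator.on_page_size_marker(4096)
    translator.on_pair(0xF52ACE050, 0x7FED14DE0050)
    print(hex(translator.translate(0x7FED14DE0053)))
```

Without the page size, a tool could not tell how many low bits of a pair to ignore, which is why the header marker is needed.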
derekbruening added a commit that referenced this issue Jun 15, 2022
Since the kernel lets an unprivileged user read the pagemap file but
supplies 0 values for the physical pages (and 0 is a possible
legitimate value), we add an explicit check for privileges and fail up
front if physical addresses are requested but not available.  This is
a change in behavior where before execution would continue with a
warning.

Adds a test of missing privileges.

Fixes a sentinel issue where a 0 physical page was stored as 1 in the
table: but 1 is a possible valid number as well, so -1 is used
instead.

Issue: #4014
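
Because a zero page frame number cannot be told apart from a permission problem, the privilege has to be tested separately. One possible way to do that on Linux, shown purely as an illustration and not necessarily what this commit does, is to look for CAP_SYS_ADMIN in the effective capability mask from /proc/self/status.

```python
CAP_SYS_ADMIN = 21  # capability number from linux/capability.h


def effective_caps(status_text):
    """Parse the CapEff hex mask out of /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap_sys_admin(status_text):
    """True if the effective capability mask includes CAP_SYS_ADMIN."""
    return bool(effective_caps(status_text) & (1 << CAP_SYS_ADMIN))


if __name__ == "__main__":
    with open("/proc/self/status") as status:
        print(has_cap_sys_admin(status.read()))
```

A tracer could run such a check once at startup and exit with a clear error instead of silently recording zeros.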
derekbruening added a commit that referenced this issue Jun 16, 2022
Since the kernel lets an unprivileged user read the pagemap file but
supplies 0 values for the physical pages (and 0 is a possible
legitimate value), we add an explicit check for privileges and fail up
front if physical addresses are requested but not available.  This is
a change in behavior where before execution would continue with a
warning.

Adds a test of missing privileges.

Adds sudo to the online drcachesim -use_physical tests.
Only enables RUN_SUDO_TESTS if CI_TRIGGER is set, since
Jenkins does not have password-less sudo.

Fixes a sentinel issue where a 0 physical page was stored as 1 in the
table: but 1 is a possible valid number as well, so -1 is used
instead.

Issue: #4014
derekbruening added a commit that referenced this issue Jun 17, 2022
Adds handling of page-spanning instruction fetch and data accesses.
For offline traces these are heuristics as we do not know the precise
sizes at tracing time.  Since an extra physical page value is not
harmful, the heuristics emit a second page if there is any decent
chance that it is reached.

Adds a data page span test by switching the drcacheoff.phys test to
use allasm_repstr and adding such an access there.  An instruction
cross is more difficult to synthesize.

Issue: #4014
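
The page-spanning case comes down to simple arithmetic once a size is known. The sketch below computes the pages touched by an access and, for the offline case where the size is not known at tracing time, falls back to an assumed upper bound. The max_size_guess parameter is hypothetical; the commit message does not spell out the actual heuristics.

```python
def pages_touched(addr, size, page_size):
    """Page numbers covered by an access of size bytes starting at addr."""
    first = addr // page_size
    last = (addr + size - 1) // page_size
    return list(range(first, last + 1))


def pages_to_emit(addr, page_size, max_size_guess):
    """Offline-style heuristic: size unknown, so assume up to max_size_guess bytes.

    Emitting an extra page is harmless, so the guess errs on the large side.
    """
    return pages_touched(addr, max_size_guess, page_size)
```

For example, an 8-byte access at 0xffc with 4K pages touches pages 0 and 1, so a second physical page value is needed for it.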
derekbruening added a commit that referenced this issue Jun 21, 2022
Fixes two bugs causing crashes with -use_physical:

+ Records the drcontext used for hashtable creation to ensure the same
  one is used at destruction as a different thread can call the exit
  event.

+ Orders the drmemtrace thread exit event before the drmodtrack one to
  ensure drmodtrack access during the final thread buffer output is safe.

These fixes were manually tested on large multi-threaded applications
where these two crashes showed up before and disappear with the fixes.

Issue: #4014
derekbruening added a commit that referenced this issue Jun 24, 2022
The view tool was not printing the virtual address for
virtual-to-physical translations, nor the address for software
prefetches, which made it hard to diagnose physical address failures
in general and in particular when the culprit virtual address was from
a prefetch as it did not appear in the text output.  We add that here.

Since some prefetches have long names, we abbreviate them, but still
need to widen the name field to align them all.

While at it, we add the function marker values to the view tool as
well.

Example output:
       10: T462774 ifetch        7 byte(s) @ 0x00007f6cae908a54 non-branch
       11: T462774 read          8 byte(s) @ 0x00007f6cae90ca20 by PC 0x00007f6cae908a54
       12: T462774 ifetch        4 byte(s) @ 0x00007f6cae908a5c non-branch
       13: T462774 pref-w-L3-NT  8 byte(s) @ 0x00007f6c389xd238 by PC 0x00007f6cae908a5c
      ...
   441522: T48816 <marker: function #9>
   441523: T48816 <marker: function return address 0xaaaad5de4944>
   441524: T48816 <marker: function argument 0x15187bd4ae1d>
   441526: T48816 <marker: function argument 0x1>
      ...
   441546: T48816 <marker: function #9>
   441547: T48816 <marker: function return value 0x15187bd4ae1d>

Issue: #4014
derekbruening added a commit that referenced this issue Jun 24, 2022
Adds counts of physical,virtual marker pairs and of unavailable
physical address markers to the basic_counts tool.  Updates various
test outputs.

Issue: #4014
derekbruening added a commit that referenced this issue Aug 3, 2022
When -use_physical is set, the cache and TLB simulators read the new
virtual-to-physical translation markers and use them to simulate
physical addresses.

--------------------------------------------------
Tested: Ran manually and looked at logs.
Open to suggestions for how to automate testing.

$ rm -rf drmemtrace.sim*.dir; ninja && sudo bin64/drrun -stderr_mask 0 -t drcachesim -use_physical -offline -- suite/tests/bin/simple_app && sudo chown -R $USER drmemtrace.sim*.dir && bin64/drrun -t drcachesim -indir drmemtrace.sim*.dir -use_physical -verbose 3 > OUT 2>&1
$ less OUT
translating virtual 0x7fed14de0050 to 0xf52ace050
::3036256.3036256::  @0xf52ace050 instr x3
translating virtual 0x7fed14de0053 to 0xf52ace053
::3036256.3036256::  @0xf52ace053 instr x5
translating virtual 0x7ffca0af9068 to 0xb3f975068
translating virtual 0x7fed14de0053 to 0xf52ace053
::3036256.3036256::  @0xf52ace053 write 0xb3f975068 x8

$ bin64/drrun -t drcachesim -indir drmemtrace.sim*.dir -use_physical -simulator_type TLB -verbose 3 > OUT 2>&1
$ less OUT
translating virtual 0x7f6f263a3050 to 0xf52ace050
::3080711.3080711::  @0x7070615f656c706d instr 0xf52ace050 x3
translating virtual 0x7f6f263a3053 to 0xf52ace053
::3080711.3080711::  @0x7070615f656c706d direct_call 0xf52ace053 x5
translating virtual 0x7ffdeaab3798 to 0xed4c52798
translating virtual 0x7f6f263a3053 to 0xf52ace053
--------------------------------------------------

Issue: #4014
derekbruening added a commit that referenced this issue Aug 4, 2022
When -use_physical is set, the cache and TLB simulators read the new
virtual-to-physical translation markers and use them to simulate
physical addresses.

Changes drcachesim online mode to leave addresses virtual and insert markers
instead, just like offline.  Adds a compatibility change note and updates the docs.

Includes a fix for -cpu_scheduling where the cached last thread was not
reset on a cpu change with no thread change in between.

--------------------------------------------------
Tested: Ran manually and looked at logs.
Open to suggestions for how to automate testing.

$ rm -rf drmemtrace.sim*.dir; ninja && sudo bin64/drrun -stderr_mask 0 -t drcachesim -use_physical -offline -- suite/tests/bin/simple_app && sudo chown -R $USER drmemtrace.sim*.dir && bin64/drrun -t drcachesim -indir drmemtrace.sim*.dir -use_physical -verbose 3 > OUT 2>&1
$ less OUT
translating virtual 0x7fed14de0050 to 0xf52ace050
::3036256.3036256::  @0xf52ace050 instr x3
translating virtual 0x7fed14de0053 to 0xf52ace053
::3036256.3036256::  @0xf52ace053 instr x5
translating virtual 0x7ffca0af9068 to 0xb3f975068
translating virtual 0x7fed14de0053 to 0xf52ace053
::3036256.3036256::  @0xf52ace053 write 0xb3f975068 x8

$ bin64/drrun -t drcachesim -indir drmemtrace.sim*.dir -use_physical -simulator_type TLB -verbose 3 > OUT 2>&1
$ less OUT
translating virtual 0x7f6f263a3050 to 0xf52ace050
::3080711.3080711::  @0x7070615f656c706d instr 0xf52ace050 x3
translating virtual 0x7f6f263a3053 to 0xf52ace053
::3080711.3080711::  @0x7070615f656c706d direct_call 0xf52ace053 x5
translating virtual 0x7ffdeaab3798 to 0xed4c52798
translating virtual 0x7f6f263a3053 to 0xf52ace053
--------------------------------------------------
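
One possible direction for automating the manual log inspection above is a small script that scans the verbose output and checks that every translation preserves the page offset. This is a sketch only: the line format is taken from the log excerpts above, the 4 KiB page size is an assumption, and `check_translations` is a hypothetical helper, not part of the test suite.

```python
import re

PAGE_OFFSET_MASK = 0xFFF  # assumption: 4 KiB pages

# Matches lines such as "translating virtual 0x7fed14de0050 to 0xf52ace050".
TRANSLATION_RE = re.compile(
    r"^translating virtual (0x[0-9a-f]+) to (0x[0-9a-f]+)$"
)


def check_translations(log_lines):
    """Count translation lines; raise if one changes the page offset."""
    count = 0
    for line in log_lines:
        match = TRANSLATION_RE.match(line.strip())
        if match is None:
            continue  # Not a translation line; skip it.
        virtual = int(match.group(1), 16)
        physical = int(match.group(2), 16)
        if (virtual & PAGE_OFFSET_MASK) != (physical & PAGE_OFFSET_MASK):
            raise ValueError(f"page offset mismatch: {line.strip()}")
        count += 1
    return count


sample = [
    "translating virtual 0x7fed14de0050 to 0xf52ace050",
    "::3036256.3036256::  @0xf52ace050 instr x3",
    "translating virtual 0x7fed14de0053 to 0xf52ace053",
    "::3036256.3036256::  @0xf52ace053 instr x5",
    "translating virtual 0x7ffca0af9068 to 0xb3f975068",
    "translating virtual 0x7fed14de0053 to 0xf52ace053",
    "::3036256.3036256::  @0xf52ace053 write 0xb3f975068 x8",
]
print(check_translations(sample))
```

A check like this only validates internal consistency of the log; it cannot confirm that the physical pages themselves are correct.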

Issue: #4014
dolanzhao pushed a commit that referenced this issue Aug 4, 2022
Issue: #4014