Add cross-architecture drcachesim trace analysis and postprocessing support #4318
Comments
derekbruening added a commit that referenced this issue on Jun 18, 2020

Adds support for building with different host and target architectures. The goal is to support running drdisas, drdecode, drraw2trace, and drcachesim trace analyzers on one host machine while targeting recorded bytes or traces from a different type of machine: e.g., processing aarch64 traces on an x86_64 machine. The goal is *not* to turn DR into a cross-ISA binary translator: we only support standalone mode here, not full DR managed mode. Long-term it would be nicer to split out the DR standalone mode interfaces into separate libraries but that is beyond the scope of the current work.

Adds a top-level CMake option variable "TARGET_ARCH" which defaults to CMAKE_SYSTEM_PROCESSOR but can be set to a different value. The regular arch variables/defines X86, AARCH64, and ARM are set from TARGET_ARCH, while new values DR_HOST_{X86,AARCH64,ARM} are set for the host.

The bulk of the code is built targeting the TARGET_ARCH, with only assembly code and other limited code using the DR_HOST_* arch. This limits the scope of the code changes as compared to the opposite approach. A new define DR_HOST_NOT_TARGET is used to simply disable cases where the host and target are intertwined and difficult to easily separate, such as TLS handling.

The key code changed to use DR_HOST_* includes:
+ *.asm files
+ arch_exports.h inline asm for atomics and other key utilities
+ system call defines (we don't want to have our raw syscall wrappers running incorrect syscall numbers: we'll end up with fork bombs or other madness)

Code using built-in __*__ arch defines is changed to use our defines:
+ sigcontext.h, which is also augmented for some missing AArch64 types
+ x86_code_test.c

Code reading ELF headers is relaxed to allow *any* arch, since we need the host for standalone init code but the target for raw2trace module mapping.

Nearly all tests are disabled not just from running but also from building for simplicity, to reduce code changes here.

TODO: Add new runsuite build
TODO: Add new drcachesim xarch tests
TODO: A couple of NOCHECK's in the diff

Issue: #1684, #4318
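The define selection described in this commit message can be modeled roughly as follows. This is only an illustrative sketch in Python, not the project's CMake code: the function name `arch_defines`, the processor-name table, and the rule that DR_HOST_NOT_TARGET is set exactly when the host and target families differ are assumptions made for the example.

```python
# Hypothetical model of the define selection; the real logic lives in CMake.
# The processor-name spellings in this table are assumed for illustration.
ARCH_FAMILY = {
    "x86_64": "X86",
    "i386": "X86",
    "aarch64": "AARCH64",
    "arm": "ARM",
}


def arch_defines(host_processor, target_arch=None):
    """Return the set of arch defines for a host/target pair.

    target_arch defaults to the host processor, mirroring TARGET_ARCH
    defaulting to CMAKE_SYSTEM_PROCESSOR.
    """
    target = target_arch or host_processor
    host_family = ARCH_FAMILY[host_processor]
    target_family = ARCH_FAMILY[target]
    # Regular defines follow the target; DR_HOST_* follows the host.
    defines = {target_family, "DR_HOST_" + host_family}
    # Assumption: the "not target" define is set when the families differ.
    if host_family != target_family:
        defines.add("DR_HOST_NOT_TARGET")
    return defines
```

For example, `arch_defines("x86_64", "aarch64")` yields AARCH64, DR_HOST_X86, and DR_HOST_NOT_TARGET, which matches the host-x86_64-target-aarch64 configuration discussed in this issue.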
derekbruening added a commit that referenced this issue on Jun 22, 2020
Adds a new option -alt_module_dir which specifies an alternate directory in which to look for libraries in a modules.log offline drcachesim file. Hooks the option up to the opcode_mix and view tools and to raw2trace (both its standalone launcher and drcachesim frontend integration). This will be useful in general, but is needed for adding tests with checked-in raw files and binaries, where the checked-in modules.log has a hardcoded path that will never match all test environments. Adds a test that runs both raw2trace and opcode_mix on a new checked-in aarch64 raw trace (this test will evolve for future cross-arch testing, which is the reason for using aarch64). The binaries are stripped, with the two libraries "fake" binaries: libmemtrace.so is not executed, and libdynamorio.so is only used for dynamorio_sigreturn, so these are synthetic binaries (created from asm files) to keep them smaller. The tests deliberately fail on x86 and match an error message that will improve when architecture tags are added. Issue: #4318
derekbruening added a commit that referenced this issue on Jun 23, 2020
Adds architecture tags to raw traces, propagated to final traces in a new marker type. Adds reader support for the new type. Adds a check for unhandled architectures to raw2trace. Adds checks for unhandled architectures to the opcode_mix and view tools. To reach that point, relaxes the ELF header checks to allow libraries of the same bitwidth but different architectures (i#1345 prevents allowing all libraries), so we can map in the other-arch libraries. Updates the existing aarch64 raw trace files to contain architecture tags, and updates the x86 failure message. Cross-architecture handling of these same files was tested manually with a prototype which will be committed in the future. Issue: #4318, #1345
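The relaxed ELF header check (same bitwidth, any architecture) can be sketched as below. The field offsets and constants come from the ELF format itself; the function names and the idea of passing the host class as a parameter are hypothetical and are not DR's actual loader code.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS32, ELFCLASS64 = 1, 2  # e_ident[EI_CLASS] values
EM_386, EM_ARM, EM_X86_64, EM_AARCH64 = 3, 40, 62, 183  # e_machine values


def elf_class_and_machine(header):
    """Return (EI_CLASS, e_machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF header")
    ei_class = header[4]
    # e_ident[EI_DATA]: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ei_class, e_machine


def module_is_mappable(header, host_class):
    """Accept any machine type as long as the bitwidth matches the host's."""
    ei_class, _machine = elf_class_and_machine(header)
    return ei_class == host_class
```

Under this sketch, a 64-bit aarch64 library passes on a 64-bit x86 host, while a 32-bit library is still rejected, which is the behavior the commit message attributes to the i#1345 limitation.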
derekbruening added a commit that referenced this issue on Jun 23, 2020
Adds a new gzip wrapper around std::istream to support compressing .raw files. This lets us easily shrink checked-in test files, and aids other relocate-then-process situations with raw files. Updates the altbin test by compressing its .raw files. Issue: #4318
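As a rough analogue of reading raw files that may or may not be compressed, the sketch below opens a file transparently in Python. It is not the std::istream wrapper from the commit: the helper name `open_raw` is hypothetical, and detecting compression by the gzip magic bytes rather than by file extension is an assumption of this example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_raw(path):
    """Open a raw file for binary reading, decompressing if it is gzipped."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    # Both branches return a file-like object with the same read() interface,
    # so downstream code does not need to know which form was on disk.
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")
```

The point of the design is that the consumer sees one stream interface either way, so checked-in test data can be compressed without changing the reading code.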
derekbruening added a commit that referenced this issue on Jun 23, 2020
Adds a new option -alt_module_dir which specifies an alternate directory in which to look for libraries in a modules.log offline drcachesim file. Hooks the option up to the opcode_mix and view tools and to raw2trace (both its standalone launcher and drcachesim frontend integration). This will be useful in general, but is needed for adding tests with checked-in raw files and binaries, where the checked-in modules.log has a hardcoded path that will never match all test environments. Adds a test that runs both raw2trace and opcode_mix on a new checked-in aarch64 raw trace (this test will evolve for future cross-arch testing, which is the reason for using aarch64). The binaries are stripped, with the two libraries "fake" binaries: libmemtrace.so is not executed, and libdynamorio.so is only used for dynamorio_sigreturn, so these are synthetic binaries (created from asm files) to keep them smaller. The test deliberately fails on x86 and matches an error message that will improve when architecture tags are added. The test is limited to UNIX b/c of UNIX-format test data. Issue: #4318
Pasting from PR #4325 : Here's the key test in the new Travis job for drcachesim:
So it's x86 drraw2trace and opcode_mix analyzers operating on aarch64 traces. Pretty cool.
derekbruening added a commit that referenced this issue on Jun 26, 2020

Adds support for building with different host and target architectures. The goal is to support running drdisas, drdecode, drraw2trace, and drcachesim trace analyzers on one host machine while targeting recorded bytes or traces from a different type of machine: e.g., processing aarch64 traces on an x86_64 machine. The goal is *not* to turn DR into a cross-ISA binary translator: we only support standalone mode here, not full DR managed mode. Long-term it would be nicer to split out the DR standalone mode interfaces into separate libraries but that is beyond the scope of the current work.

Adds a top-level CMake option variable "TARGET_ARCH" which defaults to CMAKE_SYSTEM_PROCESSOR but can be set to a different value. The regular arch variables/defines X86, AARCH64, and ARM are set from TARGET_ARCH, while new values DR_HOST_{X86,AARCH64,ARM} are set for the host.

Uses a target-centric approach, where the bulk of the code is built targeting the TARGET_ARCH, with only assembly code and other limited code using the DR_HOST_* arch. This limits the scope of the code changes as compared to the opposite host-centric approach. A new define DR_HOST_NOT_TARGET is used to simply disable cases where the host and target are intertwined and difficult to easily separate, such as TLS handling and assembly code used for application execution (such as icache_op_isb_asm).

The key code changed to use DR_HOST_* includes:
+ *.asm files
+ arch_exports.h inline asm for atomics and other key utilities
+ system call defines (we don't want to have our raw syscall wrappers running incorrect syscall numbers: we'll end up with fork bombs or other madness)
+ injection code
+ data section setup
+ drsyms libraries

Code using built-in __*__ arch defines is changed to use our defines:
+ sigcontext.h, which is also augmented for some missing AArch64 types
+ x86_code_test.c

Code reading ELF headers is relaxed to allow *any* arch of the same bitwidth (i#1345 prevents different bitwidth), since we need the host for standalone init code but the target for raw2trace module mapping.

Nearly all tests are disabled not just from running but also from building for simplicity, to reduce code changes here. Some api.* decoder tests, drdisas, and the altbin test for drcachesim post-processing and analysis are enabled in host!=target builds. Tests are disabled by using a macro to set up the tests and shifting all the setup code prior to the tests themselves, allowing a return() to skip over the bulk of the tests to avoid a giant if().

Only Linux host-x86_64-target-aarch64 is explicitly tested and enabled here. Adding the inverse, and adding host-i386-target-arm32, should not be much work, mostly with compiler flags, but will be done separately. Adding Windows support is also separated out as it requires a number of changes for compiler problems with aarch64 code.

Adds a new host-x86_64-target-aarch64 build to runsuite.cmake, allocated to a new Travis job to avoid slowing down the current jobs.

Issue: #1684, #4318, #1345
derekbruening added a commit that referenced this issue on Jun 30, 2020
PR #4331 commit 5d49675 added a new marker with the architecture early in a trace, enough to break the file_reader_t assumptions. The file reader was updated, but it's best to also bump the version for old binary code. Additionally, we provide a separate distinct message about a version mismatch (otherwise the error message from the old reader is the same as without the version bump). Issue: #4318
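The value of a distinct version-mismatch message can be illustrated with a small sketch. Everything here is hypothetical: the exception classes, the `check_trace_header` function, and the version numbers are placeholders and are not the file_reader_t code or the real trace version values.

```python
class TraceFormatError(Exception):
    """Generic failure: the stream does not look like a trace at all."""


class TraceVersionError(TraceFormatError):
    """Distinct failure: a valid header whose version this reader cannot read."""


def check_trace_header(has_header, file_version, reader_version):
    """Validate a trace header, separating version mismatch from other errors."""
    if not has_header:
        raise TraceFormatError("invalid trace file: missing header")
    if file_version != reader_version:
        # Reporting both numbers tells the user to update the reader or
        # regenerate the trace, instead of suspecting a corrupt file.
        raise TraceVersionError(
            "trace version mismatch: file has version %d but this reader "
            "expects version %d" % (file_version, reader_version)
        )
    return True
```

Making the version error a subclass of the generic error is just one possible design: callers that only care about failure keep working, while callers that want the specific diagnosis can catch the narrower type.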
derekbruening added a commit that referenced this issue on Jul 6, 2020
Moves the -alt_module_dir for finding binaries during trace post-processing to take precedence over the recorded path, to allow avoiding problems when an identical-seeming path exists on the processing machine (e.g., system libraries) on UNIX where we have no checksum or other library distinguisher. Tested manually on a case where a system library in the same path as on the recorded machine caused failure. Issue: #4318
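The lookup order described here, with the alternate directory consulted before the recorded path, might look like the following sketch. The function name `resolve_module_path` is hypothetical, and matching by the recorded file's basename inside the alternate directory is an assumption of this example rather than something the commit message states.

```python
import os


def resolve_module_path(recorded_path, alt_module_dir=None):
    """Pick the file to map for a module listed in modules.log.

    The alternate directory wins over the recorded path, so a same-named
    system library on the processing machine cannot shadow the copy that
    was brought along with the trace.
    """
    if alt_module_dir:
        # Assumption: the alternate copy has the same file name as recorded.
        candidate = os.path.join(alt_module_dir, os.path.basename(recorded_path))
        if os.path.exists(candidate):
            return candidate
    if os.path.exists(recorded_path):
        return recorded_path
    return None
```

With the earlier ordering (recorded path first), a system library at the same path on the processing machine would have been chosen even though it may differ from the traced one, and on UNIX there is no checksum to detect that.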
derekbruening added a commit that referenced this issue on Jul 8, 2020
When raw files are gzipped and named xxx.raw.gz, the corresponding final trace files end up named xxx.raw.trace.gz, i.e., with an extra "raw." in there. We fix that here. Issue: #4318
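The naming fix can be illustrated with a small helper. This is a sketch, not the raw2trace code: the function name `final_trace_name` is hypothetical, and the mapping of an uncompressed xxx.raw to xxx.trace is assumed from the naming pattern in the message.

```python
def final_trace_name(raw_name):
    """Derive the final trace file name from a raw file name.

    The ".raw" component is dropped whether or not a ".gz" suffix follows it,
    which avoids producing names like xxx.raw.trace.gz.
    """
    compressed = raw_name.endswith(".gz")
    stem = raw_name[: -len(".gz")] if compressed else raw_name
    if stem.endswith(".raw"):
        stem = stem[: -len(".raw")]
    return stem + ".trace" + (".gz" if compressed else "")
```

Here `final_trace_name("xxx.raw.gz")` gives `xxx.trace.gz`. The bug described above corresponds to stripping only a trailing ".raw", which never matches once ".gz" has been appended.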
derekbruening added a commit that referenced this issue on Jul 13, 2020
For the view tool, the original wrong-arch check was placed *after* the entry skipping for -skip_refs, which can result in never checking the architecture. We fix that here by moving the check up front. Issue: #4318
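The ordering problem can be seen in a toy model of the view loop. The `Entry` type, the "arch_marker" kind, and the `view` function are hypothetical stand-ins for the real trace entries and tool, and counting every entry toward the skip total is a simplification.

```python
from collections import namedtuple

# Hypothetical trace entry: kind is "arch_marker" or "instr".
Entry = namedtuple("Entry", ["kind", "value"])


def view(entries, built_for_arch, skip_refs=0):
    """Return the entries to display, rejecting traces from another arch."""
    shown = []
    skipped = 0
    for entry in entries:
        # The architecture check runs first on every entry, so the marker is
        # examined even when it falls inside the skipped prefix.
        if entry.kind == "arch_marker" and entry.value != built_for_arch:
            raise ValueError(
                "trace architecture %s does not match tool architecture %s"
                % (entry.value, built_for_arch)
            )
        if skipped < skip_refs:
            skipped += 1
            continue
        shown.append(entry)
    return shown
```

If the two `if` blocks were swapped, any nonzero skip count would pass over the early architecture marker and the mismatch would never be reported, which is the bug this commit describes.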
derekbruening added a commit that referenced this issue on Sep 16, 2020
derekbruening added a commit that referenced this issue on Sep 29, 2020
It is useful to be able to analyze and post-process drcachesim offline traces gathered on one architecture on a different platform: e.g., to trace an ARM application and pull the trace onto an x86 machine for analysis.
Most of the analysis code is already cross-platform, but the decoder for tools like opcode_mix is the key missing piece. We would need something like #1684 to continue using DR's decoder there, but we'd need it in full DR and not just drdecodelib (because raw2trace uses full DR and it is complex to downgrade it).