-
Notifications
You must be signed in to change notification settings - Fork 570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instruction encoding entries to drmemtrace #5520
Comments
Noting two use cases which could be handled differently:
|
Including the instruction encodings would increase trace files sizes by an estimated 75%. |
Adds a new trace_entry_t type TRACE_TYPE_ENCODING. Encodings are stored in one or more consecutive such entries by raw2trace for the first occurrence of each instruction or modified instruction, with encodings repeated in new chunks. The trace version is bumped for this new type. Adds a new encoding field to the end of memref_instr_t. The reader_t class caches the trace_entry_t encodings and uses them to fill in this field. A new file type OFFLINE_FILE_TYPE_ENCODINGS indicates whether encodings are present (to support stripping them out). Augments the opcode_mix and view tools to use the new encoding records and only need the module_mapper for legacy traces. Updates the documentation and unit tests. Still remaining is adding markers when code changes, joint with #2062. Fixes #5520
Adds a new trace_entry_t type TRACE_TYPE_ENCODING. Encodings are stored in one or more consecutive such entries by raw2trace for the first occurrence of each instruction or modified instruction, with encodings repeated in new chunks. The trace version is bumped for this new type. Adds a new encoding field to the end of memref_instr_t. The reader_t class caches the trace_entry_t encodings and uses them to fill in this field. A new file type OFFLINE_FILE_TYPE_ENCODINGS indicates whether encodings are present (to support stripping them out). Augments the opcode_mix and view tools to use the new encoding records and only need the module_mapper for legacy traces. Updates the documentation and unit tests. Still remaining is adding markers when code changes, joint with #2062. Fixes #5520
There are actually a couple of unfinished things:
|
Adds a new field to memref_instr_t: encoding_is_new. This indicates whether the encoding bytes for an instruction fetch are new, either due to changed application code or just the start of a new chunk. This lets tools know when to invalidate any cached decoding information. Updates the opcode_mix and view tools to use this to invalidate their caches. Adds a test of modified code to the tool.drcacheoff.gencode test. Changes the test to use opcode_mix to test its cache invalidation. Adds a new icache_sync() routine to tools.h for help modifying code on AArchXX. I verified that with the bug pointed out at #5662 (comment) the test catches the stale opcodes used by opcode_mix. Issue: #5520
Adds a new field to memref_instr_t: encoding_is_new. This indicates whether the encoding bytes for an instruction fetch are new, either due to changed application code or just the start of a new chunk. This lets tools know when to invalidate any cached decoding information. Updates the opcode_mix and view tools to use this to invalidate their caches. Adds a test of modified code to the tool.drcacheoff.gencode test. Changes the test to use opcode_mix to test its cache invalidation. Adds a new icache_sync() routine to tools.h for help modifying code on AArchXX. I verified that with the bug pointed out at #5662 (comment) the test catches the stale opcodes used by opcode_mix. Issue: #5520
Adds support for Thumb versus Arm modes in AArch32 with respect to drmemtrace stored encodings by setting the LSB when storing in the tracer and raw2trace, and by properly setting the base mode to Arm when decoding. Augments decode_from_copy() to switch locally to Thumb if either the read or original PC has LSB=1 as that better fits most use cases (including reading from a drmemtrace encoding buffer). Adds AArch32 support to the burst_gencode test. Fixes 3 outstanding AArch32 issues: + Removes a quote in a comment in third_party/libgcc/arm/lib1funcs.S code not handled by old toolchains. + Removes an ASSERT_NOT_TESTED path now hit. + Fixes an AArch32 tracer bug where a 2nd temp register is needed for a 2nd conditional skip. Tested by running the drcacheoff.gencode test on an AArch32 machine which now passes. Issue: #5520
Although the encoding records are not directly exposed to analysis tools, the encoding_is_new field can be used as a proxy to count them. We do that here for the basic_counts tool. Issue: #5520
Updates the x86_64 and aarch64 traces to the new .zip format with embedded instruction encodings. Removes the .raw directories entirely and the .so binaries as they are no longer needed. Updates all the READMEs for the new traces and for no longer needing library support for mapping in binaries for decoding. Issue: DynamoRIO/dynamorio#5520
Fixes the incorrect attempt by analyzer_interface to synthesize a modules.log directory when running opcode_mix, which failes when there is no raw directory. Modifies the opcode_mix offline test to remove the raw directory as a sanity check that the embedded instruction encodings are indeed all we need. Issue: #5520
Although the encoding records are not directly exposed to analysis tools, the encoding_is_new field can be used as a proxy to count them. We do that here for the basic_counts tool. Issue: #5520
Fixes the incorrect attempt by analyzer_interface to synthesize a modules.log directory when running opcode_mix, which failes when there is no raw directory. Modifies the opcode_mix offline test to remove the raw directory as a sanity check that the embedded instruction encodings are indeed all we need. Issue: #5520
Adds support for Thumb versus Arm modes in AArch32 with respect to drmemtrace stored encodings by setting the LSB when storing in the tracer and raw2trace, and by properly setting the base mode to Arm when decoding. Augments decode_from_copy() to switch locally to Thumb if either the read or original PC has LSB=1 as that better fits most use cases (including reading from a drmemtrace encoding buffer). Adds AArch32 support to the burst_gencode test. Fixes 3 outstanding AArch32 issues: + Removes a quote in a comment in third_party/libgcc/arm/lib1funcs.S code not handled by old toolchains. + Removes an ASSERT_NOT_TESTED path now hit. + Fixes an AArch32 tracer bug where a 2nd temp register is needed for a 2nd conditional skip. Tested by running the drcacheoff.gencode test on an AArch32 machine which now passes. Issue: #5520
Adds explicit mention of the encoding_is_new field to the memtrace format docs. Updates the Core Simulation section: mentions encodings again, and removes the stale module_mapper paragraph. Issue: #5520
Updates the x86_64 and aarch64 traces to the new .zip format with embedded instruction encodings. Removes the .raw directories entirely and the .so binaries as they are no longer needed. Updates all the READMEs for the new traces and for no longer needing library support for mapping in binaries for decoding. Issue: DynamoRIO/dynamorio#5520
Adds explicit mention of the encoding_is_new field to the memtrace format docs. Updates the Core Simulation section: mentions encodings again, and removes the stale module_mapper paragraph. Issue: #5520
Adds a check that the memref.instr.encoding_is_new flag is valid in the basic_counts tool. Issue: #5520
) Adds a check that the memref.instr.encoding_is_new flag is valid in the basic_counts tool. Issue: #5520
Adds a new option -instr_encodings which enables instruction encoding records for online traces. This is under this off-by-default option as it adds significant overhead to online tracing and is only needed for some tools. Updates the launcher and docs. Adds an online opcode_mix test and adds encodings to a basic_counts online test. Manually tested the view tool with and without -instr_encodings: $ bin64/drrun -t drcachesim -instr_encodings -simulator_type view -- suite/tests/bin/allasm_x86_64 ... 166: T1718245 ifetch 4 byte(s) @ 0x0000000000401028 48 83 eb 01 sub $0x0000000000000001 %rbx -> %rbx 167: T1718245 ifetch 4 byte(s) @ 0x000000000040102c 48 83 fb 00 cmp %rbx $0x0000000000000000 168: T1718245 ifetch 2 byte(s) @ 0x0000000000401030 75 d9 jnz $0x000000000040100b ... $ bin64/drrun -t drcachesim -simulator_type view -- suite/tests/bin/allasm_x86_64 ... 126: T1933188 ifetch 4 byte(s) @ 0x0000000000401028 non-branch 127: T1933188 ifetch 4 byte(s) @ 0x000000000040102c non-branch 128: T1933188 ifetch 2 byte(s) @ 0x0000000000401030 conditional jump ... Fixes #5520
Pretty much everything is done except the initial online encoding solution is imperfect:
|
Adds a new option -instr_encodings which enables instruction encoding records for online traces. This is under this off-by-default option as it adds significant overhead to online tracing and is only needed for some tools. Currently we emit the encoding again for every dynamic instance of an instruction. Future work involves recording which we've emitted to avoid duplicate instances (the reader caches prior encodings) but this requires careful consideration of global locks and per-thread invalidation on code changes which is out of the scope of this initial implementation. Updates the launcher and docs. Adds an online opcode_mix test and adds encodings to a basic_counts online test. Manually tested the view tool with and without -instr_encodings: $ bin64/drrun -t drcachesim -instr_encodings -simulator_type view -- suite/tests/bin/allasm_x86_64 ... 166: T1718245 ifetch 4 byte(s) @ 0x0000000000401028 48 83 eb 01 sub $0x0000000000000001 %rbx -> %rbx 167: T1718245 ifetch 4 byte(s) @ 0x000000000040102c 48 83 fb 00 cmp %rbx $0x0000000000000000 168: T1718245 ifetch 2 byte(s) @ 0x0000000000401030 75 d9 jnz $0x000000000040100b ... $ bin64/drrun -t drcachesim -simulator_type view -- suite/tests/bin/allasm_x86_64 ... 126: T1933188 ifetch 4 byte(s) @ 0x0000000000401028 non-branch 127: T1933188 ifetch 4 byte(s) @ 0x000000000040102c non-branch 128: T1933188 ifetch 2 byte(s) @ 0x0000000000401030 conditional jump ... Fixes #5520
Adds a "News" line to the drmemtrace docs with a link to a new pdf of slides explaining the new embedded instruction encodings along with a preview of the forthcoming seek feature. Issue: #5520
Adds a "News" line to the drmemtrace docs with a link to a new pdf of slides explaining the new embedded instruction encodings along with a preview of the forthcoming seek feature. Issue: #5520
We have perma-links to redirect our html pages to the root directory, but other files are in the docs/ directory. Issue: #5520
We have perma-links to redirect our html pages to the root directory, but other files are in the docs/ directory. Issue: #5520
Today, the drmemtrace format does not include instruction encodings or even opcodes beyond identifying branch types. This is sufficient for functional cache simulation, as in drcachesim, but for core simulation the operand dependencies and ideally the opcodes are needed. The solution today is to preserve the application binaries for the executable and libraries, with support for mapping them in during trace analysis in order to decode the instructions (using the same support built for post-processing offline traces). Adding the encodings into the trace proper would make the trace self-contained. It is also the simplest solution for supporting dynamically-generated code that has no binary to query later: #2062 (comment)
The text was updated successfully, but these errors were encountered: