Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#2974 trace support for AArch64, part 2: added trace implementation #5045

Merged
merged 18 commits into from
Sep 1, 2021

Conversation

Vincent-lau
Copy link
Collaborator

This patch incorporated changes from patch #2442, and fixed some corner cases
where the assumption was incorrect and caused the program to crash.

Trace support is not yet enabled by default, but can be enabled with "-enable_traces".

A large part of this work was based on work done by Kevin Zhou. Please see branch
i1569-trace and patch #2442 for the original version.

Issues: #1569, #2974

…part of support

This patch incorporated changes from patch #2442, and fixed some corner cases
where the assumption was incorrect and caused the program to crash.

Trace support is not yet enabled by default, but can be enabled with "-enable_traces".

A large part of this work was based on work done by Kevin Zhou. Please see branch
i1569-trace and patch #2442 for the original version.

Issues: #1569, #2974
@Vincent-lau
Copy link
Collaborator Author

Vincent-lau commented Aug 12, 2021

The results for the failure on Jenkins are

debug-internal-64: 242 tests passed, **** 5 tests failed, of which 1 were flaky, but ignoring 4 for i#2417: ****
	(ignore: i#2417) 	code_api|linux.fib-conflict 
	(ignore: i#2417) 	code_api|linux.fib-conflict-early 
	(ignore: i#2417) 	code_api|tool.drcachesim.phys 
	(ignore: i#2417) 	code_api|api.rseq 
	code_api|api.static_sideline_FLAKY 

====> FAILURE in release-external-64 <====
release-external-64: 48 tests passed, **** 3 tests failed, but ignoring 2 for i#2417: ****
	(ignore: i#2417) 	code_api|tool.drcachesim.phys 
	(ignore: i#2417) 	code_api|tool.drcacheoff.rseq 
	code_api|tool.drcachesim.coherence 

which I believe are flaky tests by #2417

@AssadHashmi
Copy link
Contributor

which I believe are flaky tests by #2417

Yes AIUI. We need to add that to the FLAKY_ list.

@AssadHashmi
Copy link
Contributor

run arm tests

core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
Removed all the commented code and an unused parameter.

Issues: #1569, #2974
@abhinav92003
Copy link
Contributor

Thanks for making the changes. Will take a look today

@Vincent-lau
Copy link
Collaborator Author

Thanks for making the changes. Will take a look today

I just noticed dcontext in check_patch() is also redundant, I will change it along with any other changes that you might propose.

Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mid review comments.

core/arch/emit_utils_shared.c Outdated Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/aarch64/emit_utils.c Show resolved Hide resolved
Some clean up according to the review. Also changing the implementation
of check_patched_ibl to merge two checks into one function.

Issues: #1569, #2974
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
@abhinav92003
Copy link
Contributor

Thanks for making the previous changes. In most of my recent comments, I'm just trying to understand some parts of trace building that I'm not familiar with yet, and maybe help improve inline documentation along the way.

@Vincent-lau
Copy link
Collaborator Author

Vincent-lau commented Aug 19, 2021

Thanks for all the comments. as I said, the original author of this code was not me, and I have to guess the intent of the author (just as you do). But I have replied to your review with my thoughts.

Replaced some of the if-conditions with assertions and improved code
comments.

Issues: #1569, #2974
@Vincent-lau
Copy link
Collaborator Author

Vincent-lau commented Aug 23, 2021

One other thing, I am thinking about making the trace support for AArch64 on by default (after a few more iterations), I am wondering roughly how much work would be needed for that? I would imagine that the bare minimum is to pass all current tests with trace on by default and we might need to close the issue #5057 as well?

core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
@abhinav92003
Copy link
Contributor

One other thing, I am thinking about making the trace support for AArch64 on by default (after a few more iterations), I am wondering roughly how much work would be needed for that? I would imagine that the bare minimum is to pass all current tests with trace on by default and we might need to close the issue #5057 as well?

Atleast some targeted test coverage would be good, yes. In addition to this, we'd need to run some large internal apps/benchmarks under DR with traces enabled. In the past, we've seen that some issues show up only on large apps, not on unit tests or SPEC.

Changed the condition of checking for mangled stolen register to include
the ldr instruction. Added comment on clarifying the need to iterate over the
entire trace to find all indirect exits.

Issues: #1569, #2974
@Vincent-lau
Copy link
Collaborator Author

Vincent-lau commented Aug 24, 2021

Atleast some targeted test coverage would be good, yes.

I am thinking about adding tests to cover some of the cases introduced in this PR, but not quite sure how to get started. My initial thought is to write a basic block such as

br  %x3
b  ibl_lookup_routine

And it would be mangled into

str %x2 -> IBL_TARGET_SLOT
mov %x3 -> %x2
b ibl_lookup_routine

And then we expect it to be fixed up as

str  %x0 -> TLS_REG0_SLOT
mov #trace_next_target -> %x0
eor %x0 %x3 -> %x0
cbnz %x0 trace_exit
ldr TLS_REG0_SLOT -> %x0

So that we are covering one of the cases introduced by mangle_indirect_branch_in_trace

I problem I have, though, with this idea is that I am not sure if it is actually possible for the compiler to generate such assembly instructions in a bb. On the other hand, if I write tests in C, then I can be sure that it is legitimate code generated by the compiler but can't be sure that it will cover the cases introduced by this PR.

@Vincent-lau
Copy link
Collaborator Author

we'd need to run some large internal apps/benchmarks under DR with traces enabled

Can you suggest some applications to run?

@abhinav92003
Copy link
Contributor

I am thinking about adding tests to cover some of the cases introduced in this PR, but not quite sure how to get started. My initial thought is to write a basic block such as

br  %x3
b  ibl_lookup_routine

Note that the b ibl_lookup_routine will not be present in the test app itself; it will be added by DR.

I problem I have, though, with this idea is that I am not sure if it is actually possible for the compiler to generate such assembly instructions in a bb. On the other hand, if I write tests in C, then I can be sure that it is legitimate code generated by the compiler but can't be sure that it will cover the cases introduced by this PR.

You can add your test basic blocks in assembly, similar to drreg tests that are also written in assembly to verify behaviour for some hand-crafted basic blocks. See drreg-test.c (the test app for drreg) and drreg-test.dll.c (the test client for drreg).

I guess you can use dr_register_trace_event and dr_register_end_trace_event to register callbacks for trace events, and in those callbacks you can verify the instructions in the trace. I have never looked at a trace closely myself, so please verify how the traces look on these events.

@Vincent-lau
Copy link
Collaborator Author

Vincent-lau commented Aug 25, 2021

I guess you can use dr_register_trace_event and dr_register_end_trace_event to register callbacks for trace events, and in those callbacks you can verify the instructions in the trace

Thanks for the suggestions. I had a look at these register_events methods and they seem to me to be client-side, and DR does not seem to have methods that expose the fragment to the client (which is the transparency of DR I suppose?). As most of the modifications made by the code is to transform the bb so that it can be emitted into the code cache correctly, these clients do not help much. I guess my question is if there are any APIs on the client side that can inspect the structure of a fragment as it gets emitted or if there is any way for a client to look at instructions modified by DR? If not, then we probably need to think of other ways to do the test.

I might suggest writing some tests similar to what null_instrument.c did, which is to test if a trace is correctly created if some bbs are executed frequently enough. For example, we can create some control flow and execute one condition for some time and the other for some time, and see if they have traces created respectively. This test can be useful for x86 as well, so I might start working on that.

Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at these register_events methods and they seem to me to be client-side, and DR does not seem to have methods that expose the fragment to the client

With dr_register_trace_event you can register a callback that takes the trace as input. The documentation also states that
"trace is a pointer to the list of instructions that comprise the trace." and " The user is free to inspect and modify the non control-flow instructions in the trace before it executes, with certain restrictions". Are you saying that the given trace is not the one after the trace creation code runs?

void dr_register_trace_event(dr_emit_flags_t (*func)(void *drcontext, void *tag,
                                                     instrlist_t *trace,
                                                     bool translating));

* The user is free to inspect and modify the non-control-flow

core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/aarch64/emit_utils.c Show resolved Hide resolved
@abhinav92003
Copy link
Contributor

@Vincent-lau I think it should be okay if we're not able to add tests in this PR itself. That can be done as follow-up in #5057.

@derekbruening We iterated on this implementation and I think the overall approach makes sense. Can you take a look too? To make sure that I'm not missing any details or if anything can be simplified. In addition, I've also tagged you in some comments where we could use your input.

core/arch/aarch64/emit_utils.c Show resolved Hide resolved
core/arch/aarch64/emit_utils.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
core/arch/interp.c Outdated Show resolved Hide resolved
* If we delete all code after "b target", then the "fall" path would
* be lost. Therefore we need to append the fall path at the end of
* the trace as a fake exit stub. Swapping them might be dangerous since
* a stub trace may be created on both paths.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we switch to #5062 and append unmangled instrlists and only mangle at the very end, all this complexity would go away, right?

Copy link
Collaborator Author

@Vincent-lau Vincent-lau Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only mangle at the very end

So you mean that we build a trace from scratch and only mangle when we have all the bbs stitched together? I think this would indeed remove this complexity.

I might bring up one other problem related to this as well, in translate.c the function recreate_app_state_from_ilist searches the recreated ilist for the instruction that has the corresponding app pc of the current mc->pc. With the this new patch adding appending instructions on the miss path of the cbnz x0, trace_exit_label:

 * For example, the code will be fixed to:
 *      eor x0, x0, jump_target
 *      cbnz x0, trace_exit_label
 *      ...
 *      b trace_head
 * trace_exit_label:
 *      ldr x0, TLS_REG0_SLOT
 *      str x2, TLS_REG2_SLOT
 *      mov x2, jump_target
 *      b ibl_routine
 */

The recreate_app_state_from_ilist is not able to find the instr if the pc happens to be in one of the instructions following trace_exit_label and since these instructions are not identified as a exit stub, the mc->pc wouldn't be pointing to the correct cti either, and as they don't belong to the application either, so won't appear in the recreated ilist, recreate_app_state_from_ilist is not able to find the correct instruction and reaches the ASSERT_NOT_REACHED on L1114.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean that we build a trace from scratch and only mangle when we have all the bbs stitched together? I think this would indeed remove this complexity.

Yes. We already have the functionality in mangle_trace() to do all the bb stitching at the very end: we have to do that for clients that change the trace in the trace event (the event passes them the unmangled instrlist).

Are you interested in implementing #5062? It would be a nice cleanup and simplify this PR which would then come in afterward.

Copy link
Contributor

@derekbruening derekbruening Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recreate_app_state_from_ilist is not able to find the instr if the pc happens to be in one of the instructions following trace_exit_label

That's a blocking issue that has to be solved.

One proposal is to change the i.b. comparison to use x2 instead of x0 as the temp reg holding the expected target immediate. Then the only thing missing is copying from the actual target reg into x2, and maybe that could be done locally. It would use a taken forward branch on the match which maybe is not ideal but certainly simple:

     str x2, TLS_REG2_SLOT
     mov x2, #trace_next_target
     eor x2, x2, jump_target
     cbz x2, match
     mov x2, jump_target
     b ibl_routine_regular_exit_stub
match:
     ldr x2, TLS_REG2_SLOT

Then we eliminate this special-case trace_exit_label.

(If jump_target == x2 then a different scheme needs to be used -- might require 2 spills b/c we want to put the app value in the x2 slot and have a local scratch.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that whatever mangling is added, translate.c needs to be updated to recognize it. Basic spills and restores are already recognized but anything else may need special support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would happen if target_cache is pointing to one of these mangled instructions

The uncompleted mangled app instruction will be re-executed, so all state changes have to be rolled back: restore spilled registers, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you interested in implementing #5062?

How long do you estimate this would take? I am currently on the final week of an internship where I try to port the trace support for a64. Obviously I can continue contributing after the internship but I am just trying to get a sense of how long this is going to take.

I don't know for sure -- it might be quite straightforward:

  • go enable things permanently that are currently gated on monitor_data_t.pass_to_client like extend_unmangled_ilist's work
  • change internal_extend_trace to estimate the size from the unmangled list (maybe w/ some fudge factor for mangling: for new scheme only used to keep trace below max sizes I would assume)
  • delete monitor_data_t.trace_buf, extend_trace, etc.
  • try to delete decode_fragment: check other callers

But there might be some extra things to deal with once in the details.

Maybe better to finish this PR off as a deliverable -- but the missing translation support is an issue. Could revert the optimization and have sub-optiimal code that matches x86 with the spill-and-move left there and a simple exit stub and complete translation support, with a comment that we can improve the perf later.

Copy link
Collaborator Author

@Vincent-lau Vincent-lau Aug 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uncompleted mangled app instruction will be re-executed, so all state changes have to be rolled back: restore spilled registers, etc.

In that case is it OK to include these instructions in the recreated ilist (currently they are not)

 * trace_exit_label:
 *      ldr x0, TLS_REG0_SLOT
 *      str x2, TLS_REG2_SLOT
 *      mov x2, jump_target
 *      b ibl_routine

And store the translation info in these instructions when we call fixup_indirect_trace_exit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case is it OK to include these instructions in the recreated ilist (currently they are not)

Wait -- why are they not in the recreation of the trace? That is not good. It should go through the same mangling steps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these instructions are added in fixup_indirect_trace_exit which is not called when recreate_fragment_ilist runs. I have added a call in recreate_fragment_ilist so that these instructions are included in the recreated instrlist as well.

@Vincent-lau
Copy link
Collaborator Author

Are you saying that the given trace is not the one after the trace creation code runs?

I will move the discussion to #5057

fixup_indirect_trace_exit

Added translation information when fixing up the indirect trace exits
and added call to fixup_indirect_trace_exit when recreating the fragment ilist

Issues: #1569, #2974
@derekbruening
Copy link
Contributor

Seeing unresolved comments: please resolve when addressed

core/arch/interp.c Outdated Show resolved Hide resolved
* If we delete all code after "b target", then the "fall" path would
* be lost. Therefore we need to append the fall path at the end of
* the trace as a fake exit stub. Swapping them might be dangerous since
* a stub trace may be created on both paths.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case is it OK to include these instructions in the recreated ilist (currently they are not)

Wait -- why are they not in the recreation of the trace? That is not good. It should go through the same mangling steps.

core/arch/interp.c Show resolved Hide resolved
* the trace as a fake exit stub. Swapping them might be dangerous since
* a stub trace may be created on both paths.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include TODO i#xxxx comments (with the real issue number) for the missing pieces: in particular, adding translation support for all the new mangling code. (Ideally it would be in this PR but for making progress here w/ a new feature in pieces ok to not have all at once so long as what's missing is clearly documented not just in the commit description but in the code too). I would put such comments both here and in translate.c.

Copy link
Collaborator Author

@Vincent-lau Vincent-lau Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that there isn't any translation support needed for the new mangling code. The new mangling code includes:

     * Original ibl lookup:
     *      str tgt_reg, TLS_REG2_SLOT
     *      mov tgt_reg, jump_target
     *      b ibl_routine
     * Now we rewrite it into:
     *      str x0, TLS_REG0_SLOT
     *      mov x0, #trace_next_target
     *      eor x0, x0, jump_target
     *      cbnz x0, trace_exit (ibl routine)
     *      ldr x0, TLS_REG0_SLOT
     */

and

 * For example, the code will be fixed to:
*      eor x0, x0, jump_target
*      cbnz x0, trace_exit_label
*      ...
*      b trace_head
* trace_exit_label:
*      ldr x0, TLS_REG0_SLOT
*      str x2, TLS_REG2_SLOT
*      mov x2, jump_target
*      b ibl_routine

which are just basic spilling of registers and perform some calculation after spilling, I suppose? What sort of support do you think would be needed for these new mangling instructions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I mentioned previously about recreate_fragment_ilist not being able to find an instruction was purely because the new mangled instructions added by fixup_indirect_trace_exit is not added when creating the fragment ilist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run with -stress_recreate_state and verify there are no translation failures or warnings about unknown mangling sequences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the "unsupported mangle instr" message is at DOLOG(2 -- I thought it was more visible esp with -stress_recreate_state. Please run with -loglevel 2 -stress_recreate_state and ensure there are no "unsupported mangle instr" messages in the log files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are already quite a lot of unsupported mangle instr even without trace enabled. I was not able to find any issues mentioning this so I opened a new one #5069. Feel free to point out if there are already issues that are talking about this problem.

This is surprising since I put in work earlier this year to enable -stress_recreate_state, fix a number of translation problems on aarch64, and enable a regression test: #4680. Taking a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make progress let's merge this but please first add a comment in the code and in the commit message about these separate trace_exit_label sequences breaking translation and how either special support needs to be added or (the better choice IMHO: best to avoid internal control flow) they need to be removed and a different scheme used such as the one proposed at https://github.com/DynamoRIO/dynamorio/pull/5045/files/b9ada873ecc0b86128169aae0bd7ad648c0f921b..fc70e80bad8a66b7f417c27673fa5659e96ff1ef#r697566139

Copy link
Collaborator Author

@Vincent-lau Vincent-lau Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the better choice IMHO: best to avoid internal control flow

I agree with you that a linear control flow is preferable, will add a comment in the code for trace_exit_label

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising since I put in work earlier this year to enable -stress_recreate_state, fix a number of translation problems on aarch64, and enable a regression test: #4680. Taking a look.

I am guessing it's because these instructions are just reported with a log message, which did not catch your attention while you were fixing them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall looking at such log messages -- but I think those memories are from rseq translation work #4316 and the #4680 is as you said: focusing on asserts and fatal problems.

simpler approach

Added a comment on the possibility to switch to a simpler approach of
not decoding from code cache. And changed setting translation of instructions
using INSTR_XL8 macro.

Issues: #1569, #2974
bound

Fixed a problem where the branch upper bound is calculated correctly.

Issues: #1569, #2974
core/arch/interp.c Show resolved Hide resolved
@@ -2196,6 +2196,11 @@ int
append_trace_speculate_last_ibl(dcontext_t *dcontext, instrlist_t *trace,
app_pc speculate_next_tag, bool record_translation);

#ifdef AARCH64
int
fixup_indirect_trace_exit(dcontext_t *dcontext, instrlist_t *trace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below: long-term this should only be inside mangle_trace() and so this would be removed when #5062 is implemented. Please add a XXX i#5062 comment about this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of 'here', maybe mention end_and_emit_trace.

* the trace as a fake exit stub. Swapping them might be dangerous since
* a stub trace may be created on both paths.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run with -stress_recreate_state and verify there are no translation failures or warnings about unknown mangling sequences.

* the trace as a fake exit stub. Swapping them might be dangerous since
* a stub trace may be created on both paths.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make progress let's merge this but please first add a comment in the code and in the commit message about these separate trace_exit_label sequences breaking translation and how either special support needs to be added or (the better choice IMHO: best to avoid internal control flow) they need to be removed and a different scheme used such as the one proposed at https://github.com/DynamoRIO/dynamorio/pull/5045/files/b9ada873ecc0b86128169aae0bd7ad648c0f921b..fc70e80bad8a66b7f417c27673fa5659e96ff1ef#r697566139

core/arch/aarch64/emit_utils.c Outdated Show resolved Hide resolved
linear control flow

Currently fixup_indirect_trace_exit adds a special trace_exit_label that
breaks the linear control flow which is assumed by many places in the code. We need to
consider an alternative scheme to avoid this problem or add special support for the
trace_exit_label.

Also reverted changes on incorrect upper branch bound calculation.

Issues: #1569, #2974
@Vincent-lau
Copy link
Collaborator Author

Vincent-lau commented Sep 1, 2021

It seems that I need to add a lot in the final commit message 😄 , might as well include it here before I merge this into the master branch just in case I forgot something.

i#2974 trace support for AArch64, part 2: implementation of the main part of support 

This patch incorporated changes from patch #2442, and fixed some corner cases
where the assumption was incorrect and caused the program to crash.

Trace support is not yet enabled by default, but can be enabled with "-enable_traces".

This commit introduces internal control flow by adding a trace_exit_label in
fixup_indirect_trace_exit, which might break code that assumes linear control
flow. Either special support is needed for this trace_exit_label or alternative
schemes using linear control.

Co-authored-by: Kevin Zhou <[email protected]>

Issues: #1569, #2974

Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, and taking the time to go through what ended up being a long review process!

@@ -370,14 +370,14 @@ patch_branch(dr_isa_mode_t isa_mode, cache_pc branch_pc, cache_pc target_pc,
ASSERT(ALIGNED(branch_pc, 4) && ALIGNED(target_pc, 4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i#2974 trace support for AArch64, part 18: Adding comment on preferring
linear control flow

Note that the ", part N: " is a convention for merges to master: i.e., this PR being merged will be part 1; the next PR for trace building merged into master would be part 2 (not part 19 or sthg: the PR-internal ones don't count toward the master merges).

Copy link
Collaborator Author

@Vincent-lau Vincent-lau Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that's why I put part 2 in my final commit message above. Have a look if you want and see if there is anything missing or not quite right.

@Vincent-lau
Copy link
Collaborator Author

Thank you for the PR, and taking the time to go through what ended up being a long review process!

No problem at all, really appreciated your (and other reviewer's) speedy reply!

Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion and addressing all comments!

@@ -2196,6 +2196,11 @@ int
append_trace_speculate_last_ibl(dcontext_t *dcontext, instrlist_t *trace,
app_pc speculate_next_tag, bool record_translation);

#ifdef AARCH64
int
fixup_indirect_trace_exit(dcontext_t *dcontext, instrlist_t *trace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of 'here', maybe mention end_and_emit_trace.

@Vincent-lau Vincent-lau merged commit 8469b44 into master Sep 1, 2021
@Vincent-lau Vincent-lau deleted the i2974-trace branch September 1, 2021 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants