-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#2974 trace support for AArch64, part 2: added trace implementation #5045
Conversation
…part of support This patch incorporated changes from patch #2442, and fixed some corner cases where the assumption was incorrect and caused the program to crash. Trace support is not yet enabled by default, but can be enabled with "-enable_traces". A large part of this work was based on work done by Kevin Zhou. Please see branch i1569-trace and patch #2442 for the original version. Issues: #1569, #2974
The results for the failure on Jenkins are
which I believe are flaky tests by #2417 |
Yes AIUI. We need to add that to the |
run arm tests |
Thanks for making the changes. Will take a look today |
I just noticed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mid review comments.
Thanks for making the previous changes. In most of my recent comments, I'm just trying to understand some parts of trace building that I'm not familiar with yet, and maybe help improve inline documentation along the way. |
Thanks for all the comments. as I said, the original author of this code was not me, and I have to guess the intent of the author (just as you do). But I have replied to your review with my thoughts. |
One other thing, I am thinking about making the trace support for AArch64 on by default (after a few more iterations), I am wondering roughly how much work would be needed for that? I would imagine that the bare minimum is to pass all current tests with trace on by default and we might need to close the issue #5057 as well? |
Atleast some targeted test coverage would be good, yes. In addition to this, we'd need to run some large internal apps/benchmarks under DR with traces enabled. In the past, we've seen that some issues show up only on large apps, not on unit tests or SPEC. |
I am thinking about adding tests to cover some of the cases introduced in this PR, but not quite sure how to get started. My initial thought is to write a basic block such as
And it would be mangled into
And then we expect it to be fixed up as
So that we are covering one of the cases introduced by I problem I have, though, with this idea is that I am not sure if it is actually possible for the compiler to generate such assembly instructions in a bb. On the other hand, if I write tests in C, then I can be sure that it is legitimate code generated by the compiler but can't be sure that it will cover the cases introduced by this PR. |
Can you suggest some applications to run? |
Note that the
You can add your test basic blocks in assembly, similar to drreg tests that are also written in assembly to verify behaviour for some hand-crafted basic blocks. See drreg-test.c (the test app for drreg) and drreg-test.dll.c (the test client for drreg). I guess you can use |
Thanks for the suggestions. I had a look at these I might suggest writing some tests similar to what |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look at these register_events methods and they seem to me to be client-side, and DR does not seem to have methods that expose the fragment to the client
With dr_register_trace_event
you can register a callback that takes the trace as input. The documentation also states that
"trace is a pointer to the list of instructions that comprise the trace." and " The user is free to inspect and modify the non control-flow instructions in the trace before it executes, with certain restrictions". Are you saying that the given trace
is not the one after the trace creation code runs?
void dr_register_trace_event(dr_emit_flags_t (*func)(void *drcontext, void *tag,
instrlist_t *trace,
bool translating));
dynamorio/core/lib/dr_events.h
Line 392 in 271d357
* The user is free to inspect and modify the non-control-flow |
@Vincent-lau I think it should be okay if we're not able to add tests in this PR itself. That can be done as follow-up in #5057. @derekbruening We iterated on this implementation and I think the overall approach makes sense. Can you take a look too? To make sure that I'm not missing any details or if anything can be simplified. In addition, I've also tagged you in some comments where we could use your input. |
* If we delete all code after "b target", then the "fall" path would | ||
* be lost. Therefore we need to append the fall path at the end of | ||
* the trace as a fake exit stub. Swapping them might be dangerous since | ||
* a stub trace may be created on both paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we switch to #5062 and append unmangled instrlists and only mangle at the very end, all this complexity would go away, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only mangle at the very end
So you mean that we build a trace from scratch and only mangle when we have all the bbs stitched together? I think this would indeed remove this complexity.
I might bring up one other problem related to this as well, in translate.c the function recreate_app_state_from_ilist
searches the recreated ilist for the instruction that has the corresponding app pc of the current mc->pc
. With the this new patch adding appending instructions on the miss path of the cbnz x0, trace_exit_label
:
* For example, the code will be fixed to:
* eor x0, x0, jump_target
* cbnz x0, trace_exit_label
* ...
* b trace_head
* trace_exit_label:
* ldr x0, TLS_REG0_SLOT
* str x2, TLS_REG2_SLOT
* mov x2, jump_target
* b ibl_routine
*/
The recreate_app_state_from_ilist
is not able to find the instr if the pc happens to be in one of the instructions following trace_exit_label
and since these instructions are not identified as a exit stub, the mc->pc
wouldn't be pointing to the correct cti either, and as they don't belong to the application either, so won't appear in the recreated ilist, recreate_app_state_from_ilist
is not able to find the correct instruction and reaches the ASSERT_NOT_REACHED
on L1114.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean that we build a trace from scratch and only mangle when we have all the bbs stitched together? I think this would indeed remove this complexity.
Yes. We already have the functionality in mangle_trace() to do all the bb stitching at the very end: we have to do that for clients that change the trace in the trace event (the event passes them the unmangled instrlist).
Are you interested in implementing #5062? It would be a nice cleanup and simplify this PR which would then come in afterward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recreate_app_state_from_ilist is not able to find the instr if the pc happens to be in one of the instructions following trace_exit_label
That's a blocking issue that has to be solved.
One proposal is to change the i.b. comparison to use x2 instead of x0 as the temp reg holding the expected target immediate. Then the only thing missing is copying from the actual target reg into x2, and maybe that could be done locally. It would use a taken forward branch on the match which maybe is not ideal but certainly simple:
str x2, TLS_REG2_SLOT
mov x2, #trace_next_target
eor x2, x2, jump_target
cbz x2, match
mov x2, jump_target
b ibl_routine_regular_exit_stub
match:
ldr x2, TLS_REG2_SLOT
Then we eliminate this special-case trace_exit_label.
(If jump_target == x2 then a different scheme needs to be used -- might require 2 spills b/c we want to put the app value in the x2 slot and have a local scratch.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that whatever mangling is added, translate.c needs to be updated to recognize it. Basic spills and restores are already recognized but anything else may need special support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would happen if target_cache is pointing to one of these mangled instructions
The uncompleted mangled app instruction will be re-executed, so all state changes have to be rolled back: restore spilled registers, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you interested in implementing #5062?
How long do you estimate this would take? I am currently on the final week of an internship where I try to port the trace support for a64. Obviously I can continue contributing after the internship but I am just trying to get a sense of how long this is going to take.
I don't know for sure -- it might be quite straightforward:
- go enable things permanently that are currently gated on monitor_data_t.pass_to_client like extend_unmangled_ilist's work
- change internal_extend_trace to estimate the size from the unmangled list (maybe w/ some fudge factor for mangling: for new scheme only used to keep trace below max sizes I would assume)
- delete monitor_data_t.trace_buf, extend_trace, etc.
- try to delete decode_fragment: check other callers
But there might be some extra things to deal with once in the details.
Maybe better to finish this PR off as a deliverable -- but the missing translation support is an issue. Could revert the optimization and have sub-optiimal code that matches x86 with the spill-and-move left there and a simple exit stub and complete translation support, with a comment that we can improve the perf later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uncompleted mangled app instruction will be re-executed, so all state changes have to be rolled back: restore spilled registers, etc.
In that case is it OK to include these instructions in the recreated ilist (currently they are not)
* trace_exit_label:
* ldr x0, TLS_REG0_SLOT
* str x2, TLS_REG2_SLOT
* mov x2, jump_target
* b ibl_routine
And store the translation info in these instructions when we call fixup_indirect_trace_exit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case is it OK to include these instructions in the recreated ilist (currently they are not)
Wait -- why are they not in the recreation of the trace? That is not good. It should go through the same mangling steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because these instructions are added in fixup_indirect_trace_exit
which is not called when recreate_fragment_ilist
runs. I have added a call in recreate_fragment_ilist
so that these instructions are included in the recreated instrlist as well.
I will move the discussion to #5057 |
Seeing unresolved comments: please resolve when addressed |
* If we delete all code after "b target", then the "fall" path would | ||
* be lost. Therefore we need to append the fall path at the end of | ||
* the trace as a fake exit stub. Swapping them might be dangerous since | ||
* a stub trace may be created on both paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case is it OK to include these instructions in the recreated ilist (currently they are not)
Wait -- why are they not in the recreation of the trace? That is not good. It should go through the same mangling steps.
* the trace as a fake exit stub. Swapping them might be dangerous since | ||
* a stub trace may be created on both paths. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include TODO i#xxxx
comments (with the real issue number) for the missing pieces: in particular, adding translation support for all the new mangling code. (Ideally it would be in this PR but for making progress here w/ a new feature in pieces ok to not have all at once so long as what's missing is clearly documented not just in the commit description but in the code too). I would put such comments both here and in translate.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that there isn't any translation support needed for the new mangling code. The new mangling code includes:
* Original ibl lookup:
* str tgt_reg, TLS_REG2_SLOT
* mov tgt_reg, jump_target
* b ibl_routine
* Now we rewrite it into:
* str x0, TLS_REG0_SLOT
* mov x0, #trace_next_target
* eor x0, x0, jump_target
* cbnz x0, trace_exit (ibl routine)
* ldr x0, TLS_REG0_SLOT
*/
and
* For example, the code will be fixed to:
* eor x0, x0, jump_target
* cbnz x0, trace_exit_label
* ...
* b trace_head
* trace_exit_label:
* ldr x0, TLS_REG0_SLOT
* str x2, TLS_REG2_SLOT
* mov x2, jump_target
* b ibl_routine
which are just basic spilling of registers and perform some calculation after spilling, I suppose? What sort of support do you think would be needed for these new mangling instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I mentioned previously about recreate_fragment_ilist
not being able to find an instruction was purely because the new mangled instructions added by fixup_indirect_trace_exit
is not added when creating the fragment ilist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run with -stress_recreate_state and verify there are no translation failures or warnings about unknown mangling sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the "unsupported mangle instr" message is at DOLOG(2
-- I thought it was more visible esp with -stress_recreate_state. Please run with -loglevel 2 -stress_recreate_state
and ensure there are no "unsupported mangle instr" messages in the log files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are already quite a lot of unsupported mangle instr even without trace enabled. I was not able to find any issues mentioning this so I opened a new one #5069. Feel free to point out if there are already issues that are talking about this problem.
This is surprising since I put in work earlier this year to enable -stress_recreate_state, fix a number of translation problems on aarch64, and enable a regression test: #4680. Taking a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make progress let's merge this but please first add a comment in the code and in the commit message about these separate trace_exit_label sequences breaking translation and how either special support needs to be added or (the better choice IMHO: best to avoid internal control flow) they need to be removed and a different scheme used such as the one proposed at https://github.com/DynamoRIO/dynamorio/pull/5045/files/b9ada873ecc0b86128169aae0bd7ad648c0f921b..fc70e80bad8a66b7f417c27673fa5659e96ff1ef#r697566139
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the better choice IMHO: best to avoid internal control flow
I agree with you that a linear control flow is preferable, will add a comment in the code for trace_exit_label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprising since I put in work earlier this year to enable -stress_recreate_state, fix a number of translation problems on aarch64, and enable a regression test: #4680. Taking a look.
I am guessing it's because these instructions are just reported with a log message, which did not catch your attention while you were fixing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2196,6 +2196,11 @@ int | |||
append_trace_speculate_last_ibl(dcontext_t *dcontext, instrlist_t *trace, | |||
app_pc speculate_next_tag, bool record_translation); | |||
|
|||
#ifdef AARCH64 | |||
int | |||
fixup_indirect_trace_exit(dcontext_t *dcontext, instrlist_t *trace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below: long-term this should only be inside mangle_trace() and so this would be removed when #5062 is implemented. Please add a XXX i#5062
comment about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of 'here', maybe mention end_and_emit_trace
.
* the trace as a fake exit stub. Swapping them might be dangerous since | ||
* a stub trace may be created on both paths. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run with -stress_recreate_state and verify there are no translation failures or warnings about unknown mangling sequences.
* the trace as a fake exit stub. Swapping them might be dangerous since | ||
* a stub trace may be created on both paths. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make progress let's merge this but please first add a comment in the code and in the commit message about these separate trace_exit_label sequences breaking translation and how either special support needs to be added or (the better choice IMHO: best to avoid internal control flow) they need to be removed and a different scheme used such as the one proposed at https://github.com/DynamoRIO/dynamorio/pull/5045/files/b9ada873ecc0b86128169aae0bd7ad648c0f921b..fc70e80bad8a66b7f417c27673fa5659e96ff1ef#r697566139
linear control flow Currently fixup_indirect_trace_exit adds a special trace_exit_label that breaks the linear control flow which is assumed by many places in the code. We need to consider an alternative scheme to avoid this problem or add special support for the trace_exit_label. Also reverted changes on incorrect upper branch bound calculation. Issues: #1569, #2974
It seems that I need to add a lot in the final commit message 😄 , might as well include it here before I merge this into the master branch just in case I forgot something.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, and taking the time to go through what ended up being a long review process!
core/arch/aarch64/emit_utils.c
Outdated
@@ -370,14 +370,14 @@ patch_branch(dr_isa_mode_t isa_mode, cache_pc branch_pc, cache_pc target_pc, | |||
ASSERT(ALIGNED(branch_pc, 4) && ALIGNED(target_pc, 4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i#2974 trace support for AArch64, part 18: Adding comment on preferring
linear control flow
Note that the ", part N: " is a convention for merges to master: i.e., this PR being merged will be part 1; the next PR for trace building merged into master would be part 2 (not part 19 or sthg: the PR-internal ones don't count toward the master merges).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that's why I put part 2 in my final commit message above. Have a look if you want and see if there is anything missing or not quite right.
No problem at all, really appreciated your (and other reviewer's) speedy reply! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the discussion and addressing all comments!
@@ -2196,6 +2196,11 @@ int | |||
append_trace_speculate_last_ibl(dcontext_t *dcontext, instrlist_t *trace, | |||
app_pc speculate_next_tag, bool record_translation); | |||
|
|||
#ifdef AARCH64 | |||
int | |||
fixup_indirect_trace_exit(dcontext_t *dcontext, instrlist_t *trace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of 'here', maybe mention end_and_emit_trace
.
This patch incorporated changes from patch #2442, and fixed some corner cases
where the assumption was incorrect and caused the program to crash.
Trace support is not yet enabled by default, but can be enabled with "-enable_traces".
A large part of this work was based on work done by Kevin Zhou. Please see branch
i1569-trace and patch #2442 for the original version.
Issues: #1569, #2974