-
Notifications
You must be signed in to change notification settings - Fork 570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#5786: Add precise clean call mangling identification #5791
Conversation
Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Issue: #5786, #4232 Fixes #5786
WIll add a ref to #4219 as well which covers xl8 problems within the clean call sequence, not solved here. |
Stops using instr_t.note to hold encoding offsets for pc-releative operands. Adds a new field instr_t.offset which is used for this purpose. This leaves note values in place across encodings, which is needed for new clean call marking labels and also simplifies rseq handling code. This is a compatibility break and we should bump the version. This should be committed separately; adding to this PR to ensure it all works together.
…SION to 990; Replace note instances in core/emit.c; Replace in ir_aarch64 and disable for non-DR_FAST_IR; Update dr_get_note docs
detach_state is failing on the GA CI machine with an interesting ymmh state not being preserved issue which does not reproduce locally. Going to go ahead and ask for an initial review while I try repro/debug on other platforms or tmate. Probably the instr_t.offset would be separated into its own PR: but first let's agree that using labels for translation is the way to go. |
An update here: still trying to figure out the detach_state ymm high bit failure on GA CI. It does not reproduce in my work workstation; it does reproduce on an Ubuntu22 machine. I did track down the key details: the issue is that xstate_bv is set to 0x203 on the suspend signals when DR is statically linked into api.detach_state. That means the AVX state (the high ymm bits) are ignored by sigreturn and end up 0. This has nothing to do with any other part of this PR: solely with DR being statically linked in. It is weird: xstate_bv is 0x203 in the 1st SIGSEGV in the main thread for both dynamic and static DR; but for dynamic all subsequent signals have 0x207. The actual hardware is always 0x207 the entire time. So it seems like a lazy signal state feature in the kernel but only on some kernels and somehow static DR keeps it lazy: not sure how b/c dynamic doesn't really do anything different wrt AVX with the start/stop interface. |
Adding an AVX instr to the app main thread and sideline thread doesn't change
I still don't understand how it can be so locally lazy: but I solved by adding
|
…t to avoid lazy restore problems
x64 failure is api.thread_churn assert: have not seen that before. Does not repro locally. Likely a flake; re-running; will file after re-run. |
It's thread_churn again.
|
If I run -loglevel 1 to get more info it suddenly drops:
Hack-printing the heap category breakdowns shows it getting smaller
|
I looked closely at the heap usage and it seems that one run happens to push into a new unit and with the double-sizing it ends up with 10 more blocks due to that one larger-capacity unit. Setting
This is getting rather frustrating, all the weird failures popping up. |
Searching backward on my local machine: it is the instr_t field and heap bucket commit which triggers the thread_churn failure; yet on GA CI that commit passed and it showed up later, which is unexplained. |
The cache jump from 128 to 144 happens on the 5 to 6 thread transition: still 144 Actually it's on 6 threads! So if I start with 6 it passes. Grrr. |
thread_churn passes on the one local machine where the first problem reproduced; but it still fails on GA CI. Plus the Windows runs here seem to be hanging. This PR has been a nightmare -- and so far all the issues do not seem to be its fault but are other limitations or latent issues or inherent non-determinism that happen to be triggered by these changes. |
The Windows runs get cancelled after 6 hours (I think that's the GA time limit) -- yet the Run Suite step has no logs that can be viewed or downloaded so it is not at all clear what is happening. |
Looking at other PR's, their Windows runs complete as you'd expect after 15-20 mins (though they seem to have some new test failures that appeared a few days ago). Maybe something in this PR is causing hangs and some Github problem is preventing the logs from showing which tests are hanging. |
Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Adds a new instr_t.offset field. Stops using instr_t.note to hold encoding offsets for pc-releative operands. Adds a new field instr_t.offset which is used for this purpose. This leaves note values in place across encodings, which is needed for new clean call marking labels and also simplifies rseq handling code. This instr_t field is a compatibility break and we bump the version and OLDEST_COMPATIBLE_VERSION here to 990. Updates dr_get_note docs. Augments logging of xl8 info with new flag info. Reduces DR_NOTE_FIRST_RESERVED to give DR more reserved labels. This is another compatibility break, while at it. Fixes several issues hit in tests that happened to trigger on the heap bucket size and other changes: + Fixes a rank order violation at loglevel 5: xref #1649 + Writes real xstate_bv into signal frame when setting the xstate context to avoid lazy AVX restore problems. + Tweaks the thread_churn test to work around non-linearities. Issue: #5786, #4232 Fixes #5786
Updates the documentation for various encode APIs to correctly state that the offset field of instr_t is used for instr_t target offset computation, instead of the note field which was used prior to #5791.
Updates the documentation for various encode APIs to correctly state that the offset field of instr_t is used for instr_t target offset computation, instead of the note field which was used prior to #5791.
Adds new labels delimiting clean call sequences.
Converts into a translation record flag when storing translations.
Uses the new labels and flag to precisely identify clean call mangling,
replacing the previous scheme which incorrectly thought mangled tool
pc-relative was a clean call, resulting in incorrect translations and crashes.
To enable using labels after emitting, stops using instr_t.note to hold encoding
offsets for pc-releative operands. Adds a new field instr_t.offset which is used for
this purpose. This leaves note values in place across encodings, which is needed
for new clean call marking labels and also simplifies rseq handling code.
This is a compatibility break and as such we increase the version and
OLDEST_COMPATIBLE_VERSION to 990.
Since we only reserved 16 labels for DR, and we're already breaking compatilibity,
we go ahead and reduce DR_NOTE_FIRST_RESERVED to give DR more reserved labels.
Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation.
Changing the test to use static DR resulted in some change in some Linux kernel lazy
AVX state restores on sigreturn causing the AVX values set on detach to be ignored.
We solve this by writing the real xstate_bv into the signal frame when setting the
xstate context.
Issue: #5786, #4232
Fixes #5786