i#5786: Add precise clean call mangling identification #5791

derekbruening · 2022-12-15T19:35:58Z

Adds new labels delimiting clean call sequences.
Converts into a translation record flag when storing translations.

Uses the new labels and flag to precisely identify clean call mangling,
replacing the previous scheme which incorrectly thought mangled tool
pc-relative was a clean call, resulting in incorrect translations and crashes.

To enable using labels after emitting, stops using instr_t.note to hold encoding
offsets for pc-releative operands. Adds a new field instr_t.offset which is used for
this purpose. This leaves note values in place across encodings, which is needed
for new clean call marking labels and also simplifies rseq handling code.

This is a compatibility break and as such we increase the version and
OLDEST_COMPATIBLE_VERSION to 990.

Since we only reserved 16 labels for DR, and we're already breaking compatilibity,
we go ahead and reduce DR_NOTE_FIRST_RESERVED to give DR more reserved labels.

Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation.

Changing the test to use static DR resulted in some change in some Linux kernel lazy
AVX state restores on sigreturn causing the AVX values set on detach to be ignored.
We solve this by writing the real xstate_bv into the signal frame when setting the
xstate context.

Issue: #5786, #4232
Fixes #5786

Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Issue: #5786, #4232 Fixes #5786

derekbruening · 2022-12-15T19:36:31Z

WIll add a ref to #4219 as well which covers xl8 problems within the clean call sequence, not solved here.

Stops using instr_t.note to hold encoding offsets for pc-releative operands. Adds a new field instr_t.offset which is used for this purpose. This leaves note values in place across encodings, which is needed for new clean call marking labels and also simplifies rseq handling code. This is a compatibility break and we should bump the version. This should be committed separately; adding to this PR to ensure it all works together.

…SION to 990; Replace note instances in core/emit.c; Replace in ir_aarch64 and disable for non-DR_FAST_IR; Update dr_get_note docs

…ean-call-xl8

core/lib/globals_api.h

derekbruening · 2022-12-16T20:14:34Z

detach_state is failing on the GA CI machine with an interesting ymmh state not being preserved issue which does not reproduce locally. Going to go ahead and ask for an initial review while I try repro/debug on other platforms or tmate. Probably the instr_t.offset would be separated into its own PR: but first let's agree that using labels for translation is the way to go.

CMakeLists.txt

core/arch/mangle_shared.c

core/ir/instr.h

core/heap.c

core/lib/globals_api.h

suite/tests/api/detach_state.c

CMakeLists.txt

derekbruening · 2022-12-20T03:27:08Z

An update here: still trying to figure out the detach_state ymm high bit failure on GA CI. It does not reproduce in my work workstation; it does reproduce on an Ubuntu22 machine.

I did track down the key details: the issue is that xstate_bv is set to 0x203 on the suspend signals when DR is statically linked into api.detach_state. That means the AVX state (the high ymm bits) are ignored by sigreturn and end up 0. This has nothing to do with any other part of this PR: solely with DR being statically linked in.

It is weird: xstate_bv is 0x203 in the 1st SIGSEGV in the main thread for both dynamic and static DR; but for dynamic all subsequent signals have 0x207. The actual hardware is always 0x207 the entire time. So it seems like a lazy signal state feature in the kernel but only on some kernels and somehow static DR keeps it lazy: not sure how b/c dynamic doesn't really do anything different wrt AVX with the start/stop interface.

derekbruening · 2022-12-20T03:52:57Z

Adding an AVX instr to the app main thread and sideline thread doesn't change
anything (these were already there in the sideline thread anyway to store the sentinels):

    __asm("vmovdqu %%ymm0, %%ymm0" : : : "ymm0");//NOCHECK

I still don't understand how it can be so locally lazy: but I solved by adding
what save_xmm() does, setting the signal's xstate_bv to the real value inside
mcontext_to_sigcontext_simd():

mcontext_to_sigcontext_simd: setting xstate_bv from 0x0000000000000203 to 0x0000000000000207

…t to avoid lazy restore problems

…ean-call-xl8

derekbruening · 2022-12-20T05:27:17Z

x64 failure is api.thread_churn assert: have not seen that before. Does not repro locally. Likely a flake; re-running; will file after re-run.

derekbruening · 2022-12-20T16:33:45Z

It's thread_churn again.
I found a machine where it reproduces -- the 2nd run does have 10 more unreachable heap blocks.
But, this test passed on the instr_t field and heap bucket size change: it is failing after the final changes none of which seem like they would affect heap usage. Adding a 3rd run shows zero growth so I do not think this is a real problem:

peak unreach heap: 248 A => 258 B => 258 C

derekbruening · 2022-12-20T17:35:57Z

If I run -loglevel 1 to get more info it suddenly drops:

peak unreach heap: 248 A => 234 B => 234 C

Hack-printing the heap category breakdowns shows it getting smaller

Total max (not nec. all used simult.):  11018 KB
Total max (not nec. all used simult.):  10980 KB
Total max (not nec. all used simult.):  10980 KB
peak unreach heap: 248 A => 258 B => 258 C

api/docs/release.dox

core/lib/globals_api.h

core/unix/signal_linux_x86.c

derekbruening · 2022-12-20T23:57:18Z

I looked closely at the heap usage and it seems that one run happens to push into a new unit and with the double-sizing it ends up with 10 more blocks due to that one larger-capacity unit. Setting -initial_global_heap_unit_size 256K makes that go away: but then we hit this:

peak reach cache: 128 A => 144 B

This is getting rather frustrating, all the weird failures popping up.

derekbruening · 2022-12-21T00:02:36Z

Searching backward on my local machine: it is the instr_t field and heap bucket commit which triggers the thread_churn failure; yet on GA CI that commit passed and it showed up later, which is unexplained.

derekbruening · 2022-12-21T01:24:43Z

The cache jump from 128 to 144 happens on the 5 to 6 thread transition: still 144 Actually it's on 6 threads! So if I start with 6 it passes. Grrr.

…X signal bit

derekbruening · 2022-12-21T05:58:00Z

thread_churn passes on the one local machine where the first problem reproduced; but it still fails on GA CI.

Plus the Windows runs here seem to be hanging. This PR has been a nightmare -- and so far all the issues do not seem to be its fault but are other limitations or latent issues or inherent non-determinism that happen to be triggered by these changes.

derekbruening · 2022-12-21T15:54:38Z

The Windows runs get cancelled after 6 hours (I think that's the GA time limit) -- yet the Run Suite step has no logs that can be viewed or downloaded so it is not at all clear what is happening.

derekbruening · 2022-12-21T17:26:00Z

Looking at other PR's, their Windows runs complete as you'd expect after 15-20 mins (though they seem to have some new test failures that appeared a few days ago). Maybe something in this PR is causing hangs and some Github problem is preventing the logs from showing which tests are hanging.

…ean-call-xl8

Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Adds a new instr_t.offset field. Stops using instr_t.note to hold encoding offsets for pc-releative operands. Adds a new field instr_t.offset which is used for this purpose. This leaves note values in place across encodings, which is needed for new clean call marking labels and also simplifies rseq handling code. This instr_t field is a compatibility break and we bump the version and OLDEST_COMPATIBLE_VERSION here to 990. Updates dr_get_note docs. Augments logging of xl8 info with new flag info. Reduces DR_NOTE_FIRST_RESERVED to give DR more reserved labels. This is another compatibility break, while at it. Fixes several issues hit in tests that happened to trigger on the heap bucket size and other changes: + Fixes a rank order violation at loglevel 5: xref #1649 + Writes real xstate_bv into signal frame when setting the xstate context to avoid lazy AVX restore problems. + Tweaks the thread_churn test to work around non-linearities. Issue: #5786, #4232 Fixes #5786

Updates the documentation for various encode APIs to correctly state that the offset field of instr_t is used for instr_t target offset computation, instead of the note field which was used prior to #5791.

derekbruening added 5 commits December 15, 2022 16:57

Fix rank order violation at loglevel 5: xref #1649

97bc4dd

Fix arm build errors; augment logging of xl8 info with new flag info

63cfff8

Additions to s/note/offset/: + Bump version and OLDEST_COMPATIBLE_VER…

bc7da1d

…SION to 990; Replace note instances in core/emit.c; Replace in ir_aarch64 and disable for non-DR_FAST_IR; Update dr_get_note docs

Merge branch 'master' of github.com:DynamoRIO/dynamorio into i5786-cl…

eb4eb02

…ean-call-xl8

derekbruening commented Dec 16, 2022

View reviewed changes

core/lib/globals_api.h Outdated Show resolved Hide resolved

derekbruening requested a review from abhinav92003 December 16, 2022 20:14

abhinav92003 approved these changes Dec 16, 2022

View reviewed changes

derekbruening added 5 commits December 19, 2022 22:56

Reduce DR_NOTE_FIRST_RESERVED to give DR more reserved labels

1492cb8

Write real xstate_bv into signal frame when setting the xstate contex…

4b9a2c3

…t to avoid lazy restore problems

Rename NOTE_CALLOUT to NOTE_CALL

4de594b

Add an assert on being unreachable

9636f05

Merge branch 'master' of github.com:DynamoRIO/dynamorio into i5786-cl…

fb83744

…ean-call-xl8

derekbruening requested a review from abhinav92003 December 20, 2022 05:27

abhinav92003 approved these changes Dec 20, 2022

View reviewed changes

api/docs/release.dox Outdated Show resolved Hide resolved

core/lib/globals_api.h Outdated Show resolved Hide resolved

core/unix/signal_linux_x86.c Show resolved Hide resolved

Tweak thread_churn test to work around non-linearities

3b2b9b4

Changed the existing reserved DR notes; added more comments on the AV…

c78581a

…X signal bit

derekbruening added 5 commits January 13, 2023 16:56

Add optional diagnostics to thread_churn test

9b73871

Merge branch 'master' of github.com:DynamoRIO/dynamorio into i5786-cl…

9fe65af

…ean-call-xl8

Update Windows intercept code to set the new offset field when encoding

45c518b

Print all mismtaches instead of asserting on first one with no values

c12a0b2

Run thread_churn test 3 times which seems to resolve flakiness

720ca53

derekbruening merged commit 8c0be09 into master Jan 14, 2023

derekbruening deleted the i5786-clean-call-xl8 branch January 14, 2023 22:07

derekbruening mentioned this pull request Apr 3, 2023

Move label notes to another field to survive encoding with jump targets #3835

Closed

abhinav92003 mentioned this pull request Apr 21, 2023

Update offset field name in encode API doc #5993

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i#5786: Add precise clean call mangling identification #5791

i#5786: Add precise clean call mangling identification #5791

derekbruening commented Dec 15, 2022 •

edited

Loading

derekbruening commented Dec 15, 2022

derekbruening commented Dec 16, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

i#5786: Add precise clean call mangling identification #5791

i#5786: Add precise clean call mangling identification #5791

Conversation

derekbruening commented Dec 15, 2022 • edited Loading

derekbruening commented Dec 15, 2022

derekbruening commented Dec 16, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 20, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 21, 2022

derekbruening commented Dec 15, 2022 •

edited

Loading