Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#6919: Fix XMM saving after synchall #6920

Merged

Conversation

ndrewh
Copy link
Contributor

@ndrewh ndrewh commented Aug 16, 2024

This adds a test and a fix for #6919.

The test (modeled after the linux.sigcontext test) uses dr_flush_region to trigger a synchall, which causes XMM regs to be clobbered. The fix in core/unix/signal.c copies fp state from the mcontext.

This fixes saving along the following path:

dispatch_enter_fcache => check_wait_at_safe_spot
=> thread_set_self_mcontext => thread_set_self_context

Fixes: #6919

ndrewh added 9 commits August 16, 2024 13:55
thread_set_self_mcontext should fill in fpstate from mcontext
and thread_set_self_context should not call save_fpstate, which
would overwrite saved state with clobbered state.

This fixes saving along the following path:

dispatch_enter_fcache => check_wait_at_safe_spot
=> thread_set_self_mcontext => thread_set_self_context

The test included in this commit uses dr_flush_region to trigger
a synch-all event, triggering the above path and clobbering fp
regs.
@ndrewh ndrewh marked this pull request as ready for review August 17, 2024 15:17
@ndrewh
Copy link
Contributor Author

ndrewh commented Aug 17, 2024

The remaining failing tests do not appear to be my fault...

@ndrewh ndrewh changed the title i#6919 fix XMM saving after synch-all i#6919 fix XMM saving after synchall Aug 19, 2024
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
core/unix/signal.c Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.dll.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
@edeiana
Copy link
Contributor

edeiana commented Aug 19, 2024

The remaining failing tests do not appear to be my fault...

I re-run your only failing test and it passed.
Unfortunately some tests are flaky.

@ndrewh
Copy link
Contributor Author

ndrewh commented Aug 19, 2024

If printf/assert is preferred, I am happy to change it and the test will still pass (as long as the fix is applied)

The only reason I avoided these functions is that my test is so effective at triggering synchalls that it will reliably crash in these libc functions if xmm registers are not preserved (i.e. before the signal.c patch is applied). I personally preferred that the test failed gracefully, but a segfault is also clearly a failure.

@edeiana edeiana requested a review from abhinav92003 August 20, 2024 19:14
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay: confusion over 2nd reviewer. Mostly I have style nits plus question on putting back the libc calls; after that should be good to go! Thank you for bearing with the review process.

core/unix/signal.c Outdated Show resolved Hide resolved
core/unix/signal.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.dll.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.dll.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.dll.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/linux/threadset.c Outdated Show resolved Hide resolved
suite/tests/CMakeLists.txt Outdated Show resolved Hide resolved
core/unix/signal.c Outdated Show resolved Hide resolved
core/unix/signal.c Show resolved Hide resolved
@ndrewh ndrewh changed the title i#6919 fix XMM saving after synchall i#6919: Fix XMM saving after synchall Aug 22, 2024
@ndrewh ndrewh requested a review from derekbruening August 22, 2024 17:26
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you for contributing!

@derekbruening derekbruening requested a review from edeiana August 23, 2024 00:28
Copy link
Contributor

@edeiana edeiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thank you @ndrewh !

core/unix/signal.c Outdated Show resolved Hide resolved
@ndrewh
Copy link
Contributor Author

ndrewh commented Aug 23, 2024

Sorry for the confusion. It turns out this can be fixed by just reordering the existing get_and_initialize_xstate_buffer and leaving everything else the same (I think this ended up being the same fix I proposed in the original email chain). The new fix is ready for review.

I definitely just got confused while writing the test (I originally thought this fix was insufficient because it was failing in release mode -- but that was just a skill issue on my part because I had only rebuilt the debug bins with the fix).

@derekbruening
Copy link
Contributor

Sorry for the confusion. It turns out this can be fixed by just reordering the existing get_and_initialize_xstate_buffer and leaving everything else the same (I think this ended up being the same fix I proposed in the original email chain). The new fix is ready for review.

I definitely just got confused while writing the test (I originally thought this fix was insufficient because it was failing in release mode -- but that was just a skill issue on my part because I had only rebuilt the debug bins with the fix).

No worries, to be fair I think us reviewers also had trouble fully understanding as well, maybe b/c it's hard to understand this code with just a quick glance.

core/unix/signal.c Show resolved Hide resolved
@derekbruening
Copy link
Contributor

The a64 drcachesim.TLB-threads online-analysis failure is a known flake "invalid exit before a bundle" #3320

@derekbruening derekbruening merged commit 6c2037e into DynamoRIO:master Aug 23, 2024
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

APP CRASH: XMM not properly restored by thread_set_self_mcontext
3 participants