api.rseq test fails on AArch64 #4923

Closed
AssadHashmi opened this issue May 26, 2021 · 2 comments · Fixed by #5350

Comments

@AssadHashmi
Contributor

Migrating the AArch64 Jenkins CI to an Ampere Altra machine caused a regression in the api.rseq test with:

<Application /var/lib/jenkins/workspace/DynamoRIO-AArch64-Precommit/build/build_debug-internal-64/suite/tests/bin/api.rseq (247628). Failed to synchronize with all threads when detaching.>

Only the debug build fails; the release build passes.

@derekbruening
Contributor

This reproduces in every run. It is a hang during detach. It looks like the detach thread keeps trying to suspend the other thread and fails to get it at a safe spot due to translation errors, so I think this is a translation bug. I see:

recreate_app : pc is in F4(0x0000aaaae93c4970)
ilist for recreation:
TAG  0x0000aaaae93c4970
 +0    m4 @0x0000fffda6be0eb8  f9000380   str    %x0 -> (%x28)[8byte]
 +4    m4 @0x0000fffda6be0f20  f9401f80   ldr    +0x38(%x28)[8byte] -> %x0
 +8    m4 @0x0000fffda6be0fe8  f902b801   str    %x1 -> +0x0570(%x0)[8byte]
 +12   m4 @0x0000fffda6be0c28  f9400380   ldr    (%x28)[8byte] -> %x0
 +16   L3 @0x0000fffda6be0ca8  d2800021   movz   $0x0001 lsl $0x00 -> %x1
 +20   L3 @0x0000fffda6be0d28  17fffffe   b      $0x0000aaaae93c4970
END 0x0000aaaae93c4970

unsupported mangle instr: str    %x1 -> +0x0570(%x0)[8byte]
recreate_app -- invalid state: unsup=1 in-mangle=1 xl8=0x0000aaaae93c4970 walk=0x0000aaaae93c4970
recreate_app -- not able to fully recreate context, pc is in added instruction from mangling
recreate_app: moving 0x0000aaaae93c4970 inside rseq region to handler 0x0000aaaae93c4988
recreate_app -- found ok pc 0x0000aaaae93c4988
        restoring stolen register to 0x0000ffffa73061a0
thread 3094900 not at safe spot (pc=0x0000aaaae93c4988) for 5
Synch failed!
Not all threads synched looping again

I think translation moves the thread into its rseq failure handler, which causes the thread to retry the sequence, so we end up with an infinite loop.
There is a maximum number of suspend attempts, but the test timeout may hit first.
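
To make the suspected loop concrete, here is a minimal Python model of it. This is only a sketch of the reasoning above, not DynamoRIO code: every name (`Thread`, `translate`, `resume`, `synch_for_detach`, `can_translate_mangle`) and the `max_attempts` value are hypothetical, and it assumes the thread is caught in mangling-added code on every retry. The two addresses are taken from the log.

```python
# Hypothetical model of the detach synch loop described above.
# None of these names are DynamoRIO APIs.
from dataclasses import dataclass

RSEQ_START = 0x0000AAAAE93C4970    # rseq region start, from the log
RSEQ_HANDLER = 0x0000AAAAE93C4988  # rseq handler, from the log


@dataclass
class Thread:
    pc: int
    in_mangle: bool  # stopped in an instruction added by rseq mangling


def translate(thread, can_translate_mangle):
    """Return True if the thread's app state can be fully recreated."""
    if thread.in_mangle and not can_translate_mangle:
        # Translation fails, so the thread is moved to the rseq handler
        # and is reported as not being at a safe spot.
        thread.pc = RSEQ_HANDLER
        return False
    return True


def resume(thread):
    """The handler retries, so the thread re-enters the rseq region."""
    if thread.pc == RSEQ_HANDLER:
        thread.pc = RSEQ_START
        # Assumption: the next suspend again lands in mangling-added code.
        thread.in_mangle = True


def synch_for_detach(thread, can_translate_mangle, max_attempts=5):
    """Return the attempt number that succeeded, or None if all failed."""
    for attempt in range(1, max_attempts + 1):
        if translate(thread, can_translate_mangle):
            return attempt
        resume(thread)  # "Not all threads synched looping again"
    return None
```

Under these assumptions, `synch_for_detach(Thread(RSEQ_START, True), can_translate_mangle=False)` exhausts every attempt, while the same call with `can_translate_mangle=True` succeeds on the first attempt, which is what I would expect a translation fix for the mangling-added instructions to achieve.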

@derekbruening
Contributor

I think those meta instrs are from rseq mangling: so is this translation failure #4669?

derekbruening added a commit that referenced this issue Feb 11, 2022
Adds translation of the save of input registers to rseq sequences.

This fixes a hang on detach in the api.rseq test on AArch64, which we
remove from the flaky list here.

Manually tested by running api.rseq 200x on the Jenkins machine.
Previously the test failed every single time there.

Issue: #4923, #4316, #4669
Fixes #4923