x64 common.floatpc_xl8all fails non-deterministically with a truncated pc #2267
I cannot reproduce on Linux in 10,000 iters. Does anyone have win8.1+ on real hardware? Can you reproduce this bug there? In a 64-bit build dir, run (this is for Cygwin bash):
And look for the string
Adds the following tests to the ignore list in order to keep the long suite on master merges greener:
+ win64 'common.floatpc_xl8all' #2267
+ win64 'code_api|tool.drcachesim.simple-config-file' #1807
+ lin32 'code_api,opt_speed|common.fib' #1807
+ lin32 'prof_pcs|common.nativeexec_exe_opt' #2052
+ lin64 'code_api,opt_speed|common.floatpc_xl8all' #1807

Only ignored in the long suite since these are tests we don't want to completely break:
+ lin32 'code_api|api.startstop' #4604
+ lin64 'code_api|api.detach_state' #5123
+ lin64 'code_api|client.cleancallsig' #1807

Issue: #1807, #2267, #2052, #4604, #5123
The floatpc_xl8all test has started failing on the x64 Linux GitHub Actions VMs: it may be the same issue. https://github.com/DynamoRIO/dynamorio/actions/runs/6752187857/job/18358030169
Given that some other flaky failures that suddenly became more frequent are due to AMD vs Intel differences, I looked into AMD behavior differences for the last floating-point PC, and there are some. From the AMD manual vol 2:
Hmm, but wouldn't both the intra and inter fxsave cases then fail? We're only seeing the inter cases fail? Xref the logs above where the speculation was invoked, which we did not expect. Maybe that wasn't due to truncation but due to the PC being whatever the last exception was, and so not being a cache PC. But it got the right bottom bits for the intra...
Oh I see: for intra, DR doesn't care what the hardware writes; it proactively writes the right PC in there.
So for AMD runs today, we see a value of 0 for the PC. Old AMD processors apparently left the prior value of the last exception PC, but that was a security issue: so maybe modern ones zero it out? So this seems a little different from the logs above in the initial entry.
Adds CMake detection of the processor vendor. Relaxes the floatpc_xl8all test to expect failure when running on AMD for inter-block cases, where DR won't be able to translate as the processor does not supply the PC if there wasn't an exception. Issue: #2267
It sounds like it's the OS doing the zeroing for AMD processors. The AMD failures for this test are fixed: I relaxed the output to expect failure. For the original issue with the truncation (maybe I should have filed a separate issue for AMD), I guess we leave it open.
`vendor_id` is not always available; only do `REGEX REPLACE` if a match is found. Issue: DynamoRIO#2267
`vendor_id` is not always available; only do `REGEX REPLACE` if a match is found. Tested on Intel, AMD, and <unknown> AArch64 machines via GitHub Actions runners. Issue: #2267
Splitting from #2145 as it does not look like this is easily solved.
I can repro the common.floatpc_xl8all failure seen on Appveyor on my local
win10 and win8.1 VMs every so often: a few times every 1000 iters.
2 instances
At -loglevel 2, 1 instance out of 1000:
Comparing the relevant lines of the success and failure log files:
Success:
Failure:
It shouldn't be like #1427 because it's fxsave64. So why is the pc just
the bottom 32 bits?
Separate run shows the exit reason is fxsave64:
Just a reminder of the instrumentation here. We pass a pointer to the base of the
fxsave64 destination to float_pc_update:
So it's getting the actual pc written by fxsave64.
Could it be a VMware bug?
I've never seen it on win7, but I haven't run it 1000x in a loop there.
Why nondet?
Why limited to Windows (or is it?)?