Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x64 common.floatpc_xl8all fails non-deterministically with a truncated pc #2267

Open
derekbruening opened this issue Mar 6, 2017 · 6 comments

Comments

@derekbruening
Copy link
Contributor

Splitting from #2145 as it does not look like this is easily solved.

I can repro the common.floatpc_xl8all failure seen on Appveyor on my local
win10 and win8.1 VM's every so often: a few times every 1000 iters.

% for ((i=0; i<1000; i++)); do echo $i; bin64/drrun -translate_fpu_pc -- suite/tests/bin/common.floatpc.exe; done 2>&1 | tee OUT

2 instances

At -loglevel 2, 1 instance out of 1000:

688
<log dir=C:\derek\dr\git\build_x64_dbg_tests\logs\common.floatpc.exe.6776.00000001>
<Starting application C:\derek\dr\git\build_x64_dbg_tests\suite\tests\bin\common.floatpc.exe (6776)>
<Initial options = -no_dynamic_options -loglevel 2 -code_api -probe_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -translate_fpu_pc -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct -pad_jmps_mark_no_trace >
FXSAVE64 intra is correctly handled
FXSAVE64 inter is **incorrectly** handled
FXSAVE intra is correctly handled
FXSAVE inter is correctly handled
<Stopping application C:\derek\dr\git\build_x64_dbg_tests\suite\tests\bin\common.floatpc.exe (6776)>

Comparing success and failure log files relevant lines:

Success:

Entry into F3218(0x00007ff6e956118b).0x00007ff6697d15c8 (shared)
Exit from F3218(0x00007ff6e956118b).0x00007ff6697d15e8 (shared) [common.floatpc.exe]

Entry into F3219(0x00007ff6e95611a4).0x00007ff6697d15f8 (shared)
Exit from F3219(0x00007ff6e95611a4).0x00007ff6697d163c (shared) [common.floatpc.exe]
 (target 0x00007ff6e95611b7 not in cache)
float_pc_update: fp state 0x000000719070fbb0
recreate_app_pc -- translating from pc=0x00007ff6697d15d8
recreate_app : pc is in F3218(0x00007ff6e956118b)
ilist for recreation:
TAG  0x00007ff6e956118b
 +0    L3                      48 8b c1             mov    %rcx -> %rax
 +3    m4 @0x00007ff669682288  48 ba 98 11 56 e9 f6 mov    $0x00007ff6e9561198 -> %rdx
                               7f 00 00
 +13   L3                      48 89 10             mov    %rdx -> (%rax)[8byte]
 +16   L3                      d9 ee                fldz   $0.000000 -> %st0
 +18   L3                      b8 01 00 00 00       mov    $0x00000001 -> %eax
 +23   L3                      83 f8 01             cmp    %eax $0x00000001
 +26   L4 @0x00007ff669686640  0f 85 a9 2e ee 7f    jnz    $0x00007ff6e95611bf
 +32   L4 @0x00007ff669683a88  e9 8f 2e ee 7f       jmp    $0x00007ff6e95611a4
END 0x00007ff6e956118b

recreate_app -- found valid state pc 0x00007ff6e9561198
recreate_app -- found ok pc 0x00007ff6e9561198
recreate_app_pc -- translation is 0x00007ff6e9561198
float_pc_update: translated 0x00007ff6697d15d8 to 0x00007ff6e9561198

dispatch: target = 0x00007ff6e95611b7

<...>

Entry into F3260(0x00007ff6e9561202).0x00007ff6697d1838 (shared)
Exit from F3260(0x00007ff6e9561202).0x00007ff6697d187b (shared) [common.floatpc.exe]
 (target 0x00007ff6e9561214 not in cache)
float_pc_update: fp state 0x000000719070fbb0
float_pc_update: speculating: pc 0x00000000697d1818 + top half of vmcode = 0x00007ff6697d1818
recreate_app_pc -- translating from pc=0x00007ff6697d1818
recreate_app : pc is in F3259(0x00007ff6e95611e9)
ilist for recreation:
TAG  0x00007ff6e95611e9
 +0    L3                      48 8b c1             mov    %rcx -> %rax
 +3    m4 @0x00007ff669683de8  48 ba f6 11 56 e9 f6 mov    $0x00007ff6e95611f6 -> %rdx
                               7f 00 00
 +13   L3                      48 89 10             mov    %rdx -> (%rax)[8byte]
 +16   L3                      d9 ee                fldz   $0.000000 -> %st0
 +18   L3                      b8 01 00 00 00       mov    $0x00000001 -> %eax
 +23   L3                      83 f8 01             cmp    %eax $0x00000001
 +26   L4 @0x00007ff669685a40  0f 85 05 2f ee 7f    jnz    $0x00007ff6e956121b
 +32   L4 @0x00007ff669685458  e9 ed 2e ee 7f       jmp    $0x00007ff6e9561202
END 0x00007ff6e95611e9

recreate_app -- found valid state pc 0x00007ff6e95611f6
recreate_app -- found ok pc 0x00007ff6e95611f6
recreate_app_pc -- translation is 0x00007ff6e95611f6
float_pc_update: translated 0x00007ff6697d1818 to 0x00007ff6e95611f6

dispatch: target = 0x00007ff6e9561214

Failure:

Entry into F3218(0x00007ff6e956118b).0x00007ff6697d15c8 (shared)
Exit from F3218(0x00007ff6e956118b).0x00007ff6697d15e8 (shared) [common.floatpc.exe]

Entry into F3219(0x00007ff6e95611a4).0x00007ff6697d15f8 (shared)
Exit from F3219(0x00007ff6e95611a4).0x00007ff6697d163c (shared) [common.floatpc.exe]
 (target 0x00007ff6e95611b7 not in cache)
float_pc_update: fp state 0x0000003ea0b7f8d0
float_pc_update: pc 0x00000000697d15d8 is translated already

dispatch: target = 0x00007ff6e95611b7

<...>

Entry into F3259(0x00007ff6e9561202).0x00007ff6697d181c (shared)
Exit from F3259(0x00007ff6e9561202).0x00007ff6697d185f (shared) [common.floatpc.exe]
 (target 0x00007ff6e9561214 not in cache)
float_pc_update: fp state 0x0000003ea0b7f8d0
float_pc_update: speculating: pc 0x00000000697d17fc + top half of vmcode = 0x00007ff6697d17fc
recreate_app_pc -- translating from pc=0x00007ff6697d17fc
recreate_app : pc is in F3258(0x00007ff6e95611e9)
ilist for recreation:
TAG  0x00007ff6e95611e9
 +0    L3                      48 8b c1             mov    %rcx -> %rax
 +3    m4 @0x00007ff669683de8  48 ba f6 11 56 e9 f6 mov    $0x00007ff6e95611f6 -> %rdx
                               7f 00 00
 +13   L3                      48 89 10             mov    %rdx -> (%rax)[8byte]
 +16   L3                      d9 ee                fldz   $0.000000 -> %st0
 +18   L3                      b8 01 00 00 00       mov    $0x00000001 -> %eax
 +23   L3                      83 f8 01             cmp    %eax $0x00000001
 +26   L4 @0x00007ff669685a40  0f 85 05 2f ee 7f    jnz    $0x00007ff6e956121b
 +32   L4 @0x00007ff669685458  e9 ed 2e ee 7f       jmp    $0x00007ff6e9561202
END 0x00007ff6e95611e9

recreate_app -- found valid state pc 0x00007ff6e95611f6
recreate_app -- found ok pc 0x00007ff6e95611f6
recreate_app_pc -- translation is 0x00007ff6e95611f6
float_pc_update: translated 0x00007ff6697d17fc to 0x00007ff6e95611f6

dispatch: target = 0x00007ff6e9561214

It shouldn't be like #1427 b/c it's fxsave64. So why is the pc just
bottom 32 bits?

Separate run shows the exit reason is fxsave64:

FXSAVE64 intra is correctly handled
float_pc_update: 64-bit reason
FXSAVE64 inter is **incorrectly** handled

Just a reminder of the instru here. We pass a pointer to the base of the
fxsave64 destination to float_pc_update:

Fragment 3221, tag 0x00007ff6c7b2118b, flags 0x9000630, shared, size 37:
        [common.floatpc.exe]
  0x00007ff647d914e8  48 8b c1             mov    %rcx -> %rax
  0x00007ff647d914eb  48 ba 98 11 b2 c7 f6 mov    $0x00007ff6c7b21198 -> %rdx
                      7f 00 00
  0x00007ff647d914f5  48 89 10             mov    %rdx -> (%rax)[8byte]
  0x00007ff647d914f8  d9 ee                fldz   $0.000000 -> %st0
  0x00007ff647d914fa  b8 01 00 00 00       mov    $0x00000001 -> %eax
  0x00007ff647d914ff  83 f8 01             cmp    %eax $0x00000001
  0x00007ff647d91502  0f 85 0f 8b f5 ff    jnz    $0x00007ff647cea017
  0x00007ff647d91508  e9 0a 8b f5 ff       jmp    $0x00007ff647cea017

Fragment 3222, tag 0x00007ff6c7b211a4, flags 0x10006b0, shared, size 73, cannot be trace:
        [common.floatpc.exe]
  0x00007ff647d91518  48 8b d4             mov    %rsp -> %rdx
  0x00007ff647d9151b  48 81 ec 10 02 00 00 sub    $0x0000000000000210 %rsp -> %rsp
  0x00007ff647d91522  48 83 e4 f0          and    $0xfffffffffffffff0 %rsp -> %rsp
  0x00007ff647d91526  65 48 89 3c 25 38 16 mov    %rdi -> %gs:0x00001638[8byte]
                      00 00
  0x00007ff647d9152f  65 48 8b 3c 25 40 16 mov    %gs:0x00001640[8byte] -> %rdi
                      00 00
  0x00007ff647d91538  66 c7 87 62 01 00 00 data16 mov    $0x0003 -> 0x00000162(%rdi)[2byte]
                      03 00
  0x00007ff647d91541  48 8d 3c 24          lea    (%rsp) -> %rdi
  0x00007ff647d91545  65 48 89 3c 25 28 16 mov    %rdi -> %gs:0x00001628[8byte]
                      00 00
  0x00007ff647d9154e  65 48 8b 3c 25 38 16 mov    %gs:0x00001638[8byte] -> %rdi
                      00 00
  0x00007ff647d91557  48 0f ae 04 24       fxsave64  -> (%rsp)[512byte]
  0x00007ff647d9155c  e9 cd 8a f5 ff       jmp    $0x00007ff647cea02e

float_pc_update: fp state 0x0000006a4116f5a0
recreate_app_pc -- translating from pc=0x00007ff647d914f8

So it's getting the actual pc written by fxsave64.
Could it be a vmware bug?
Never seen it on win7, but haven't run 1000x in a loop.
Why nondet?
Why limited to Windows (or is it?)?

@derekbruening
Copy link
Contributor Author

I cannot reproduce on Linux in 10,000 iters.
I cannot reproduce on physical hardware (but with win7) in 10,000 iters.

Does anyone have win8.1+ on real hardware? Can you reproduce this bug there? In a 64-bit build dir run (this is for cygwin bash):

% for ((i=0; i<1000; i++)); do echo $i; bin64/drrun -translate_fpu_pc -- suite/tests/bin/common.floatpc.exe; done > OUT 2>&1

And look for the string incorrect in OUT.

derekbruening added a commit that referenced this issue Dec 20, 2021
Adds the following tests to the ignore list in order to keep the long
suite on master merges greener:

+ win64 'common.floatpc_xl8all' #2267
+ win64 'code_api|tool.drcachesim.simple-config-file' #1807
+ lin32 'code_api,opt_speed|common.fib' #1807
+ lin32 'prof_pcs|common.nativeexec_exe_opt' #2052
+ lin64 'code_api,opt_speed|common.floatpc_xl8all' #1807

Only ignored in the long suite since these are tests we don't want to
completely break:

+ lin32 'code_api|api.startstop' #4604
+ lin64 'code_api|api.detach_state' #5123
+ lin64 'code_api|client.cleancallsig' #1807

Issue: #1807, #2267, #2052, #4604, #5123
derekbruening added a commit that referenced this issue Dec 21, 2021
Adds the following tests to the ignore list in order to keep the long
suite on master merges greener:

+ win64 'common.floatpc_xl8all' #2267
+ win64 'code_api|tool.drcachesim.simple-config-file' #1807
+ lin32 'code_api,opt_speed|common.fib' #1807
+ lin32 'prof_pcs|common.nativeexec_exe_opt' #2052
+ lin64 'code_api,opt_speed|common.floatpc_xl8all' #1807

Only ignored in the long suite since these are tests we don't want to
completely break:

+ lin32 'code_api|api.startstop' #4604
+ lin64 'code_api|api.detach_state' #5123
+ lin64 'code_api|client.cleancallsig' #1807

Issue: #1807, #2267, #2052, #4604, #5123
@derekbruening
Copy link
Contributor Author

The floatpc_xl8all test has started failing on the x64 Linux Github Actions VMs: it maybe the same issue.

https://github.com/DynamoRIO/dynamorio/actions/runs/6752187857/job/18358030169

 35/473 Test  #34: code_api|common.floatpc_xl8all ...................................***Failed  Required regular expression not found. Regex=[^FXSAVE64 intra is correctly handled
FXSAVE64 inter is correctly handled
FXSAVE intra is correctly handled
FXSAVE inter is correctly handled
$
]  0.35 sec
FXSAVE64 intra is correctly handled
FXSAVE64 inter is **incorrectly** handled
FXSAVE intra is correctly handled
FXSAVE inter is **incorrectly** handled

@derekbruening
Copy link
Contributor Author

Given that some other more-frequent-suddenly flaky failures are due to AMD vs Intel I looked into AMD behavior differences for the last floating PC and there are some:

AMD manual vol 2:

The FXSAVE and FXRSTOR instructions save and restore the entire 128-bit
media, 64-bit media, and x87 state. These instructions usually execute
faster than FSAVE/FNSAVE and FRSTOR because they do not normally save and
restore the x87 exception pointers (last-instruction pointer, last
data-operand pointer, and last opcode). The only case in which they do
save the exception pointers is the relatively rare case in which the
exception-summary bit in the x87 status word (FSW.ES) is set to 1,
indicating that an unmasked exception has occurred.

Hmm but wouldn't both the intra and inter fxsave cases then fail? We're only seeing the inter cases fail?

Xref the logs above where the speculation was invoked, which we did not expect, but maybe it wasn't due to truncation but due to the PC being whatever the last exception was and so not being a cache PC -- but it got the right bottom bits for the intra...

@derekbruening
Copy link
Contributor Author

Oh I see: for intra DR doesn't care what the hardware writes; it proactively writes the right PC in there
Which maybe it should not do on AMD 🙂

@derekbruening
Copy link
Contributor Author

So for AMD runs today, we see a value of 0 for the PC. Old AMD processors apparently left the prior value of the last exception PC, but that was as security issue: so maybe modern ones zero it out? So this seems a little different from the logs above in the initial entry.

derekbruening added a commit that referenced this issue Nov 10, 2023
Adds CMake detection of the processor vendor.

Relaxes the floatpc_xl8all test to expect failure when running on AMD
for inter-block cases, where DR won't be able to translate as the
processor does not supply the PC if there wasn't an exception.

Issue: #2267
@derekbruening derekbruening self-assigned this Nov 10, 2023
derekbruening added a commit that referenced this issue Nov 10, 2023
Adds CMake detection of the processor vendor.

Relaxes the floatpc_xl8all test to expect failure when running on AMD
for inter-block cases, where DR won't be able to translate as the
processor does not supply the PC if there wasn't an exception.

Issue: #2267
@derekbruening
Copy link
Contributor Author

It sounds like it's the OS doing the zeroing for AMD processors.

The AMD failures for this test are fixed: I relaxed the output to expect failure. For the original issue (maybe should have filed a separate one for AMD) with the truncation: I guess we leave it open.

ksco added a commit to ksco/dynamorio that referenced this issue Nov 15, 2023
`vendor_id` is not always available, only do `REGEX REPLACE` it a match if found.

Issue: DynamoRIO#2267
derekbruening pushed a commit that referenced this issue Nov 16, 2023
`vendor_id` is not always available, only do `REGEX REPLACE` if a match
is found. Tested on Intel, AMD, and \<unknown\> AArch64 machines via GA
runners.

Issue: #2267
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant