simultaneous 32-bit and 64-bit library support for x64 DR controlling WOW64 app #49

Open
derekbruening opened this issue Nov 27, 2014 · 6 comments

Comments

@derekbruening
Contributor

From [email protected] on February 24, 2009 14:14:20

Today 32-bit DR can control the 32-bit parts of a WOW64 app, but to see all
of the code, including the emulation layer, we want 64-bit DR to be able to
run the whole mixed-mode app. Some of the capabilities here also apply to
Linux mixed-mode apps, but those are much, much rarer.

Here is my list of cases that will eventually be filed separately here:

  • PR 240257: support 32-bit clients on WOW64? How do we mix 32-bit and
    64-bit code? Like Pin, give each stream to a separate client? Or give
    32-bit code to a 64-bit client?
  • PR 253431: [wow64] simultaneous 32-bit and 64-bit dll support in 64-bit DR
  • PR 314367: re-enable x64 DR controlling WOW64 process once it works
  • PR 272553: [x64] late injection must switch from kernel32 to ntdll for
    wow64 children
  • PR 271317: preserve cs changes from far ctis and iret
  • PR 283895: [x64][correctness][performance] for x86 code use separate
    x86 ibl tables and compacted or separate tls
  • PR 283152: support high bit preservation across mode changes
  • PR 284029: [x64] support syscalls in x86 mode
    TODO: reg_spill_dcontext_offs(reg_id_t reg):
    /* Use REG_E?? instead of REG_X?? to eventually support 32-bit code spills in
     * mixed 64-bit/32-bit execution. */
  • PR 269595: WOW64 context translation failing when at our own
    post-syscall point
  • PR 254193: [x64] inject into different-architecture child: x64 to
    WOW64, WOW64 to x64
    => long-term we'll only support 64-bit-DR in WOW64 following (PR 253431)
  • PR 253943: [x64] support sysenter
  • PR 255555: [x64] 32-bit drinject options for launching 64-bit exe
    How do we know whether an ilist is 32-bit: call instr_get_x86_mode() on
    each instr, or can we assume that the whole list matches the first instr?
    It shouldn't matter for most ops: the IR is rich enough and
    cross-platform enough.
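
The last question (check the mode of every instr, or trust the first one) can be made concrete with a small sketch. Everything in it is hypothetical: `sketch_instr_t`, its `x86_mode` flag, and `ilist_mode_sketch()` are stand-ins invented for illustration, not DR's real `instr_t` or `instrlist_t`. It walks a list once and reports whether it is all 32-bit, all 64-bit, or mixed, the last being the case where assuming from the first instr would go wrong.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an instruction: only the mode flag matters here. */
typedef struct sketch_instr {
    bool x86_mode;             /* true = 32-bit (x86) mode, false = 64-bit */
    struct sketch_instr *next; /* NULL terminates the list */
} sketch_instr_t;

typedef enum {
    ILIST_EMPTY,
    ILIST_ALL_32,
    ILIST_ALL_64,
    ILIST_MIXED,
} ilist_mode_t;

/* Classifies a list by comparing every instr's mode to the first one's. */
static ilist_mode_t
ilist_mode_sketch(const sketch_instr_t *first)
{
    if (first == NULL)
        return ILIST_EMPTY;
    for (const sketch_instr_t *in = first->next; in != NULL; in = in->next) {
        if (in->x86_mode != first->x86_mode)
            return ILIST_MIXED;
    }
    return first->x86_mode ? ILIST_ALL_32 : ILIST_ALL_64;
}
```

The per-instr walk costs one pass over the list; the "assume from the first" shortcut is only safe if mode changes always end a block, which this issue does not settle.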

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=49

@derekbruening
Contributor Author

From [email protected] on April 24, 2012 08:16:37

also:

  • PR 282576: [x64] generate both 32-bit and 64-bit gencode routines
  • PR 279094: [x64] support simultaneous 32-bit and 64-bit exit stubs (fine,
    separate, coarse) and prefixes
  • PR 284029: [x64] support syscalls in x86 mode
  • PR 283895: [x64][correctness][performance] for x86 code use separate
    x86 ibl tables and compacted or separate tls
  • PR 271317: far cti must exit code cache to update dcontext_t.x86_mode
  • PR 215393, which put in simultaneous PE32 and PE32+ parsing, though we
    should double-check Windows, as the Linux code today certainly does not
    support mixing the two module types

Summary: simultaneous 32-bit and 64-bit library support for x64 DR controlling WOW64 app
Labels: -Priority-Medium Priority-Low OpSys-x64

@derekbruening
Contributor Author

From [email protected] on June 21, 2012 12:38:25

I started filing these separately:

@derekbruening
Contributor Author

From [email protected] on July 18, 2012 08:03:03

After Yang got mixed_mode.c working under -x86_to_x64, I suggested he try Cygwin ls.exe next, since Cygwin is likely to exercise more x86 corner cases and ls.exe is a well-behaved command line app.

However, we currently crash using our x64 mixed mode support when Cygwin modifies itself.

syslog bits:

<Application C:\cygwin\bin\ls.exe (8000). Unrecoverable Error at PC 0x00000000152c6977. Program aborted.

Relevant log parts:

interp (x86 mode): start_pc = 0x0000000061001654
0x0000000061001654 5a pop %esp (%esp) -> %edx %esp
0x0000000061001655 29 d0 sub %edx %eax -> %eax
0x0000000061001657 83 c0 07 add $0x00000007 %eax -> %eax
0x000000006100165a 83 ea 0c sub $0x0000000c %edx -> %edx
0x000000006100165d 89 42 01 mov %eax -> 0x01(%edx)
0x0000000061001660 ff e2 jmp %edx
end_pc = 0x0000000061001662

exit_branch_type=0x12 bb->exit_target=0x000000001c3c3bc0
emit_fragment: bb use ibl <0x000000001c3c3bc0>
exit_branch_type=0x12 target=0x000000001c3c3bc0 l->flags=0x9012
Fragment 15670, tag 0x0000000061001654, flags 0x1400630, 32-bit, shared, size 26:
[cygwin1.dll~__assert+0xf4,~getprogname-0xb3c]
Entry into F15670(0x0000000061001654).0x000000001c84a580 (32-bit)(shared)

ASYNCH intercepted exception in thread 7288 at pc 0x000000001c84a589
exception code = 0x00000000c0000005, ExceptionFlags=0x0000000000000000
record=0x0000000000000000, params=2
PC 0x000000001c84a589 tried to write address 0x000000006117d611
...
check_for_modified_code: exception was write to 0x000000006117d611
check_for_modified_code: seg fault in exec-writable region @0x000000006117d611
found_modified_code: translating 0x000000001c84a589
...
SYSLOG_WARNING: writing to executable region.
WARNING: Exec 0x000000006117d000-0x000000006117f000 WE written @0x000000006117d611 by 0x000000001c84a589 == app 0x000000006100165d
instr not in region, flushing entire 0x000000006117d000-0x000000006117f000
FLUSH STAGE 1: synch_unlink_priv(thread 7288 flushtime 4): 0x000000006117d000-0x000000006117f000
make_writable: pc 0x000000001c3b2000-0x000000001c3b5000, currently r-x- committed
make_unwritable: pc 0x000000001c3b2000-0x000000001c3b5000, currently rwx- committed
ASYNCH intercepted exception in thread 7288 at pc 0x00000000152c6977

I haven't fully investigated; I'm just filing this so we have a record of it. For now he's going to try translating VS tools like cl.exe and masm.
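
The addresses in the log can be checked by hand. Here is a minimal sketch using only values copied from the log above; the helper names are made up for illustration and are not DR routines, and treating the region end as exclusive is an assumption. It confirms that the faulting write lands inside the exec-writable region while the app instruction doing the write does not, which matches the "instr not in region, flushing entire" line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the log above. */
#define REGION_START  0x6117d000ULL /* exec-writable region start */
#define REGION_END    0x6117f000ULL /* region end, assumed exclusive */
#define WRITE_TARGET  0x6117d611ULL /* address the app tried to write */
#define WRITER_APP_PC 0x6100165dULL /* app pc of "mov %eax -> 0x01(%edx)" */

/* Hypothetical helper: true if addr lies in [start, end). */
static bool
in_region(uint64_t addr, uint64_t start, uint64_t end)
{
    return addr >= start && addr < end;
}

/* The store is to 0x01(%edx), so %edx held the target minus one. The block
 * then ends in "jmp %edx", i.e. it jumps into the page it just wrote. */
static uint64_t
jump_target_from_write(uint64_t write_target)
{
    return write_target - 1;
}
```

So the fragment at 0x61001654 patches bytes one past its own jump target and then transfers control there, which is why the whole 0x6117d000-0x6117f000 region gets flushed.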

@derekbruening
Contributor Author

From [email protected] on July 23, 2012 15:00:30

Failed to run security-common.selfmod.exe in mixed mode.

In emit_utils.c, in IF_X64(unlink_shared_syscall_common(SHARED_GENCODE(true))), pc = code->unlinked_shared_syscall + code->sys_unlink_offs, which is null. We then dereference pc, which causes an access violation.
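
A minimal sketch of the failure pattern, assuming the base pointer for the requested gencode was never set; which of the two operands was actually unset is not established in this report. `gencode_sketch_t` and `sketch_unlink_pc()` are hypothetical names, and only the two field names come from the expression above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two fields named above. */
typedef struct {
    uint8_t *unlinked_shared_syscall; /* NULL if this gencode was never generated */
    size_t sys_unlink_offs;
} gencode_sketch_t;

/* Returns the pc to patch, or NULL if there is nothing safe to dereference. */
static uint8_t *
sketch_unlink_pc(const gencode_sketch_t *code)
{
    if (code == NULL || code->unlinked_shared_syscall == NULL)
        return NULL;
    return code->unlinked_shared_syscall + code->sys_unlink_offs;
}
```

A guard like this only turns the crash into a detectable condition; it does not supply the missing gencode.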

@derekbruening
Contributor Author

derekbruening commented Nov 27, 2014

From [email protected] on July 23, 2012 15:06:24

call stack for the previous failure:

 # Child-SP          RetAddr           Call Site
00 00000000`228bdf50 00000000`152d06d9 dynamorio!unlink_shared_syscall_common+0x9a [c:\src\dynamorio\trunk\core\x86\emit_utils.c @ 6998]
01 00000000`228bdfb0 00000000`150c5fbf dynamorio!unlink_shared_syscall+0x79 [c:\src\dynamorio\trunk\core\x86\emit_utils.c @ 7016]
02 00000000`228bdff0 00000000`150ca527 dynamorio!flush_fragments_synch_unlink_priv+0x90f [c:\src\dynamorio\trunk\core\fragment.c @ 6427]
03 00000000`228be1a0 00000000`15235a2c dynamorio!flush_fragments_in_region_start+0x287 [c:\src\dynamorio\trunk\core\fragment.c @ 6892]
04 00000000`228be250 00000000`1523500b dynamorio!flush_and_remove_executable_vm_area+0x3c [c:\src\dynamorio\trunk\core\vmareas.c @ 6078]
05 00000000`228be2c0 00000000`1538c212 dynamorio!app_memory_protection_change+0x1ebb [c:\src\dynamorio\trunk\core\vmareas.c @ 6533]
06 00000000`228be570 00000000`15383208 dynamorio!presys_ProtectVirtualMemory+0x4f2 [c:\src\dynamorio\trunk\core\win32\syscall.c @ 1925]
07 00000000`228be6f0 00000000`15158da4 dynamorio!pre_system_call+0x22b8 [c:\src\dynamorio\trunk\core\win32\syscall.c @ 2395]
08 00000000`228beb70 00000000`1514c951 dynamorio!handle_system_call+0x734 [c:\src\dynamorio\trunk\core\dispatch.c @ 1770]
09 00000000`228becf0 00000000`151482f9 dynamorio!dispatch_enter_dynamorio+0x10b1 [c:\src\dynamorio\trunk\core\dispatch.c @ 748]
0a 00000000`228bee70 00000000`2281313a dynamorio!dispatch+0x19 [c:\src\dynamorio\trunk\core\dispatch.c @ 143]
0b 00000000`228befe0 00000000`22843700 0x2281313a
0c 00000000`228befe8 abababab`abababab 0x22843700
0d 00000000`228beff0 00000000`00000001 0xabababab`abababab
0e 00000000`228beff8 00000000`00000000 0x1

derekbruening added a commit that referenced this issue Jan 4, 2021
Adds a long-missing feature: following into a Windows child process of
a different bitwidth.

Switches injection from DR and from drinjectlib (including drrun and
drinject) to use -early_inject_map.  This was most easily done by
turning on -early_inject by default as well.  However, the
-early_inject_location default is INJECT_LOCATION_ImageEntry, which is
the same late takeover point as with thread injection.  Switching all
injection over to map-from-the-parent simplifies cross-arch following,
as well as making it easier to shift the takeover point to an earlier
spot in the future.  This is a step toward #607 by switching
drinjectlib to use map injection; the takeover point, as mentioned, is
still the image entry.

Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit
child from a 64-bit parent, but this option is only sketched out and
is not fully supported yet: #49 covers adding tests and official
support.

Adds library swapping code to find the other-bitwidth library, which
assumes a parallel directory structure.  Adds a new fatal error if the
library for a child is not found.
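
The parallel-directory assumption can be illustrated with a short sketch. The directory names `lib32` and `lib64` and the function `swap_bitwidth_path()` are assumptions made for this example and are not taken from the commit; the real swapping code may differ. The sketch replaces the last occurrence of one directory name with the other and reports failure when neither is present, which corresponds to the "library not found" case.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed directory names for the two bitwidths (not taken from the commit). */
#define DIR_32 "lib32"
#define DIR_64 "lib64"

/* Writes into out the path with the last DIR_32 component swapped for DIR_64,
 * or the reverse. Returns false if neither name appears or out is too small. */
static bool
swap_bitwidth_path(const char *path, char *out, size_t out_size)
{
    const char *from = NULL, *to = NULL, *hit = NULL;
    /* Find the last occurrence of either directory name. */
    for (const char *p = path; *p != '\0'; p++) {
        if (strncmp(p, DIR_32, strlen(DIR_32)) == 0) {
            hit = p;
            from = DIR_32;
            to = DIR_64;
        } else if (strncmp(p, DIR_64, strlen(DIR_64)) == 0) {
            hit = p;
            from = DIR_64;
            to = DIR_32;
        }
    }
    if (hit == NULL)
        return false;
    /* Copy the prefix, the replacement name, then the rest of the path. */
    int n = snprintf(out, out_size, "%.*s%s%s", (int)(hit - path), path, to,
                     hit + strlen(from));
    return n >= 0 && (size_t)n < out_size;
}
```

A caller would still need to check that the swapped path exists before injecting.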

To support generating code for all 3 child-parent cases (same-same,
32-64, and 64-32), and in particular for 32-64, switches the small
gencode sequence for -early_inject_map from using IR to using raw
bytes.  A multi-arch encoder (#1684) would help but we would need
cross-bitwidth support there, which is not on the horizon.  Fixes what
look like bugs in the original gencode generation along the way
(s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not
clear how it worked before.

Adds support for several system calls from a 32-bit parent to a 64-bit
child where the desired NtWow64* system call does not exist.  We use
switch_modes_and_call() for NtProtectVirtualMemory and
NtQueryVirtualMemory.

Changes all types in the injection code to handle 64-bit addresses in
32-bit code.  Adds UNICODE_STRING_32 and
RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from
64-bit parents.  Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and
PROCESS_BASIC_INFORMATION64.

Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit.

Adds get_remote_proc_address() and uses it to look up
dynamorio_earliest_init_takeover() in a child DR.

Finds the remote ntdll base via a remote query memory walk plus remote
image header parsing.  This requires adding a switch_modes_and_call()
version of NtQueryVirtualMemory (also mentioned above), which needs
64-bit args: so we refactor switch_modes_and_call() to take in a
struct of all 64-bit fields for the args.

Fixes a few bugs in other routines to properly get the image name and
image entry for 32-bit children of 64-bit parents.

Updates environment variable propagation code to handle a 32-bit
parent and a 64-bit child.  Updates a 64-bit parent and 32-bit child
to insert the variables into the 32-bit PEB (64-bit does no good),
which requires finding the 32-bit PEB.  This is done via the 32-bit
TEB, using a hack due to what seems like a kernel bug where it has the
TebBaseAddress 0x2000 too low.

Makes environment variable propagation failures fatal and visible,
unlike previously where errors would just result in silently letting
the child run natively.  Turns some other prior soft errors into fatal
errors on child takeover.

Moves environment variable propagation to post-CreateUserProcess
instead of waiting for ResumeThread, which avoids having to get the
thread context (for which we have no other-bitwidth support) to figure
out whether it's the first thread in the process or not.  We bail on
propagation for pre-Vista where we'd have to wait for ResumeThread.

Generalizes the other-bitwidth Visual Studio toolchain environment
variable setting for use in a new build-and-test other-bitwidth test
which builds dynamorio and the large_options client (to ensure options
are propagated to children; and it has convenient init and exit time
prints) for the other bitwidth, arranges parallel lib dirs, and runs
the other client.

Issue: #803, #147, #607, #49
Fixes #803
derekbruening added a commit that referenced this issue Jan 5, 2021
Adds a long-missing feature: following into a Windows child process of
a different bitwidth.

Switches injection from DR and from drinjectlib (including drrun and
drinject) to use -early_inject_map.  This was most easily done by turning
on -early_inject by default as well.  However, the -early_inject_location
default is INJECT_LOCATION_ThreadStart, a new "early" injection location
which is the same late takeover point as with thread injection (we could
also use _ImageEntry, which is only very slightly later, but that fails for
.NET and other applications).  Switching all injection over to
map-from-the-parent simplifies cross-arch following, as well as making it
easier to shift the takeover point to an earlier spot in the future.  This
is a step toward #607 by switching drinjectlib to use map injection; the
takeover point, as mentioned, is still the thread start.

Placing a hook at the thread start causes some stability issues, so instead
of the usual hook for -early_inject_map, for INJECT_LOCATION_ThreadStart we
set the thread context, like thread injection does.  The gencode still
restores the hook as a nop, for simplicity.  For parent64 child32, we can't
easily locate the thread start, so we assume it's
ntdll32!RtlUserThreadStart (which is also a fallback if anything fails in
other cases; the final fallback is a hook at the image entry, which works
nearly everywhere but not for .NET where the image entry is not reached).

Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit
child from a 64-bit parent, but this option is only sketched out and
is not fully supported yet: #49 covers adding tests and official
support.

Adds library swapping code to find the other-bitwidth library, which
assumes a parallel directory structure.  Adds a new fatal error if the
library for a child is not found.

To support generating code for all 3 child-parent cases (same-same,
32-64, and 64-32), and in particular for 32-64, switches the small
gencode sequence for -early_inject_map from using IR to using raw
bytes.  A multi-arch encoder (#1684) would help but we would need
cross-bitwidth support there, which is not on the horizon.  Fixes what
look like bugs in the original gencode generation along the way
(s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not
clear how it worked before.

Adds support for several system calls from a 32-bit parent to a 64-bit
child where the desired NtWow64* system call does not exist.  We use
switch_modes_and_call() for NtProtectVirtualMemory and
NtQueryVirtualMemory.

Changes all types in the injection code to handle 64-bit addresses in
32-bit code.  Adds UNICODE_STRING_32 and
RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from
64-bit parents.  Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and
PROCESS_BASIC_INFORMATION64.

Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit.

Adds get_remote_proc_address() and uses it to look up
dynamorio_earliest_init_takeover() in a child DR.

Finds the remote ntdll base via a remote query memory walk plus remote
image header parsing.  This requires adding a switch_modes_and_call()
version of NtQueryVirtualMemory (also mentioned above), which needs
64-bit args: so we refactor switch_modes_and_call() to take in a
struct of all 64-bit fields for the args.

Fixes a few bugs in other routines to properly get the image name and
image entry for 32-bit children of 64-bit parents.

Updates environment variable propagation code to handle a 32-bit
parent and a 64-bit child.  Updates a 64-bit parent and 32-bit child
to insert the variables into the 32-bit PEB (64-bit does no good),
which requires finding the 32-bit PEB.  This is done via the 32-bit
TEB, using a hack due to what seems like a kernel bug where it has the
TebBaseAddress 0x2000 too low.

Makes environment variable propagation failures fatal and visible,
unlike previously where errors would just result in silently letting
the child run natively.  Turns some other prior soft errors into fatal
errors on child takeover.

Moves environment variable propagation to post-CreateUserProcess
instead of waiting for ResumeThread, which avoids having to get the
thread context (for which we have no other-bitwidth support) to figure
out whether it's the first thread in the process or not.  We bail on
propagation for pre-Vista where we'd have to wait for ResumeThread.

Generalizes the other-bitwidth Visual Studio toolchain environment
variable setting for use in a new build-and-test other-bitwidth test
which builds dynamorio and the large_options client (to ensure options
are propagated to children; and it has convenient init and exit time
prints) for the other bitwidth, arranges parallel lib dirs, and runs
the other client.

Issue: #803, #147, #607, #49
Fixes #803
@derekbruening
Contributor Author

We never got as far as running clients: for clients there are issues with static things like DR_REG_X* defines. For that, xref inside DR's IBL generation:

https://github.com/DynamoRIO/dynamorio/blob/master/core/arch/x86/emit_utils.c#L3072

        /* Rather than complicating the REG_X* defines used above we have a post-pass
         * that shrinks all the registers and all the INTPTR immeds.
         * The other two changes we need are performed up above:
         *   1) cmp top bits to 0 for match
         *   2) no trace_cmp entry points
         * Note that we're punting on PR 283152: we go ahead and clobber the top bits
         * of all our scratch registers.
         */

PR 283152 == #822
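
For readers unfamiliar with that post-pass, here is a rough sketch of the idea under heavy assumptions: the operand struct, the `SK_REG_*` numbering, and `shrink_operands_sketch()` are all hypothetical and do not reflect DR's real `opnd_t` or `DR_REG_*` values. It shows only the shape of the technique: generate with pointer-sized registers and immediates, then rewrite them to 32-bit forms in one pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbering: 64-bit ids followed by 32-bit ids in the
 * same order, so shrinking is a fixed offset. Not DR's real DR_REG_* values. */
enum {
    SK_REG_RAX, SK_REG_RCX, SK_REG_RDX, SK_REG_RBX,
    SK_REG_EAX, SK_REG_ECX, SK_REG_EDX, SK_REG_EBX,
};

typedef enum { SK_OPND_REG, SK_OPND_IMMED } sk_opnd_kind_t;

typedef struct {
    sk_opnd_kind_t kind;
    int reg;         /* valid when kind == SK_OPND_REG */
    int64_t immed;   /* valid when kind == SK_OPND_IMMED */
    int immed_bytes; /* 8 for pointer-sized on x64, 4 after shrinking */
} sk_opnd_t;

/* Post-pass: rewrite 64-bit registers and pointer-sized immediates to their
 * 32-bit forms, leaving everything else alone. Returns the number changed. */
static size_t
shrink_operands_sketch(sk_opnd_t *opnds, size_t count)
{
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        sk_opnd_t *op = &opnds[i];
        if (op->kind == SK_OPND_REG && op->reg >= SK_REG_RAX &&
            op->reg <= SK_REG_RBX) {
            op->reg += SK_REG_EAX - SK_REG_RAX;
            changed++;
        } else if (op->kind == SK_OPND_IMMED && op->immed_bytes == 8) {
            op->immed &= 0xffffffffLL; /* keep only the low 32 bits */
            op->immed_bytes = 4;
            changed++;
        }
    }
    return changed;
}
```

A client has no equivalent post-pass over its own code, which is why compile-time defines like DR_REG_X* are a problem there.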

derekbruening pushed a commit that referenced this issue Jul 13, 2021
Fixes issues around the -inject_x64 prototype option added by PR #4653 for #803 to enable injecting a 64-bit DR into a WOW64 (32-bit) child ("mixed mode").

Xref discussion at https://groups.google.com/g/dynamorio-users/c/rhEpslerwf8

Adds a new option -vmheap_size_wow64 since the default x64 size will not fit in a WOW64 process.
Saves eax register that holds routine address for RtlUserThreadStart before mode switch, and restores it on mode switch.
Fixes far jmp to switch to x64 mode on injection.
Fixes env variable argument propagation.
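
As background for the far jmp item, here is a sketch of how such a mode switch can be written as raw bytes from 32-bit code. The encoding (opcode 0xEA, a 32-bit offset, then a 16-bit selector) and the 0x33 selector for the 64-bit code segment under WOW64 are standard; the function name and the idea that the gencode uses exactly this form are assumptions, not taken from the commit.

```c
#include <stddef.h>
#include <stdint.h>

#define FAR_JMP_OPCODE    0xea /* jmp ptr16:32, valid in 32-bit mode */
#define CS_64BIT_SELECTOR 0x33 /* 64-bit code segment selector under WOW64 */
#define FAR_JMP_LENGTH    7    /* 1 opcode + 4 offset + 2 selector bytes */

/* Hypothetical emitter: writes "jmp far selector:target" into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t
emit_far_jmp_sketch(uint8_t *buf, size_t buf_size, uint32_t target,
                    uint16_t selector)
{
    if (buf == NULL || buf_size < FAR_JMP_LENGTH)
        return 0;
    buf[0] = FAR_JMP_OPCODE;
    /* 32-bit target offset, little-endian. */
    buf[1] = (uint8_t)(target & 0xff);
    buf[2] = (uint8_t)((target >> 8) & 0xff);
    buf[3] = (uint8_t)((target >> 16) & 0xff);
    buf[4] = (uint8_t)((target >> 24) & 0xff);
    /* 16-bit segment selector, little-endian. */
    buf[5] = (uint8_t)(selector & 0xff);
    buf[6] = (uint8_t)(selector >> 8);
    return FAR_JMP_LENGTH;
}
```

Writing the bytes directly avoids needing an encoder for the other bitwidth, at the cost of hard-coding the instruction layout.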

Example command line that works:

  $ bin64\drrun.exe -inject_x64 -c .\clientdll.dll -- bin64\create_process.exe .\helloworld32.exe

We still need to add proper support on drrun64 to inject natively without having to use create_process.exe.

Issue: #49, #4990
sapostolakis pushed a commit that referenced this issue Jul 14, 2021