
i#803: Cross-arch Windows injection #4653

Merged: 7 commits, Jan 5, 2021


derekbruening
Contributor

Adds a long-missing feature: following into a Windows child process of
a different bitwidth.

Switches injection from DR and from drinjectlib (including drrun and
drinject) to use -early_inject_map. This was most easily done by
turning on -early_inject by default as well. However, the
-early_inject_location default is INJECT_LOCATION_ImageEntry, which is
the same late takeover point as with thread injection. Switching all
injection over to map-from-the-parent simplifies cross-arch following,
as well as making it easier to shift the takeover point to an earlier
spot in the future. This is a step toward #607 by switching
drinjectlib to use map injection; the takeover point, as mentioned, is
still the image entry.

Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit
child from a 64-bit parent, but this option is only sketched out and
is not fully supported yet: #49 covers adding tests and official
support.

Adds library swapping code to find the other-bitwidth library, which
assumes a parallel directory structure. Adds a new fatal error if the
library for a child is not found.
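
Here is a minimal sketch of the swap. It assumes the parallel layout puts the two builds in sibling lib32 and lib64 directories. swap_lib_bitwidth() is a hypothetical helper, not DR's actual routine, and it matches the first lib32 or lib64 substring rather than a full path component. A real caller would also check that the resulting file exists before raising the new fatal error.

```c
#include <string.h>

/* Hypothetical helper: build the other-bitwidth library path by replacing
 * the first "lib32" or "lib64" substring of path with its counterpart.
 * Returns 0 on success, or -1 if neither substring is present or the output
 * buffer is too small. */
static int swap_lib_bitwidth(const char *path, char *out, size_t out_size)
{
    const char *from = strstr(path, "lib32");
    const char *from64 = strstr(path, "lib64");
    const char *to = "lib64";
    if (from == NULL || (from64 != NULL && from64 < from)) {
        from = from64; /* the lib64 occurrence comes first (or is the only one) */
        to = "lib32";
    }
    if (from == NULL)
        return -1;
    if (strlen(path) + 1 > out_size)
        return -1;
    size_t prefix = (size_t)(from - path);
    memcpy(out, path, prefix);       /* everything before the directory name */
    memcpy(out + prefix, to, 5);     /* both names are 5 characters long */
    strcpy(out + prefix + 5, from + 5); /* the rest, including the NUL */
    return 0;
}
```

Because both directory names have the same length, the swapped path always fits in a buffer the size of the original.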

To support generating code for all 3 child-parent cases (same-same,
32-64, and 64-32), and in particular for 32-64, switches the small
gencode sequence for -early_inject_map from using IR to using raw
bytes. A multi-arch encoder (#1684) would help but we would need
cross-bitwidth support there, which is not on the horizon. Fixes what
look like bugs in the original gencode generation along the way
(s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not
clear how it worked before.
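
This example shows why raw bytes sidestep a single-bitwidth encoder. The hypothetical emitter below produces either a 32-bit or a 64-bit "mov xax, imm; jmp xax" from the same compiled code. It also includes a "jmp rel32" that illustrates the local-versus-remote distinction behind the s/local_code_buf/remote_code_buf/ fix, as we read it. The byte sequences are standard x86 encodings but not the actual gencode DR emits, and the function names are invented for the example.

```c
#include <stdint.h>

/* Write the low n bytes of v in little-endian order (x86 immediate order). */
static uint8_t *put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Emit "mov xax, imm; jmp xax" for either child bitwidth, regardless of the
 * bitwidth this code itself was compiled for. */
static uint8_t *emit_mov_jmp_xax(uint8_t *pos, uint64_t target, int child_is_64)
{
    if (child_is_64) {
        *pos++ = 0x48; /* REX.W prefix */
        *pos++ = 0xB8; /* mov rax, imm64 */
        pos = put_le(pos, target, 8);
    } else {
        *pos++ = 0xB8; /* mov eax, imm32 */
        pos = put_le(pos, target, 4);
    }
    *pos++ = 0xFF; /* jmp r/m (FF /4) */
    *pos++ = 0xE0; /* ModRM 11 100 000: register xax */
    return pos;
}

/* Emit "jmp rel32" at cur_local_pos in a local staging buffer that will later
 * be copied to remote_code_buf in the child. The displacement is relative to
 * the *remote* address of the next instruction, so computing it from a local
 * address produces a bogus jump. Assumes target is within +/-2GB. */
static uint8_t *emit_jmp_rel32(uint8_t *local_code_buf, uint8_t *cur_local_pos,
                               uint64_t remote_code_buf, uint64_t target)
{
    uint64_t remote_pos = remote_code_buf + (uint64_t)(cur_local_pos - local_code_buf);
    int64_t rel = (int64_t)(target - (remote_pos + 5)); /* 5 = instruction length */
    *cur_local_pos++ = 0xE9;
    return put_le(cur_local_pos, (uint64_t)rel, 4);
}
```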

Adds support for several system calls from a 32-bit parent to a 64-bit
child where the desired NtWow64* system call does not exist. We use
switch_modes_and_call() for NtProtectVirtualMemory and
NtQueryVirtualMemory.
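
When a 32-bit parent calls into 64-bit code through switch_modes_and_call(), every argument must first be widened to 64 bits. That is why the args end up in a struct of 64-bit fields. The sketch below shows that packing for NtQueryVirtualMemory. nt_args64_t and the helpers are hypothetical, the real struct's name and layout may differ, and the mode switch itself is omitted.

```c
#include <stdint.h>

/* Hypothetical argument block: every parameter widened to 64 bits. */
typedef struct {
    uint64_t arg1, arg2, arg3, arg4, arg5, arg6;
} nt_args64_t;

/* Pointers and sizes are zero-extended; handles are sign-extended so that
 * pseudo-handles such as NtCurrentProcess() ((HANDLE)-1) keep their value. */
static uint64_t widen_ptr(uint32_t p) { return (uint64_t)p; }
static uint64_t widen_handle(uint32_t h) { return (uint64_t)(int64_t)(int32_t)h; }

/* Pack NtQueryVirtualMemory(ProcessHandle, BaseAddress, MemoryInformationClass,
 * MemoryInformation, MemoryInformationLength, ReturnLength). BaseAddress is
 * already 64-bit because it refers to the 64-bit child. The buffers behind
 * info_buf and return_len_ptr must be laid out for the 64-bit callee: for
 * example, ReturnLength receives an 8-byte SIZE_T. */
static nt_args64_t pack_query_vm_args(uint32_t process_handle, uint64_t base_address,
                                      uint32_t info_class, uint32_t info_buf,
                                      uint32_t info_len, uint32_t return_len_ptr)
{
    nt_args64_t a;
    a.arg1 = widen_handle(process_handle);
    a.arg2 = base_address;
    a.arg3 = (uint64_t)info_class;
    a.arg4 = widen_ptr(info_buf);
    a.arg5 = (uint64_t)info_len;
    a.arg6 = widen_ptr(return_len_ptr);
    return a;
}
```

Sign-extending handles follows Microsoft's guidance for passing handles from 32-bit to 64-bit code. Zero-extending pointers keeps addresses above 2GB in a large-address-aware 32-bit process intact.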

Changes all types in the injection code to handle 64-bit addresses in
32-bit code. Adds UNICODE_STRING_32 and
RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from
64-bit parents. Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and
PROCESS_BASIC_INFORMATION64.
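
The sketch below shows what such other-bitwidth definitions look like, with fixed-width integers for every pointer field. The field order follows the public UNICODE_STRING layout. UNICODE_STRING_64 and unicode_buffer_addr() are hypothetical names for this illustration, and these are not DR's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* UNICODE_STRING as a 32-bit process lays it out, usable from 64-bit code. */
typedef struct {
    uint16_t Length;        /* in bytes, not counting a terminating NUL */
    uint16_t MaximumLength;
    uint32_t Buffer;        /* 32-bit pointer into the 32-bit child */
} UNICODE_STRING_32;

/* Hypothetical 64-bit counterpart, usable from 32-bit code. The explicit
 * padding keeps Buffer at offset 8 even for 32-bit compilers that align
 * 64-bit integers to only 4 bytes inside structs. */
typedef struct {
    uint16_t Length;
    uint16_t MaximumLength;
    uint32_t Padding;
    uint64_t Buffer;        /* 64-bit pointer into the 64-bit child */
} UNICODE_STRING_64;

_Static_assert(sizeof(UNICODE_STRING_32) == 8, "32-bit layout");
_Static_assert(offsetof(UNICODE_STRING_64, Buffer) == 8, "64-bit layout");
_Static_assert(sizeof(UNICODE_STRING_64) == 16, "64-bit layout");

/* Whatever the caller's bitwidth, remote addresses are carried as uint64_t. */
static uint64_t unicode_buffer_addr(const void *s, int remote_is_64)
{
    return remote_is_64 ? ((const UNICODE_STRING_64 *)s)->Buffer
                        : (uint64_t)((const UNICODE_STRING_32 *)s)->Buffer;
}
```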

Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit.

Adds get_remote_proc_address() and uses it to look up
dynamorio_earliest_init_takeover() in a child DR.
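
The core of a by-name export lookup in a remote module is the PE export-table indirection. The name index leads to an ordinal, the ordinal leads to a function RVA, and the RVA is added to the child's module base. The sketch below works over a local copy of the remote image and assumes the export directory has already been located through the PE headers. The names are hypothetical, and forwarded exports are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The IMAGE_EXPORT_DIRECTORY fields a by-name lookup needs (all RVAs). */
typedef struct {
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;    /* RVA of uint32_t[] of function RVAs */
    uint32_t AddressOfNames;        /* RVA of uint32_t[] of name RVAs */
    uint32_t AddressOfNameOrdinals; /* RVA of uint16_t[] of function indices */
} export_dir_t;

/* Bounds-checked little-endian read of n (<= 4) bytes at rva. */
static int rd(const uint8_t *img, size_t size, uint64_t rva, int n, uint32_t *out)
{
    if (rva > size || (uint64_t)n > size - rva)
        return 0;
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint32_t)img[rva + i] << (8 * i);
    *out = v;
    return 1;
}

/* Hypothetical lookup: returns the child-side address of name, or 0. */
static uint64_t lookup_export(const uint8_t *img, size_t size, uint64_t remote_base,
                              const export_dir_t *dir, const char *name)
{
    size_t name_len = strlen(name);
    for (uint32_t i = 0; i < dir->NumberOfNames; i++) {
        uint32_t name_rva, ordinal, func_rva;
        if (!rd(img, size, dir->AddressOfNames + 4ull * i, 4, &name_rva))
            return 0;
        /* Compare including the NUL so "alpha" does not match "alphabet". */
        if (name_rva > size || name_len + 1 > size - name_rva ||
            memcmp(img + name_rva, name, name_len + 1) != 0)
            continue;
        if (!rd(img, size, dir->AddressOfNameOrdinals + 2ull * i, 2, &ordinal) ||
            !rd(img, size, dir->AddressOfFunctions + 4ull * ordinal, 4, &func_rva))
            return 0;
        return remote_base + func_rva;
    }
    return 0;
}
```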

Finds the remote ntdll base via a remote query memory walk plus remote
image header parsing. This requires adding a switch_modes_and_call()
version of NtQueryVirtualMemory (also mentioned above), which needs
64-bit args: so we refactor switch_modes_and_call() to take in a
struct of all 64-bit fields for the args.
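
The ntdll search can be pictured as a loop that queries one region at a time and advances by RegionSize. In this sketch, an array simulates the child's address space, and image_name stands in for the remote image-header parsing. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of MEMORY_BASIC_INFORMATION64 plus a stand-in for header parsing. */
typedef struct {
    uint64_t base;          /* BaseAddress */
    uint64_t alloc_base;    /* AllocationBase */
    uint64_t size;          /* RegionSize */
    int is_image;           /* Type == MEM_IMAGE */
    const char *image_name; /* stand-in for parsing the remote PE headers */
} region_t;

/* Simulated child address space: non-overlapping regions. */
typedef struct {
    const region_t *regions;
    size_t count;
} fake_child_t;

/* Stand-in for a remote NtQueryVirtualMemory(MemoryBasicInformation):
 * returns the region containing addr, or a synthesized free region that
 * runs up to the next known region; 0 past the last region. */
static int query_region(const fake_child_t *child, uint64_t addr, region_t *out)
{
    uint64_t next = UINT64_MAX;
    for (size_t i = 0; i < child->count; i++) {
        const region_t *r = &child->regions[i];
        if (addr >= r->base && addr - r->base < r->size) {
            *out = *r;
            return 1;
        }
        if (r->base > addr && r->base < next)
            next = r->base;
    }
    if (next == UINT64_MAX)
        return 0;
    out->base = addr;
    out->alloc_base = 0;
    out->size = next - addr;
    out->is_image = 0;
    out->image_name = NULL;
    return 1;
}

/* Walk region by region; a module's base is the image region whose
 * BaseAddress equals its AllocationBase. Returns 0 if not found. */
static uint64_t find_remote_image_base(const fake_child_t *child, const char *name)
{
    uint64_t addr = 0;
    region_t info;
    while (query_region(child, addr, &info)) {
        if (info.is_image && info.base == info.alloc_base &&
            info.image_name != NULL && strcmp(info.image_name, name) == 0)
            return info.base;
        if (info.base + info.size <= addr)
            break; /* no forward progress or wraparound: stop */
        addr = info.base + info.size;
    }
    return 0;
}
```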

Fixes a few bugs in other routines to properly get the image name and
image entry for 32-bit children of 64-bit parents.

Updates environment variable propagation code to handle a 32-bit
parent and a 64-bit child. Updates a 64-bit parent and 32-bit child
to insert the variables into the 32-bit PEB (64-bit does no good),
which requires finding the 32-bit PEB. This is done via the 32-bit
TEB, using a hack to work around what seems like a kernel bug that reports the
TebBaseAddress 0x2000 too low.
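
The sketch below shows the PEB lookup for a 32-bit child. It assumes the 0x2000 fixup described above and the standard 32-bit TEB layout, where the PEB pointer sits at offset 0x30 (fs:[0x30]). A local snapshot simulates remote reads, and the helper names are hypothetical. Whether the fixup holds on every Windows version is not something this sketch can confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Per the PR: the reported TebBaseAddress is 0x2000 below the 32-bit TEB. */
#define TEB32_FIXUP 0x2000
/* TEB32.ProcessEnvironmentBlock, the familiar fs:[0x30]. */
#define TEB32_PEB_OFFSET 0x30

/* Local snapshot of part of the child's memory, standing in for remote reads. */
typedef struct {
    uint64_t base;
    const uint8_t *bytes;
    size_t size;
} mem_snapshot_t;

/* Bounds-checked little-endian 32-bit read at a child address. */
static int read_u32(const mem_snapshot_t *m, uint64_t addr, uint32_t *out)
{
    if (addr < m->base || addr - m->base > m->size || m->size - (addr - m->base) < 4)
        return 0;
    const uint8_t *p = m->bytes + (addr - m->base);
    *out = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
           (uint32_t)p[3] << 24;
    return 1;
}

/* Hypothetical: locate the 32-bit PEB from the TebBaseAddress the kernel reports. */
static int find_peb32(const mem_snapshot_t *m, uint64_t reported_teb, uint64_t *peb32)
{
    uint32_t peb;
    if (!read_u32(m, reported_teb + TEB32_FIXUP + TEB32_PEB_OFFSET, &peb))
        return 0;
    *peb32 = peb;
    return 1;
}
```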

Makes environment variable propagation failures fatal and visible,
unlike previously where errors would just result in silently letting
the child run natively. Turns some other prior soft errors into fatal
errors on child takeover.

Moves environment variable propagation to post-CreateUserProcess
instead of waiting for ResumeThread, which avoids having to get the
thread context (for which we have no other-bitwidth support) to figure
out whether it's the first thread in the process or not. We bail on
propagation for pre-Vista where we'd have to wait for ResumeThread.

Generalizes the other-bitwidth Visual Studio toolchain environment
variable setting for use in a new build-and-test other-bitwidth test
which builds DynamoRIO and the large_options client (to ensure options
are propagated to children; and it has convenient init and exit time
prints) for the other bitwidth, arranges parallel lib dirs, and runs
the other client.

Issue: #803, #147, #607, #49
Fixes #803

@derekbruening
Contributor Author

The diff is rather large and I was not expecting anyone to be able to do a detailed review as we don't really have any Windows experts as regular contributors anymore, but if anyone wants to take a look please go ahead. Windows support is currently not in a great place and here I'm just trying to fill in some big missing pieces as best-effort improvements.

johnfxgalea self-requested a review on January 4, 2021 19:22

johnfxgalea (Contributor) left a comment


Admittedly, a very light and quick review, but let me know if there is something particular you want me to have a closer look at.

Resolved review threads on core/win32/ntdll.h and core/win32/ntdll_shared.c (both marked outdated).
@derekbruening
Contributor Author

The Windows failures all seem to be flaky tests: grrr.

I also see that the sourceforge doxygen link for the older version is suddenly broken, and unfortunately it does not show up as a fatal error: the suite just doesn't build the docs. I guess I will file a new issue. Sigh -- so hard to keep stuff working.

In any case, that's all separate from the changes in this PR. The win32.xarch test is passing beautifully on GA CI.

@derekbruening
Contributor Author

> Admittedly, a very light and quick review, but let me know if there is something particular you want me to have a closer look at.

Thanks! I guess my biggest concern is that this will break something we don't have good tests for, like AppInit injection (theoretically it shouldn't), or somehow cause issues on some variant of OS and app type (Windows 7, pre-Vista, Windows services, graphical apps, user switches, or something else) that isn't going to show up on the GA CI: but it seems ok to put it in and rely on users filing issues if something breaks (and ideally adding new tests...). Given that none of us is able to spend much time on Windows, I think all we can do is put it in as best-effort support. It certainly works well on my local machine and, as shown here, on GA, and it seems a very good step forward (users have asked for cross-arch injection before; I will ask one of them to test it).

@derekbruening
Contributor Author

> The Windows failures all seem to be flaky tests: grrr.

We have burst_static #4486, vmareas.c assert in broadfun #4077, and win32.dotnet -- wait, that one actually may be a real bug: the image entry takeover point might be a problem there. Good thing we actually have a test for that. Investigating.

…s that never reach the image entry. This actually better matches the prior default thread injection in any case. We obtain the thread start from the context for same-bitwidth children, from ntdll!RtlUserThreadStart in the remote ntdll if not, and if both fail, we fall back to image entry. The thread start has xax as live, so we add a save into an earliest_args_t slot in the gencode, which the init code uses to restore the app value.
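
The fallback order for the takeover point can be written as a small selection routine. This is a hypothetical sketch in which zero stands for "could not be obtained". It models only the choice of address, not the xax save and restore through the earliest_args_t slot. Per the later comments, the final version also sets the thread context instead of writing a hook.

```c
#include <stdint.h>

typedef enum {
    TAKEOVER_NONE,
    TAKEOVER_CONTEXT_START,   /* thread start read from the child's context */
    TAKEOVER_RTL_USER_THREAD, /* ntdll!RtlUserThreadStart in the remote ntdll */
    TAKEOVER_IMAGE_ENTRY,     /* last resort: the executable's entry point */
} takeover_kind_t;

/* Hypothetical: prefer the context's thread start for same-bitwidth children,
 * then the remote RtlUserThreadStart, then the image entry. */
static takeover_kind_t choose_takeover(int same_bitwidth, uint64_t ctx_start,
                                       uint64_t rtl_user_thread_start,
                                       uint64_t image_entry, uint64_t *pc)
{
    if (same_bitwidth && ctx_start != 0) {
        *pc = ctx_start;
        return TAKEOVER_CONTEXT_START;
    }
    if (rtl_user_thread_start != 0) {
        *pc = rtl_user_thread_start;
        return TAKEOVER_RTL_USER_THREAD;
    }
    if (image_entry != 0) {
        *pc = image_entry;
        return TAKEOVER_IMAGE_ENTRY;
    }
    return TAKEOVER_NONE;
}
```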
@derekbruening
Contributor Author

So hooking the image entry for takeover worked on everything except .NET; hooking the thread start fixes .NET but seems to lead to strange instability in a handful of tests, including asserts about thread exit: maybe that hook is hit by early extra threads (though you'd think DR would just take over earlier in that case). The weirdest is tool.cpuid.exe claiming the hook was written but not actually writing it and thus running native: no idea how that is happening, nor how to figure out further w/o walking through the kernel code.

Going to try one more thing before bailing and picking this up at some future point: setting the thread context, like thread injection does.

…d of setting a hook, since a hook there seems to cause weird instability
… context query for the hook and not just the later set
@derekbruening
Contributor Author

OK it's green. Just a final tweak which should not change stability.

derekbruening merged commit 9ce2418 into master on Jan 5, 2021
derekbruening deleted the i803-xarch-inject branch on January 5, 2021 16:10
derekbruening pushed a commit that referenced this pull request Jul 13, 2021
Fixes issues around the -inject_x64 prototype option added by PR #4653 for #803 to enable injecting a 64-bit DR into a WOW64 (32-bit) child ("mixed mode").

Xref discussion at https://groups.google.com/g/dynamorio-users/c/rhEpslerwf8

Adds a new option -vmheap_size_wow64 since the default x64 size will not fit in a WOW64 process.
Saves the eax register, which holds the routine address for RtlUserThreadStart, before the mode switch and restores it on the mode switch.
Fixes the far jmp that switches to x64 mode on injection.
Fixes env variable argument propagation.
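
For reference, 32-bit WOW64 code usually enters 64-bit mode with a far jump to the 64-bit code segment selector 0x33. The sketch below shows that encoding, assuming this is the kind of far jmp the fix concerns. The helper name is hypothetical, and these are not the commit's actual bytes.

```c
#include <stdint.h>

/* 64-bit code segment selector in a WOW64 process (0x23 is the 32-bit one). */
#define WOW64_CS64 0x33

/* Emit "jmp far 0x33:target" (opcode EA, ptr16:32). This form only exists
 * in 32-bit mode, so the jump target must lie below 4GB. */
static uint8_t *emit_far_jmp_to_x64(uint8_t *pos, uint32_t target)
{
    *pos++ = 0xEA;
    for (int i = 0; i < 4; i++) /* 32-bit offset, little-endian */
        *pos++ = (uint8_t)(target >> (8 * i));
    *pos++ = (uint8_t)(WOW64_CS64 & 0xFF); /* 16-bit selector follows */
    *pos++ = (uint8_t)(WOW64_CS64 >> 8);
    return pos;
}
```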

Example command line that works:

  $ bin64\drrun.exe -inject_x64 -c .\clientdll.dll -- bin64\create_process.exe .\helloworld32.exe

We still need to add proper support in drrun64 to inject natively without having to use create_process.exe.

Issue: #49, #4990
sapostolakis pushed a commit that referenced this pull request on Jul 14, 2021, with the same commit message as the Jul 13 commit above.
Issue closed by this pull request:

[x64] inject into different-architecture child: x64 to WOW64, WOW64 to x64