DR crashes when running C# on Win 10 #3046

Closed
fmoessbauer opened this issue Jun 13, 2018 · 8 comments

@fmoessbauer
Contributor

fmoessbauer commented Jun 13, 2018

DynamoRIO crashes when running a simple C# application on Windows 10:

OS: Windows 10
DR: drrun version 7.0.0 -- build 1 (both 32 and 64 bit)
Compiler: MSBuild\15.0\Bin\Roslyn\csc.exe

C# Code:

class MainProgram
{
  static void Main(string[] args)
  { }
}

Execution:

/DynamoRIO-Windows-7.0.0-RC1/bin32/drrun.exe -debug -- test/mini-apps/cs-lock/Debug/gp-cs-lock.exe
<Application C:\Users\felix\CMakeBuilds\b76e3c80-c08f-0039-98b4-7033c70ea08f\build\x86-Debug-msvc\test\mini-apps\cs-lock\Debug\gp-cs-lock.exe (10788).  Internal Error: DynamoRIO debug check failure: D:\dynamorio_package\core\win32\ntdll.c:662 (byte *) get_proc_address(ntdllh, syscall_names[i]) != NULL && (*((int *)(((byte *) get_proc_address(ntdllh, syscall_names[i])) + SYSNUM_OFFS)) == syscalls[i] || ALLOW_HOOKER((byte *) get_proc_add
version 7.0.0, build 1

0x00b3f00c 0x143d7e84
0x00b3f144 0x145bf2a0
0x00b3f1c4 0x1431c07f
0x00b3fa88 0x14568d51
0x00b3fae8 0x14569718>

Edit: When using drrun version 7.0.17689 -- build 0, the output is more precise:

<Starting application C:\Users\felix\CMakeBuilds\b76e3c80-c08f-0039-98b4-7033c70ea08f\build\x64-Debug-msvc\test\mini-apps\cs-lock\Debug\gp-cs-lock.exe (7260)>
<Early threads found>
<Initial options = -no_dynamic_options -code_api -probe_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct -no_aslr_dr -pad_jmps_mark_no_trace >
<intercept_syscall_wrapper: not hooking NtSetContextThread due to conflict @0x77b934f5>
<intercept_syscall_wrapper: not hooking NtGetContextThread due to conflict @0x77b92ba5>
<intercept_syscall_wrapper: not hooking NtTerminateProcess due to conflict @0x77b91fd5>
<intercept_syscall_wrapper: not hooking NtCreateThread due to conflict @0x77b921f5>
<intercept_syscall_wrapper: not hooking NtCreateThreadEx due to conflict @0x77b928a5>
<intercept_syscall_wrapper: not hooking NtResumeThread due to conflict @0x77b92235>
<intercept_syscall_wrapper: not hooking NtQueryInformationThread due to conflict @0x77b91f65>
<intercept_syscall_wrapper: not hooking NtProtectVirtualMemory due to conflict @0x77b92215>
<intercept_syscall_wrapper: not hooking NtMapViewOfSection due to conflict @0x77b91f95>
<intercept_syscall_wrapper: not hooking NtUnmapViewOfSection due to conflict @0x77b91fb5>
<Stopping application C:\Users\felix\CMakeBuilds\b76e3c80-c08f-0039-98b4-7033c70ea08f\build\x64-Debug-msvc\test\mini-apps\cs-lock\Debug\gp-cs-lock.exe (7260)>
Segmentation fault
@toshipiazza
Contributor

I'm not too knowledgeable about C# but I thought that by default, the C# compiler will emit bytecode for CLR as opposed to native code. I doubt that DynamoRIO is expected to work on this non-native code...

@fmoessbauer
Contributor Author

You are right regarding the C# compiler, but as the JIT is included in the binary along with the CIL (~Bytecode), from an external view there is no observable difference.

Interestingly, the code works on Windows 7 using the latest DR, so I guess the actual issue is the Windows API?

@derekbruening
Contributor

"Windows 10" is not precise enough (from DR's point of view it's like saying "Windows prior to 10" as each Windows 10 update is a major update, adding system calls and changing system call numbers). Which release of 10 is this?

DynamoRIO-Windows-7.0.0-RC1 has no support for Win10 TH2 (1511) or higher. The core\win32\ntdll.c:662 assert listed above is known and is a duplicate of the general TH2 issue #1825. It was fixed a while back. Please use a more recent version of DR (one of the "cronbuilds").
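
To make the failing check concrete: the assert quoted above compares the syscall number embedded in each ntdll wrapper against DR's built-in table, so a Windows release that renumbers system calls trips it. The following is a minimal sketch, not DR's code. The `sketch_` names are hypothetical, and the offset of 4 assumes the usual x64 wrapper prologue (`mov r10, rcx` followed by `mov eax, imm32`); DR's real SYSNUM_OFFS may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: an x64 ntdll wrapper begins with
 *   4c 8b d1      mov r10, rcx
 *   b8 <imm32>    mov eax, <syscall number>
 * so the number sits 4 bytes in. */
#define SKETCH_SYSNUM_OFFS 4

/* Decode the little-endian imm32 that holds the syscall number. */
static int32_t sketch_wrapper_sysnum(const uint8_t *wrapper)
{
    assert(wrapper != NULL);
    const uint8_t *p = wrapper + SKETCH_SYSNUM_OFFS;
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Same shape as the failing check: the export must exist and its embedded
 * number must equal the one in the built-in table. */
static int sketch_sysnum_matches(const uint8_t *wrapper, int32_t expected)
{
    return wrapper != NULL && sketch_wrapper_sysnum(wrapper) == expected;
}
```

For a wrapper whose first bytes are `4c 8b d1 b8 50 00 00 00`, the decoded number is 0x50; a table built for a release that expects a different value fails the comparison.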

With the more recent version 7.0.17689, it looks like the app executed fine (though it seems to hook a lot of syscall wrappers -- or maybe that's some AV software on your system? Best to rule that out as the exit crash could come from that too) until it exited. A callstack of the exit crash (from windbg) would help to figure out what's going on.
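
As a rough illustration of how a third-party hook on a syscall wrapper can be recognized: many inline hooks overwrite the wrapper's first instruction with a 5-byte `jmp rel32` (opcode 0xE9) into the hooker's code. The sketch below covers only that one pattern and is not necessarily how DR detects conflicts; all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_JMP_REL32 0xE9 /* opcode of a 5-byte relative jump */
#define SKETCH_JMP_LEN 5      /* opcode byte plus 4-byte displacement */

/* Returns 1 if the wrapper's first instruction looks like a detour. */
static int sketch_wrapper_looks_hooked(const uint8_t *wrapper)
{
    assert(wrapper != NULL);
    return wrapper[0] == SKETCH_JMP_REL32;
}

/* Where the detour goes: address of the next instruction plus the signed
 * little-endian displacement stored after the opcode. */
static uint64_t sketch_jmp_target(const uint8_t *wrapper, uint64_t wrapper_addr)
{
    assert(wrapper != NULL && wrapper[0] == SKETCH_JMP_REL32);
    uint32_t raw = (uint32_t)wrapper[1] | ((uint32_t)wrapper[2] << 8) |
                   ((uint32_t)wrapper[3] << 16) | ((uint32_t)wrapper[4] << 24);
    int32_t rel = (int32_t)raw;
    return wrapper_addr + SKETCH_JMP_LEN + (uint64_t)(int64_t)rel;
}
```

If the jump target resolves into a module that is neither ntdll nor dynamorio.dll, that would point at AV or other hooking software as the owner of the hook.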

@fmoessbauer
Contributor Author

Hi, thanks for your quick response. The exact windows release is:

Edition: Windows 10 Enterprise
Version: 1703
Build: 15063.1088

Regarding the AV: That is possible, as other (fine-running) apps show similar conflicts. When running on a similar system without AV, there are no hooking conflicts, but the app still crashes.

Once I have installed windbg (apparently not included in the Win10 SDK Debugging Tools), I will report back.

@fmoessbauer
Contributor Author

The issue also occurs on a Windows 7 build. The application is a dummy C# application (x64) running on the .NET CLR.
I have tested the following .NET Framework versions, and all crash the same way:

  • Framework 4.5.2
  • Framework 4.6.1
  • Framework 4.7.2

Here is a crash log:

<Starting application C:\Users\felix\Documents\Visual Studio 2015\Projects\DynRIOTest1\bin\Release\DynRIOTest1.exe (14608)>
<Initial options = -no_dynamic_options -code_api -probe_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct -pad_jmps_mark_no_trace >
<non-syscall, non-int2b 0x21 @ 0x000000013fb90047 from 0x000000013fb90000>
<Invalid opcode encountered>
<Stopping application C:\Users\felix\Documents\Visual Studio 2015\Projects\DynRIOTest1\bin\Release\DynRIOTest1.exe (14608)>
Segmentation fault

windbg output at crash:

************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
Symbol search path is: srv*
Executable search path is: 
ModLoad: 00000001`3f1f0000 00000001`3f1f6000   C:\Users\felix\Documents\Visual Studio 2015\Projects\DynRIOTest1\bin\Release\DynRIOTest1.exe
ModLoad: 00000000`76d30000 00000000`76ecf000   C:\WINDOWS\SYSTEM32\ntdll.dll
ModLoad: 000007fe`fa250000 000007fe`fa2bf000   C:\WINDOWS\SYSTEM32\MSCOREE.DLL
ModLoad: 00000000`76a10000 00000000`76b2f000   C:\WINDOWS\system32\KERNEL32.dll
ModLoad: 000007fe`fcab0000 000007fe`fcb1a000   C:\WINDOWS\system32\KERNELBASE.dll
ModLoad: 00000000`15000000 00000000`1554f000   C:\opt\DynamoRIO-Windows-7.0.17744-0\lib64\debug\dynamorio.dll
(25e8.3a6c): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00000000`76d9b1d0 cc              int     3
0:001> g
(25e8.2b98): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
00000000`bf27109f 0003            add     byte ptr [rbx],al ds:00000000`00000000=??
0:000> kb
 # RetAddr           : Args to Child                                                           : Call Site
00 00000000`0015c980 : 00000000`00000000 00000000`00060000 00000000`00000080 445c7470`6f5c3a43 : 0xbf27109f
01 00000000`00000000 : 00000000`00060000 00000000`00000080 445c7470`6f5c3a43 00000000`76d8385d : 0x15c980 

@derekbruening
Contributor

See a similar issue and discussion on the users list: https://groups.google.com/forum/#!topic/DynamoRIO-Users/RXXRxYa08QQ

Resuming that here with more details:

These .NET apps have an entry point offset of 0, just pointing at the base of the binary.

The Win10 (maybe Win7+?) loader has special support that changes the application's start address in the initial CONTEXT (Rcx on x64) from the executable entry point, set by the kernel, to MSCOREE!CorExeMain_Exported. On older (maybe pre-Win7) Windows versions, mscoree itself would instead patch the entry point and not change the initial CONTEXT at all. The drrun injector runs rather late: it uses the CONTEXT seen from the parent and caches it. It thus misses this change by the loader and tries to run the start address set by the kernel, resulting in the observed crashes.
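
A small sketch of why an entry point offset of 0 is fatal when the kernel-set start address is used as-is. The start address is the image base plus the entry point RVA from the PE header, so an RVA of 0 lands on the header itself rather than on code. The function names are hypothetical, and the base address in the usage note assumes that the "from 0x000000013fb90000" address in the crash log is the image base.

```c
#include <stdint.h>

/* Start address the kernel derives from the PE header: image base plus the
 * AddressOfEntryPoint RVA. */
static uint64_t sketch_kernel_start(uint64_t image_base, uint32_t entry_rva)
{
    return image_base + entry_rva;
}

/* Hypothetical predicate: does the start address point at the image base,
 * i.e. at the PE header rather than at executable code? */
static int sketch_start_is_image_base(uint64_t image_base, uint32_t entry_rva)
{
    return sketch_kernel_start(image_base, entry_rva) == image_base;
}
```

With a base of 0x13fb90000 and an RVA of 0, the computed start is 0x13fb90000, so executing it means decoding header bytes as instructions.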

If you use from-parent injection with location "4", the app should run correctly as DR injects before the special code that changes the start point. This requires DR being in a parent process just because we never implemented early injection support in drrun (#607):

bin64/drrun -early_inject -early_inject_map -early_inject_location 4 -- bin64/create_process d:/derek/dr/test/dotnet64.exe

However, injecting that early does not have good client support if the client wants to use Windows libraries, since loading a private kernelbase that early just doesn't work. We have a half-implemented "drwinapi" trying to solve that but it was never finished enough for most clients.

The simplest, targeted short-term fix would be to change the late injection code to update the cached context with the new Rcx/Eax value. Long-term of course it would be great to move default injection earlier via a filled-in drwinapi.
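
The short-term fix can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual DR change: `sketch_context_t` is a trimmed-down stand-in for the x64 CONTEXT that keeps only Rcx, and `sketch_injector_t`, `sketch_refresh_start`, and `sketch_takeover_target` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the x64 CONTEXT: only the register that carries
 * the initial thread's start address (Rcx). */
typedef struct {
    uint64_t rcx;
} sketch_context_t;

/* Hypothetical injector state: the context snapshot taken from the parent
 * before the loader has run. */
typedef struct {
    sketch_context_t cached;
} sketch_injector_t;

/* At takeover time, copy the start address from the live context so that a
 * value rewritten by the loader replaces the stale kernel-set one. */
static void sketch_refresh_start(sketch_injector_t *inj,
                                 const sketch_context_t *live)
{
    assert(inj != NULL && live != NULL);
    inj->cached.rcx = live->rcx;
}

/* Address the injector would resume the application at. */
static uint64_t sketch_takeover_target(const sketch_injector_t *inj)
{
    assert(inj != NULL);
    return inj->cached.rcx;
}
```

Without the refresh, the takeover target stays at the kernel-set address; with it, the target follows whatever the loader wrote, which for these .NET apps would be MSCOREE!CorExeMain_Exported.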

It would also be nice to add a .NET test to the regression suite to catch regressions like this.

derekbruening self-assigned this Mar 19, 2019
derekbruening added a commit that referenced this issue Mar 19, 2019
On recent Windows versions, the loader changes the start address of
the initial thread's CONTEXT.  However, for late injection, DR had
already cached the address to the value set by the kernel, which
crashes.  We solve that by updating the start address register when we
run our takeover code.

Adds a .NET test to the suite.

Fixes #3046
Carrotman42 pushed a commit that referenced this issue Mar 20, 2019
hgreving2304 pushed a commit that referenced this issue Mar 20, 2019
@n0tduck1e

Hey @derekbruening

I was reading this: what does early_inject_location mean?

@derekbruening
Contributor

If you search the code you'll see the option in optionsx.h referencing the enum of values like INJECT_LOCATION_ThreadStart: see the comments there for the different targets when using hooking to take over.
