OOM with STATUS_INVALID_SYSTEM_SERVICE due to wrong NtAllocateVirtualMemory syscall # from CrowdStrike Falcon hook #7024

Open
derekbruening opened this issue Oct 7, 2024 · 0 comments

A number of Dr. Memory users have reported startup failures on Windows Enterprise. Dr. Memory's launcher prints any library it sees that is not on its (old-ish) known-library list as a possible interoperability cause, since security software often fails to work well with low-level tools like ours. It was flagging "bcrypt.dll" as a possible culprit, so many users put that in their issue titles or used it to refer to the problem.

E.g.: https://groups.google.com/g/drmemory-users/c/9wM120ukp6c

I managed to reproduce this and the external symptoms are like this:

---------------------------
DynamoRIO Notice: C:\Program Files\WindowsApps\Microsoft.WindowsNotepad_11.2407.9.0_x64__8wekyb3d8bbwe\Notepad\Notepad.exe(4584)
---------------------------
Application C:\Program Files\WindowsApps\Microsoft.WindowsNotepad_11.2407.9.0_x64__8wekyb3d8bbwe\Notepad\Notepad.exe (4584).  Out of memory.  Program aborted.  Source I, type 0x0000000000000001, code 0x00000000c000001c.
---------------------------
OK   
---------------------------

A debug build hits this assert when the allocation fails:

SYSLOG_ERROR: Application C:\Program Files\WindowsApps\Microsoft.WindowsNotepad_11.2407.9.0_x64__8wekyb3d8bbwe\Notepad\Notepad.exe (10560).  Internal Error: DynamoRIO debug check failure: D:\a\drmemory\drmemory\dynamorio\core\win32\os.c:4982 false

The failure code is usually this one:

#define STATUS_INVALID_SYSTEM_SERVICE    ((NTSTATUS)0xC000001CL)

Though I have seen it vary: I've seen 0x80000002, 0xc0000005, and others.
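
For reading the codes above, here is a minimal sketch of how an NTSTATUS value splits into fields. The helper names are hypothetical (they are not Windows or DR APIs); the layout assumed is the standard one, with the severity in the top two bits and the code in the low 16 bits.

```c
#include <stdint.h>

/* Top two bits of an NTSTATUS: 0 = success, 1 = informational,
 * 2 = warning, 3 = error. */
unsigned ntstatus_severity(uint32_t status)
{
    return status >> 30;
}

/* Low 16 bits: the status code within its facility. */
unsigned ntstatus_code(uint32_t status)
{
    return status & 0xFFFFu;
}
```

Under that layout, 0xC000001C and 0xC0000005 are error-severity values, while 0x80000002 is warning-severity.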

This seems to be the same as DynamoRIO/drmemory#2447.

On looking deeper we see:

  • This is not from DEP: DEP is opt-in on this machine; plus a standalone test app successfully allocates +rwx memory
  • This is not some security policy disallowing raw syscalls: a standalone test app can call NtAllocateVirtualMemory in assembly

The problem is that DR gets the wrong system call number for NtAllocateVirtualMemory:

syscalls_init: enum=15 name=NtAllocateVirtualMemory wrapper=0x00007ffdcfd10420 hook=0 #=0x8eb4a
bytes @:
 4c 8b d1 e9 4a eb 08 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 c3 cd 2e c3 0f 1f 84 00 00 00

The number should be 0x18:

  00000001800A0420: 4C 8B D1           mov         r10,rcx
  00000001800A0423: B8 18 00 00 00     mov         eax,18h
  00000001800A0428: F6 04 25 08 03 FE  test        byte ptr [000000007FFE0308h],1
                    7F 01
  00000001800A0430: 75 03              jne         00000001800A0435
  00000001800A0432: 0F 05              syscall
  00000001800A0434: C3                 ret
  00000001800A0435: CD 2E              int         2Eh
  00000001800A0437: C3                 ret
  00000001800A0438: 0F 1F 84 00 00 00  nop         dword ptr [rax+rax+0000000000000000h]
                    00 00

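There's a hook in there clobbering the number. A 5-byte jmp has replaced the mov eax,18h, so the four bytes DR reads as the number (0x8eb4a) are really the jmp's displacement: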

(gdb) x/30i 0x04311420
   0x4311420:   mov    %rcx,%r10
   0x4311423:   jmpq   0x439ff72
   0x4311428:   testb  $0x1,0x7ffe0308
   0x4311430:   jne    0x4311435
   0x4311432:   syscall
   0x4311434:   retq
   0x4311435:   int    $0x2e
   0x4311437:   retq
   0x4311438:   nopl   0x0(%rax,%rax,1)
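
As a cross-check of the two listings, the sketch below decodes the logged wrapper bytes by hand. It is illustrative only and not DR code: the function names are hypothetical, and the byte values and addresses are copied from the syscalls_init log and the gdb output above.

```c
#include <stdint.h>

/* First 8 bytes of the hooked wrapper, from the syscalls_init log. */
const uint8_t hooked_wrapper[8] = { 0x4c, 0x8b, 0xd1, 0xe9,
                                    0x4a, 0xeb, 0x08, 0x00 };

/* Read the little-endian 32-bit operand at wrapper+4, which is where
 * the immediate of "mov eax,imm32" sits in an unhooked wrapper. */
uint32_t read_operand32(const uint8_t *wrapper)
{
    return (uint32_t)wrapper[4] | ((uint32_t)wrapper[5] << 8) |
        ((uint32_t)wrapper[6] << 16) | ((uint32_t)wrapper[7] << 24);
}

/* Target of a 5-byte "jmp rel32" (opcode 0xe9) located at jmp_addr:
 * the displacement is relative to the end of the instruction. */
uint64_t jmp_rel32_target(uint64_t jmp_addr, uint32_t rel32)
{
    return jmp_addr + 5 + (uint64_t)(int64_t)(int32_t)rel32;
}
```

With these inputs, read_operand32(hooked_wrapper) gives 0x8eb4a, the bogus number in the log, and jmp_rel32_target(0x4311423, 0x8eb4a) gives 0x439ff72, the jmpq target gdb shows.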

DR tries to detect a hook during number acquisition, but it only looks at the entry pc, so this deeper hook (3 bytes into the wrapper) is not caught there.
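
One possible shape for a deeper check is sketched below. This is a hedged illustration, not DR's implementation: the function name is hypothetical, and it assumes the native x64 wrapper always starts with the exact sequence shown in the clean disassembly (mov r10,rcx followed by mov eax,imm32). It would not cover wow64 wrappers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: trust the immediate only if the bytes before it
 * match the expected unhooked x64 prologue. Returns false otherwise, so
 * the caller can fall back to another source for the number. */
bool wrapper_syscall_number(const uint8_t *pc, size_t len, uint32_t *num_out)
{
    static const uint8_t mov_r10_rcx[3] = { 0x4c, 0x8b, 0xd1 };
    if (len < 8)
        return false;
    if (memcmp(pc, mov_r10_rcx, sizeof(mov_r10_rcx)) != 0)
        return false; /* Entry bytes altered: hook at the entry pc. */
    if (pc[3] != 0xb8)
        return false; /* Not mov eax,imm32; 0xe9 here would be a jmp hook. */
    *num_out = (uint32_t)pc[4] | ((uint32_t)pc[5] << 8) |
        ((uint32_t)pc[6] << 16) | ((uint32_t)pc[7] << 24);
    return true;
}
```

On the clean bytes from the disassembly this yields 0x18; on the hooked bytes from the log it reports failure instead of returning the jmp displacement.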

I don't have a debugger on my test machine, but I do see that Dr. Memory complains about c:\windows\system32\umppc18613.dll in addition to bcrypt.dll. umppc18613.dll is the CrowdStrike Falcon Sensor Support Module, which is almost certainly the culprit doing this hooking.

Action items:

  • Address the regression that made popup messages silent by default. This is one reason external users had so much trouble figuring this out: things just failed silently
  • Add bcrypt.dll to the known list: done in i#2498 bcrypt init: Add bcrypt.dll to allowlist drmemory#2516
  • Figure out how to handle Windows syscall numbers going forward

For that last one this comment in the code lists the options:

/* XXX i#2713: With the frequent major win10 updates, adding new tables here is
 * getting tedious and taking up space.  Should we stop adding the win10 updates here
 * and give up on our table of numbers, relying on reading the wrappers (i#1598
 * changed DR to work purely on wrapper-obtained numbers)?  We'd lose robustness vs
 * hooks, and clients like Dr. Memory who have to distinguish win10 versions would
 * have to do their own versioning.  I guess we could still have
 * DR_WINDOWS_VERSION_xx and not have corresponding tables here.  Or we could go the
 * planned Dr. Memory route (DrMi#1848) and store these numbers in a separate file
 * that is updated via a separate standalone utility run once by the user.
 */
SYS_CONST int windows_10_1803_x64_syscalls[TRAMPOLINE_MAX] = {

If we had detected the hook, we would have fallen back to the windows_10_1803_x64_syscalls numbers, which do have NtAllocateVirtualMemory correct but may have others wrong.

To state the options for the numbers more concisely:

  • Just add deep hook detection or invalid number detection (tough on wow64 w/ prefixes) and hope our old hardcoded numbers work
  • The prior item, plus add the most recent number snapshot
  • Adopt Dr. Memory's separate file and utility approach
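
The invalid-number part of the first option could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name and the 0x1000 bound are hypothetical, chosen on the premise that native x64 ntdll syscall numbers are small table indices. That premise would not carry over unchanged to wow64, consistent with the caveat in the list.

```c
#include <stdint.h>

/* Assumed upper bound on a plausible native ntdll syscall number.
 * This value is an illustrative assumption, not taken from DR. */
#define SYSCALL_INDEX_LIMIT 0x1000u

/* Hypothetical helper: prefer the wrapper-derived number, but fall back
 * to the hardcoded table entry when the wrapper value is implausible. */
uint32_t pick_syscall_number(uint32_t from_wrapper, uint32_t from_table)
{
    return from_wrapper < SYSCALL_INDEX_LIMIT ? from_wrapper : from_table;
}
```

With the values from this issue, pick_syscall_number(0x8eb4a, 0x18) would return the table's 0x18. The trade-off named above remains: the fallback is only as good as the hardcoded table, which may be stale for other system calls.
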
@derekbruening derekbruening self-assigned this Oct 7, 2024
derekbruening added a commit that referenced this issue Oct 8, 2024
Adds the use of "-vm_base 0" to mean to let the OS pick our vmcode
base, with no random offset applied.  This was useful during
diagnosing the OOM from the CrowdStrike hook.

Issue: #7024
derekbruening added a commit that referenced this issue Oct 9, 2024
Replaces the AUTOMATED_TESTING set in package builds by PR #5769 with a
new DISABLE_ZLIB CMake option to accomplish the goal of disabling the
zlib found on these VMs while avoiding turning off msgbox_mask. The
disabling by default of msgbox_mask in packages has caused many users to
fail to obtain error information and has led to confusion with silent
errors.

Tested: built a cronbuild based on this branch and downloaded it to the
Windows VM used for #7024.
I ran this:
```
C:\Users\bruening\DynamoRIO-Windows-10.93.20004>bin64\drrun -- msg
<Application C:\Windows\system32\msg.exe (3892).  Out of memory.  Program aborted.  Source I, type 0x0000000000000001, code 0x00000000c000001c.>
```

And confirmed a messagebox popped up: thus showing that `-msgbox_mask`
is *not* set to 0 anymore.

Issue: #7025
derekbruening added a commit to DynamoRIO/drmemory that referenced this issue Oct 9, 2024
Updates DR to
[a15656a0c](36a6d23).
Replaces the AUTOMATED_TESTING set in package builds by PR #2474 with
the new DISABLE_ZLIB CMake option added by
DynamoRIO/dynamorio#7030. This fixes a regression where msgbox_mask was
set to 0 by default in packages, which caused many users to fail to
obtain error information and has led to confusion with silent errors.

Tested:
Built
https://github.com/DynamoRIO/drmemory/releases/tag/cronbuild-2.6.20005,
unzipped it, and confirmed it pops up a message box by default on the
machine where DynamoRIO/dynamorio#7024 is hit.

Fixes DynamoRIO/dynamorio#7025
derekbruening added a commit that referenced this issue Oct 9, 2024
Adds the use of "-vm_base 0" to mean to let the OS pick our vmcode base,
with no random offset applied. This was useful during diagnosing the OOM
from the CrowdStrike hook.

Issue: #7024