Jumpstub fixes
- Reserve space for jump stubs for precodes and other code fragments at the end of each code heap segment. This helps
ensure that the eventual allocation of jump stubs for precodes and other code fragments succeeds. The accounting is
conservative (it reserves more than strictly required), so it wastes a bit of address space, but no actual memory.
This reserve is not used to allocate jump stubs for JITed code, since JITing can now recover from a failure to
allocate a jump stub. Fixes #14996.

- Improve the algorithm for reusing HostCodeHeap segments: maintain an estimate of the size of the largest free block
in each HostCodeHeap. The estimate is updated when an allocation request fails and when memory is returned to the
HostCodeHeap (see the sketch below). Fixes #14995.

- Retry JITing when a jump stub cannot be allocated. Failure to allocate a jump stub during JITing is no longer fatal:
the retry reserves extra memory for jump stubs so that, with high probability, it succeeds in allocating the jump
stubs it needs.

- Respect CodeHeapRequestInfo::getRequestSize for HostCodeHeap. CodeHeapRequestInfo::getRequestSize is used to
throttle code heap segment size for large workloads; ignoring it in HostCodeHeap led to too many undersized code
heap segments in large workloads.

- Switch the HostCodeHeap nibble map to be allocated on the regular heap. This simplifies the math required to
estimate the nibble map size, and allocating it on the regular heap is an overall improvement since the nibble map
does not need to be executable.
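
A minimal sketch of the largest-free-block estimate mentioned in the second bullet; all names here are hypothetical
and the real HostCodeHeap bookkeeping is more involved:

// Sketch only: track a rough estimate of the largest free block so that
// allocation can skip heaps that are unlikely to satisfy a request.
#include <cstddef>

struct HostCodeHeapSketch
{
    size_t cbLargestFreeBlockEstimate = 0;

    // An allocation of cbRequest bytes just failed, so no free block of that
    // size exists right now; lower the estimate accordingly.
    void OnAllocationFailed(size_t cbRequest)
    {
        if (cbRequest < cbLargestFreeBlockEstimate)
            cbLargestFreeBlockEstimate = cbRequest;
    }

    // A block of cbFreed bytes was returned to the heap; the largest free
    // block is now at least roughly that big (coalescing may make it bigger).
    void OnMemoryReturned(size_t cbFreed)
    {
        if (cbFreed > cbLargestFreeBlockEstimate)
            cbLargestFreeBlockEstimate = cbFreed;
    }

    // Only bother scanning this heap if the estimate suggests the request fits.
    bool MayFit(size_t cbRequest) const
    {
        return cbRequest <= cbLargestFreeBlockEstimate;
    }
};
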
jkotas committed Dec 1, 2017
1 parent 1344d8e commit 5d0a5a8
Showing 14 changed files with 425 additions and 460 deletions.
27 changes: 8 additions & 19 deletions Documentation/design-docs/jump-stubs.md
@@ -188,8 +188,9 @@ still reach their intended target with a rel32 offset, so jump stubs are
 not expected to be required in most cases.
 
 If this attempt to create a jump stub fails, then the generated code
-cannot be used, and we hit a fatal error; we have no mechanism currently
-to recover from this failure, or to prevent it.
+cannot be used, and the VM restarts the compilation with reserving
+extra space in the code heap for jump stubs. The reserved extra space
+ensures that the retry succeeds with high probability.
 
 There are several problems with this system:
 1. Because the VM doesn't know whether a `IMAGE_REL_BASED_REL32`
@@ -205,8 +206,6 @@ code because the JIT generates `IMAGE_REL_BASED_REL32` relocations for
 intra-function jumps and calls that it expects and, in fact, requires,
 not be replaced with jump stubs, because it doesn't expect the register
 used by jump stubs (RAX) to be trashed.
-3. We don't have any mechanism to recover if a jump stub can't be
-allocated.
 
 In the NGEN case, rel32 calls are guaranteed to always reach, as PE
 image files are limited to 2GB in size, meaning a rel32 offset is
@@ -217,8 +216,8 @@ jump stubs, as described later.
 
 ### Failure mitigation
 
-There are several possible mitigations for JIT failure to allocate jump
-stubs.
+There are several possible alternative mitigations for JIT failure to
+allocate jump stubs.
 1. When we get into "rel32 overflow" mode, the JIT could always generate
 large calls, and never generate rel32 offsets. This is obviously
 somewhat expensive, as every external call, such as every call to a JIT
@@ -469,19 +468,9 @@ bytes allocated, to reserve space for one jump stub per FixupPrecode in
 the chunk. When the FixupPrecode is patched, for LCG methods it will use
 the pre-allocated space if a jump stub is required.
 
-For the non-LCG, non-FixupPrecode cases, we need a different solution.
-It would be easy to similarly allocate additional space for each type of
-precode with the precode itself. This might prove expensive. An
-alternative would be to ensure, by design, that somehow shared jump stub
-space is available, perhaps by reserving it in a shared area when the
-precode is allocated, and falling back to a mechanism where the precode
-reserves its own jump stub space if shared jump stub space cannot be
-allocated.
-
-A possibly better implementation would be to reserve, but not allocate,
-jump stub space at the end of the code heap, similar to how
-CodeHeapReserveForJumpStubs works, but instead the reserve amount should
-be computed precisely.
+For non-LCG, we are reserving, but not allocating, a space at the end
+of the code heap. This is similar and in addition to the reservation done by
+COMPlus_CodeHeapReserveForJumpStubs. (See https://github.com/dotnet/coreclr/pull/15296).
 
 ## Ready2Run
 
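
The "reserve, but do not allocate" idea above can be pictured with a small sketch. This is illustrative only and uses
made-up names; in the actual change the per-heap bookkeeping is the reserveForJumpStubs field that shows up in the
FakeHeapList diff below:

// Sketch: keep a tail reserve in each code heap segment that only jump stubs
// for precodes and similar fragments may consume; regular code allocation
// stops short of it. Names are hypothetical.
#include <cstddef>
#include <cstdint>

struct CodeHeapSegmentSketch
{
    uint8_t* current;              // next free byte
    uint8_t* end;                  // end of the segment's address range
    size_t   reserveForJumpStubs;  // tail bytes kept back for jump stubs
};

// Regular (JIT/precode) allocations must leave the tail reserve untouched.
void* AllocCode(CodeHeapSegmentSketch& seg, size_t size)
{
    uint8_t* limit = seg.end - seg.reserveForJumpStubs;
    size_t available = (seg.current < limit) ? (size_t)(limit - seg.current) : 0;
    if (size > available)
        return nullptr;            // caller grows the heap or picks another segment
    void* p = seg.current;
    seg.current += size;
    return p;
}

// Jump-stub allocations may dip into the reserve, so a stub can still be placed
// within rel32 range of the code that needs it.
void* AllocJumpStub(CodeHeapSegmentSketch& seg, size_t size)
{
    size_t available = (size_t)(seg.end - seg.current);
    if (size > available)
        return nullptr;
    void* p = seg.current;
    seg.current += size;
    return p;
}
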
4 changes: 1 addition & 3 deletions src/debug/daccess/fntableaccess.h
@@ -41,9 +41,7 @@ struct FakeHeapList
     DWORD_PTR mapBase; // changed from PBYTE
     DWORD_PTR pHdrMap; // changed from DWORD*
     size_t maxCodeHeapSize;
-    DWORD cBlocks;
-    bool bFull; // Heap is considered full do not use for new allocations
-    bool bFullForJumpStubs; // Heap is considered full do not use for new allocations of jump stubs
+    size_t reserveForJumpStubs;
 };
 
 typedef struct _FakeHpRealCodeHdr
2 changes: 1 addition & 1 deletion src/inc/clrconfigvalues.h
@@ -599,7 +599,7 @@ RETAIL_CONFIG_STRING_INFO(INTERNAL_WinMDPath, W("WinMDPath"), "Path for Windows
 // Loader heap
 //
 CONFIG_DWORD_INFO_EX(INTERNAL_LoaderHeapCallTracing, W("LoaderHeapCallTracing"), 0, "Loader heap troubleshooting", CLRConfig::REGUTIL_default)
-RETAIL_CONFIG_DWORD_INFO(INTERNAL_CodeHeapReserveForJumpStubs, W("CodeHeapReserveForJumpStubs"), 2, "Percentage of code heap to reserve for jump stubs")
+RETAIL_CONFIG_DWORD_INFO(INTERNAL_CodeHeapReserveForJumpStubs, W("CodeHeapReserveForJumpStubs"), 1, "Percentage of code heap to reserve for jump stubs")
 RETAIL_CONFIG_DWORD_INFO(INTERNAL_NGenReserveForJumpStubs, W("NGenReserveForJumpStubs"), 0, "Percentage of ngen image size to reserve for jump stubs")
 RETAIL_CONFIG_DWORD_INFO(INTERNAL_BreakOnOutOfMemoryWithinRange, W("BreakOnOutOfMemoryWithinRange"), 0, "Break before out of memory within range exception is thrown")
 
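
Since the knob is a percentage, the reserve scales with the segment size. A hypothetical back-of-the-envelope
computation, not the actual EEJitManager code:

#include <cstddef>

// Illustrative only: convert the CodeHeapReserveForJumpStubs percentage into a
// byte count for a code heap segment of a given size.
static size_t ComputeJumpStubReserve(size_t segmentSize, size_t reservePercent)
{
    return (segmentSize * reservePercent) / 100;
}

// Example: a 64 MB segment with the new default of 1% keeps back roughly 655 KB
// of address space for jump stubs (the old default of 2% kept back about twice that).
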
6 changes: 3 additions & 3 deletions src/inc/loaderheap.h
@@ -417,7 +417,7 @@ class UnlockedLoaderHeap
 #endif
 
 protected:
-    void *UnlockedAllocMemForCode_NoThrow(size_t dwHeaderSize, size_t dwCodeSize, DWORD dwCodeAlignment);
+    void *UnlockedAllocMemForCode_NoThrow(size_t dwHeaderSize, size_t dwCodeSize, DWORD dwCodeAlignment, size_t dwReserveForJumpStubs);
 
     void UnlockedSetReservedRegion(BYTE* dwReservedRegionAddress, SIZE_T dwReservedRegionSize, BOOL fReleaseMemory);
 };
@@ -838,10 +838,10 @@ class ExplicitControlLoaderHeap : public UnlockedLoaderHeap
 
 
 public:
-    void *AllocMemForCode_NoThrow(size_t dwHeaderSize, size_t dwCodeSize, DWORD dwCodeAlignment)
+    void *AllocMemForCode_NoThrow(size_t dwHeaderSize, size_t dwCodeSize, DWORD dwCodeAlignment, size_t dwReserveForJumpStubs)
     {
         WRAPPER_NO_CONTRACT;
-        return UnlockedAllocMemForCode_NoThrow(dwHeaderSize, dwCodeSize, dwCodeAlignment);
+        return UnlockedAllocMemForCode_NoThrow(dwHeaderSize, dwCodeSize, dwCodeAlignment, dwReserveForJumpStubs);
     }
 
     void SetReservedRegion(BYTE* dwReservedRegionAddress, SIZE_T dwReservedRegionSize, BOOL fReleaseMemory)
4 changes: 2 additions & 2 deletions src/utilcode/loaderheap.cpp
@@ -1731,7 +1731,7 @@ void *UnlockedLoaderHeap::UnlockedAllocAlignedMem(size_t dwRequestedSize,
 
 
 
-void *UnlockedLoaderHeap::UnlockedAllocMemForCode_NoThrow(size_t dwHeaderSize, size_t dwCodeSize, DWORD dwCodeAlignment)
+void *UnlockedLoaderHeap::UnlockedAllocMemForCode_NoThrow(size_t dwHeaderSize, size_t dwCodeSize, DWORD dwCodeAlignment, size_t dwReserveForJumpStubs)
 {
     CONTRACT(void*)
     {
@@ -1753,7 +1753,7 @@ void *UnlockedLoaderHeap::UnlockedAllocMemForCode_NoThrow(size_t dwHeaderSize, s
     //
     // Thus, we'll request as much heap growth as is needed for the worst case (we request an extra dwCodeAlignment - 1 bytes)
 
-    S_SIZE_T cbAllocSize = S_SIZE_T(dwHeaderSize) + S_SIZE_T(dwCodeSize) + S_SIZE_T(dwCodeAlignment - 1);
+    S_SIZE_T cbAllocSize = S_SIZE_T(dwHeaderSize) + S_SIZE_T(dwCodeSize) + S_SIZE_T(dwCodeAlignment - 1) + S_SIZE_T(dwReserveForJumpStubs);
     if( cbAllocSize.IsOverflow() )
     {
         RETURN NULL;
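
The S_SIZE_T arithmetic above is an overflow-checked sum. A standalone sketch of the same check in plain C++, for
illustration only (the real code uses the safe-integer S_SIZE_T type):

#include <cstddef>
#include <limits>

// Returns false instead of wrapping around if the total would overflow size_t,
// mirroring the IsOverflow()/RETURN NULL path above. Assumes codeAlignment >= 1.
static bool ComputeAllocSize(size_t headerSize, size_t codeSize, size_t codeAlignment,
                             size_t reserveForJumpStubs, size_t* total)
{
    const size_t parts[] = { headerSize, codeSize, codeAlignment - 1, reserveForJumpStubs };
    size_t sum = 0;
    for (size_t part : parts)
    {
        if (part > std::numeric_limits<size_t>::max() - sum)
            return false;   // overflow
        sum += part;
    }
    *total = sum;
    return true;
}
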
25 changes: 23 additions & 2 deletions src/vm/amd64/cgenamd64.cpp
@@ -692,7 +692,8 @@ UMEntryThunk* UMEntryThunk::Decode(LPVOID pCallback)
     return (UMEntryThunk*)pThunkCode->m_uet;
 }
 
-INT32 rel32UsingJumpStub(INT32 UNALIGNED * pRel32, PCODE target, MethodDesc *pMethod, LoaderAllocator *pLoaderAllocator /* = NULL */)
+INT32 rel32UsingJumpStub(INT32 UNALIGNED * pRel32, PCODE target, MethodDesc *pMethod,
+    LoaderAllocator *pLoaderAllocator /* = NULL */, bool throwOnOutOfMemoryWithinRange /*= true*/)
 {
     CONTRACTL
     {
@@ -721,11 +722,31 @@ INT32 rel32UsingJumpStub(INT32 UNALIGNED * pRel32, PCODE target, MethodDesc *pMe
     TADDR hiAddr = baseAddr + INT32_MAX;
     if (hiAddr < baseAddr) hiAddr = UINT64_MAX; // overflow
 
+    // Always try to allocate with throwOnOutOfMemoryWithinRange:false first to conserve reserveForJumpStubs until when
+    // it is really needed. LoaderCodeHeap::CreateCodeHeap and EEJitManager::CanUseCodeHeap won't use the reserved
+    // space when throwOnOutOfMemoryWithinRange is false.
+    //
+    // The reserved space should be only used by jump stubs for precodes and other similar code fragments. It should
+    // not be used by JITed code. And since the accounting of the reserved space is not precise, we are conservative
+    // and try to save the reserved space until it is really needed to avoid throwing out of memory within range exception.
     PCODE jumpStubAddr = ExecutionManager::jumpStub(pMethod,
                                                     target,
                                                     (BYTE *)loAddr,
                                                     (BYTE *)hiAddr,
-                                                    pLoaderAllocator);
+                                                    pLoaderAllocator,
+                                                    /* throwOnOutOfMemoryWithinRange */ false);
+    if (jumpStubAddr == NULL)
+    {
+        if (!throwOnOutOfMemoryWithinRange)
+            return 0;
+
+        jumpStubAddr = ExecutionManager::jumpStub(pMethod,
+                                                  target,
+                                                  (BYTE *)loAddr,
+                                                  (BYTE *)hiAddr,
+                                                  pLoaderAllocator,
+                                                  /* throwOnOutOfMemoryWithinRange */ true);
+    }
 
     offset = jumpStubAddr - baseAddr;
 
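
The [loAddr, hiAddr] window above encodes the standard x86-64 rel32 reachability check; a jump stub (and, first, the
no-throw attempt that avoids dipping into the reserve) is only needed when the real target falls outside it. A small
sketch of that check, with an illustrative helper name:

#include <cstdint>
#include <limits>

// baseAddr is the address of the byte immediately after the rel32 field, i.e. the
// value the CPU adds the signed 32-bit displacement to.
static bool FitsInRel32(uint64_t baseAddr, uint64_t target)
{
    int64_t delta = (int64_t)(target - baseAddr);
    return delta >= std::numeric_limits<int32_t>::min() &&
           delta <= std::numeric_limits<int32_t>::max();
}
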
3 changes: 2 additions & 1 deletion src/vm/amd64/cgencpu.h
@@ -379,7 +379,8 @@ void EncodeLoadAndJumpThunk (LPBYTE pBuffer, LPVOID pv, LPVOID pTarget);
 
 
 // Get Rel32 destination, emit jumpStub if necessary
-INT32 rel32UsingJumpStub(INT32 UNALIGNED * pRel32, PCODE target, MethodDesc *pMethod, LoaderAllocator *pLoaderAllocator = NULL);
+INT32 rel32UsingJumpStub(INT32 UNALIGNED * pRel32, PCODE target, MethodDesc *pMethod,
+    LoaderAllocator *pLoaderAllocator = NULL, bool throwOnOutOfMemoryWithinRange = true);
 
 // Get Rel32 destination, emit jumpStub if necessary into a preallocated location
 INT32 rel32UsingPreallocatedJumpStub(INT32 UNALIGNED * pRel32, PCODE target, PCODE jumpStubAddr);