5x performance regression from 3.1 to 5.0 due to generic dictionary CSE #40298
Tagging subscribers to this area: @eiriktsarpalis
We may need to make sure that the stale dictionary copies always get initialized as well to fix this.
I have made a first pass over Sergey's and Fadi's changes and over Fadi's description of the dictionary expansion algorithm. @jkotas, do I understand your suggestion correctly: whenever we expand a given dictionary, we should retain something like a pointer to its previous (smaller) version, so that upon subsequent slot resolution we can back-propagate the resolved function pointer to all earlier versions of the dictionary?
Yes, I think that's the best way to fix it.
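To make the suggestion above concrete, here is a minimal sketch of the back-propagation idea, assuming each expanded dictionary keeps a link to the version it replaced. The names `Dictionary`, `DictionaryChain`, `Expand` and `PopulateSlot` are hypothetical and chosen for illustration only; this is a simplified model, not the CoreCLR data layout or API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model of one version of a growable generic dictionary.
struct Dictionary {
    std::vector<void*> slots;        // nullptr means "not resolved yet"
    Dictionary* previous = nullptr;  // older, smaller copy that running code may still hold
};

// Hypothetical owner of all versions; older versions stay alive because
// already-compiled code may still be holding a pointer to them.
class DictionaryChain {
public:
    explicit DictionaryChain(std::size_t initialSlots) {
        versions_.push_back(std::make_unique<Dictionary>());
        versions_.back()->slots.assign(initialSlots, nullptr);
    }

    Dictionary* Current() { return versions_.back().get(); }

    // Expansion allocates a larger copy and links it to the version it replaces.
    Dictionary* Expand(std::size_t newSlots) {
        Dictionary* old = Current();
        auto bigger = std::make_unique<Dictionary>();
        bigger->slots = old->slots;
        bigger->slots.resize(std::max(newSlots, old->slots.size()), nullptr);
        bigger->previous = old;
        versions_.push_back(std::move(bigger));
        return Current();
    }

    // Resolve a slot in the newest version, then walk the chain and write the
    // same value into every older version that is large enough to have the slot.
    void PopulateSlot(std::size_t index, void* value) {
        for (Dictionary* d = Current(); d != nullptr; d = d->previous) {
            if (index < d->slots.size()) {
                d->slots[index] = value;
            }
        }
    }

private:
    std::vector<std::unique_ptr<Dictionary>> versions_;
};
```

With this shape, code that captured a pointer to an older version before an expansion still sees slots that are resolved afterwards, as long as the slot index exists in that older version.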
I am guessing we need to add a perf benchmark here to ensure this doesn't regress again, unless System.Collections already has one for this scenario?
I am pretty sure that it does not. This is a startup regression. All our feature-specific benchmarks test steady-state throughput. A similar regression can be hit by any generic method with high enough complexity. The pattern that triggered it in the original repro was the use of Linq, which is known to produce complex generics-heavy code.
@jkotas - I have stood up an initial implementation of the linked lists as discussed above, no visible perf change. I subsequently added some counters and a ring buffer only to find out that, during the repro test mentioned on top of this issue, we keep calling

    m_pszDebugMethodName: Test.TestDict[System.String](System.Collections.Generic.IDictionary`2<System.String,Boolean>, System.Func`2<Int32,System.String>)

We do this on the following call stack:

    > coreclr.dll!JIT_GenericHandleWorker(MethodDesc * pMD, MethodTable * pMT, void * signature, unsigned long dictionaryIndexAndSlot, Module * pModule) Line 3322 C++
      coreclr.dll!JIT_GenericHandle_Framed(CORINFO_CLASS_STRUCT_ * classHnd, CORINFO_METHOD_STRUCT_ * methodHnd, void * signature, unsigned long dictionaryIndexAndSlot, CORINFO_MODULE_STRUCT_ * moduleHnd) Line 3350 C++
      coreclr.dll!JIT_GenericHandleMethod(CORINFO_METHOD_STRUCT_ * methodHnd, void * signature) Line 3379 C++
      00007ff7c781a8c9() Unknown
      00007ff7c7819087() Unknown
      coreclr.dll!CallDescrWorkerInternal() Line 100 Unknown
      coreclr.dll!MethodDescCallSite::CallTargetWorker(const unsigned __int64 * pArguments, unsigned __int64 * pReturnValue, int cbReturnValue) Line 552 C++
      [Inline Frame] coreclr.dll!MethodDescCallSite::Call_RetArgSlot(const unsigned __int64 *) Line 458 C++
      coreclr.dll!RunMainInternal(Param * pParam) Line 1455 C++

Even though we update the generic dictionary slot in PopulateEntry, we always end in the method again. As you're much more familiar with this logic than I am, I'd be grateful for any suggestions how to continue the investigation to get to the bottom of this - where should we pick up the dictionary slot value instead of trying to resolve it again? At one point I was speculating whether tiered compilation may not be affecting this in some manner but I received exactly the same results after turning it off.

Thanks a lot
Tomas
Could you please share your current delta? This is the key piece of code from the rcx points to the old dictionary here.
See repro at #38660 (comment)
This is a regression introduced by the growable generic dictionary feature (#32270). The lazy initialization of the generic dictionary can reallocate it, and the JITed code may keep running with a stale copy of the dictionary that is never going to be initialized.
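The failure mode described above can be illustrated with a small model. This is a sketch under stated assumptions, not the actual runtime code: `Dictionary`, `Runtime`, `ResolveSlow` and `Lookup` are hypothetical names, and the slot layout is deliberately simplified. It shows how a caller that keeps using a dictionary pointer obtained before a reallocation falls into the slow resolution path on every lookup, because only the newest copy ever gets populated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified dictionary: one pointer-sized slot per lazily resolved handle.
struct Dictionary {
    std::vector<const void*> slots;  // nullptr means "not resolved yet"
};

// Hypothetical stand-in for the runtime side of lazy slot resolution.
struct Runtime {
    std::vector<std::unique_ptr<Dictionary>> versions;  // old copies are kept alive
    int slowPathCalls = 0;

    Runtime() {
        versions.push_back(std::make_unique<Dictionary>());
        versions.back()->slots.assign(1, nullptr);
    }

    Dictionary* Current() { return versions.back().get(); }

    // Slow path: grows the dictionary by reallocating it when the slot does not
    // fit, then records the resolved handle in the NEWEST copy only.
    const void* ResolveSlow(std::size_t index, const void* handle) {
        ++slowPathCalls;
        if (index >= Current()->slots.size()) {
            auto bigger = std::make_unique<Dictionary>(*Current());
            bigger->slots.resize(index + 1, nullptr);
            versions.push_back(std::move(bigger));
        }
        Current()->slots[index] = handle;
        return handle;
    }
};

// Conceptual shape of a lookup in compiled code: try the dictionary pointer the
// caller already holds, and fall back to the slow helper when the slot is empty.
const void* Lookup(Runtime& rt, Dictionary* held, std::size_t index, const void* handle) {
    if (index < held->slots.size() && held->slots[index] != nullptr) {
        return held->slots[index];  // fast path
    }
    return rt.ResolveSlow(index, handle);
}
```

In this model, if a caller captures `rt.Current()` once, triggers an expansion by looking up slot 1, and then looks up slot 0 repeatedly through the captured pointer, every one of those lookups takes the slow path, while the same lookup through the current dictionary takes the fast path after the first resolution.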