ExecutionManager::GetRangeSection performs a linear search over list of loaded assemblies #8393
@vancem Issue filed
@jkotas FYI, these delegate lookups cost us 3% of execution time in MusicStore (when everything is crossgened) because we have loaded over 140 managed assemblies (native images) and the linear search becomes a problem.
It looks like in CoreCLR the first range lookup is actually no longer needed. On desktop, this lookup feeds secure delegate creation. That use has been removed in Core, but the lookup remains. So maybe we can get half of this perf back very cheaply.
void* pRetAddr = _ReturnAddress();
pCreatorMethod = ExecutionManager::GetCodeMethodDesc((PCODE)pRetAddr);
// pCreatorMethod unused after this
Remove a range lookup that's no longer needed. See related issue #12438.
Might be more we could do here but the Range data structure is fundamental so it would require careful thought. Moving to future.
Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process. This process is part of our issue cleanup automation.
This should stay open.
I'm looking at a fix for this.
…to a MethodDesc (#79021)

Mapping from an instruction pointer to a MethodDesc is a critical component of the runtime used in many places, notably diagnostics, slow path delegate creation, and stackwalking. This change provides a dramatic improvement in the performance of that logic through several techniques.

1. The mapping from IP to range of interesting memory is done via a tree structure resembling that of a page table. As a result, in addition to reducing the locking logic, the cost of a lookup in the presence of many different loaded assemblies is reduced to nearly constant time.
2. That tree structure is configured so that memory which will never be freed in the lifetime of the process can be accessed without taking any locks whatsoever.
3. The code map is enhanced to include not only the code generated by the JIT/R2R, but also the fixup and stub precode stubs.
4. In addition, slow path delegate creation in particular was sped up by reordering which checks are done, and by writing a simplified signature parse routine for computing the number of arguments to a function.

Performance was tested in both the EH stackwalking scenario and the delegate slow path creation scenario, but the delegate logic is what will be most visible when this is checked in. (See PR #79471 for details of the additional changes necessary to take advantage of this work when doing EH.) (#8393 describes the potential product impact of just improving the delegate slow path creation code.)

For the delegate creation scenario the perf impact was measured to be quite substantial. These numbers are in ms to create a constant number of delegates on each thread used. Smaller is better. The test was run on a 6 core machine.
| Threads | Without fix (ms) | With this PR (ms) |
| ------- | ---------------- | ----------------- |
| 1 | 840 | 313 |
| 2 | 1977 | 316 |
| 3 | 3290 | 325 |
| 4 | 4259 | 344 |
| 5 | 5140 | 351 |
| 6 | 5872 | 374 |
| 7 | 6463 | 442 |
| 8 | 7046 | 499 |
| 9 | 7648 | 547 |
| 10 | 8241 | 627 |
| 11 | 8851 | 749 |
| 12 | 9595 | 773 |

Fixes #8393
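To illustrate the "page table" idea from point 1 of the commit message, here is a minimal C++ sketch of a two-level radix map from an address to its containing range. It is not the runtime's actual data structure. `Range`, `RangeMap`, the 64 KiB granule size, the 8-bit fan-out per level, and the 32-bit address limit are all assumptions chosen to keep the example small. The sketch also assumes at most one range per granule and omits the lock-free access described in point 2.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical range record; not the runtime's RangeSection.
struct Range {
    std::uint64_t begin;  // inclusive
    std::uint64_t end;    // exclusive
    int moduleId;
};

// Two-level radix map keyed by address bits, similar in shape to a page table.
// Assumptions: 64 KiB granules, 8 bits per level, addresses below 2^32,
// and at most one range per granule.
class RangeMap {
    static constexpr unsigned kGranuleShift = 16;
    static constexpr unsigned kLevelBits = 8;
    static constexpr std::size_t kFanout = std::size_t{1} << kLevelBits;
    static constexpr std::uint64_t kLimit = std::uint64_t{1} << 32;

    struct Leaf { std::array<const Range*, kFanout> slots{}; };
    std::array<std::unique_ptr<Leaf>, kFanout> leaves_{};

    static std::size_t TopIndex(std::uint64_t a) {
        return (a >> (kGranuleShift + kLevelBits)) & (kFanout - 1);
    }
    static std::size_t LeafIndex(std::uint64_t a) {
        return (a >> kGranuleShift) & (kFanout - 1);
    }

public:
    // Registers r in every granule it overlaps. Returns false for an empty
    // range or one that extends past the addressable limit of this sketch.
    bool Insert(const Range* r) {
        if (r->begin >= r->end || r->end > kLimit) return false;
        const std::uint64_t last = (r->end - 1) >> kGranuleShift;
        for (std::uint64_t g = r->begin >> kGranuleShift; g <= last; ++g) {
            const std::uint64_t a = g << kGranuleShift;
            auto& leaf = leaves_[TopIndex(a)];
            if (!leaf) leaf = std::make_unique<Leaf>();
            leaf->slots[LeafIndex(a)] = r;
        }
        return true;
    }

    // Two array indexings plus a bounds check, regardless of how many
    // ranges have been inserted.
    const Range* Find(std::uint64_t ip) const {
        if (ip >= kLimit) return nullptr;
        const auto& leaf = leaves_[TopIndex(ip)];
        if (!leaf) return nullptr;
        const Range* r = leaf->slots[LeafIndex(ip)];
        return (r && ip >= r->begin && ip < r->end) ? r : nullptr;
    }
};
```

The point of the design is that `Find` does a fixed number of steps that does not depend on the number of loaded assemblies, which is consistent with the flat "With this PR" column above for thread counts up to the number of cores.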
PerfView indicates that significant time can be spent in this method when we create a Delegate:
The MusicStore application has over 130 managed assemblies loaded.
Eliminating this search can improve steady state transaction time by ~3%
|+ coreclr!COMDelegate::DelegateConstruct
We have to perform two IP lookups in DelegateConstruct.
Both of these calls eventually resolve to ExecutionManager::GetRangeSection.
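For readers unfamiliar with why the lookup cost grows with the number of loaded assemblies, the following is a minimal C++ sketch of a linear search over a list of address ranges. It is an illustration only, not the actual `ExecutionManager::GetRangeSection` code. `RangeNode`, `FindRangeLinear`, and the `visited` counter are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for one loaded image's code range.
struct RangeNode {
    std::uintptr_t begin;   // inclusive start address
    std::uintptr_t end;     // exclusive end address
    int moduleId;           // illustrative payload
    const RangeNode* next;  // next node in the singly linked list
};

// Walks the list until a node containing ip is found.
// Cost is proportional to the number of nodes visited, so with N loaded
// images a miss (or a hit near the tail) touches all N nodes.
// If visited is non-null, it receives the number of nodes examined.
const RangeNode* FindRangeLinear(const RangeNode* head,
                                 std::uintptr_t ip,
                                 std::size_t* visited = nullptr) {
    std::size_t steps = 0;
    const RangeNode* hit = nullptr;
    for (const RangeNode* n = head; n != nullptr; n = n->next) {
        ++steps;
        if (ip >= n->begin && ip < n->end) {
            hit = n;
            break;
        }
    }
    if (visited != nullptr) *visited = steps;
    return hit;
}
```

With well over a hundred images loaded, each delegate construction that performs two such lookups may walk a large part of the list twice, which matches the kind of cost reported above.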