Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecutionManager::GetRangeSection performs a linear search over list of loaded assemblies #8393

Closed
briansull opened this issue Jun 23, 2017 · 7 comments · Fixed by #79021
Closed
Assignees
Labels
area-VM-coreclr tenet-performance Performance related issue
Milestone

Comments

@briansull
Copy link
Contributor

PerfView indicates that significant time can be spent in this method when we create a Delegate:
The MusicStore application has over 130 managed assemblies loaded.
Eliminating this search can improve steady state transaction time by ~3%

|+ coreclr!COMDelegate::DelegateConstruct
We have to perform two IP lookups in DelegateConstruct

       FCIMPL3(void, COMDelegate::DelegateConstruct, Object* refThisUNSAFE, Object* targetUNSAFE, PCODE method)
(line 1914)
219.0 |    MethodDesc * pCreatorMethod = ExecutionManager::GetCodeMethodDesc((PCODE)pRetAddr);
(line 1933)
159.0 |    MethodDesc *pMethOrig = Entry2MethodDesc(method, pRealMT);

Both of these calls eventually resolve to: ExecutionManager::GetRangeSection

RangeSection* ExecutionManager::GetRangeSection(TADDR addr)
. . . 
          while (pCurr != NULL)
           {
               // See if addr is in [pCurr->LowAddress .. pCurr->HighAddress)
 12.0 |        if (pCurr->LowAddress <= addr)
               {
                   // Since we are sorted, once pCurr->HighAddress is less than addr
                   // then all subsequence ones will also be lower, so we are done.
  3.0 |            if (addr >= pCurr->HighAddress)
                   {
                       // we'll return NULL and put pLast into pLastUsed
                       pCurr = NULL;
                   }
                   else
                   {
                       // addr must be in [pCurr->LowAddress .. pCurr->HighAddress)
                       _ASSERTE((pCurr->LowAddress <= addr) && (addr < pCurr->HighAddress));
       
                       // Found the matching RangeSection
                       // we'll return pCurr and put it into pLastUsed
                       pLast = pCurr;
                   }
       
                   break;
               }
320.0 |        pLast = pCurr;
 57.0 |        pCurr = pCurr->pnext;
           }
@briansull
Copy link
Contributor Author

@vancem Issue filed

@briansull briansull changed the title ExecutionManager::GetRangeSection performs a linear search over list of loaded assemblie ExecutionManager::GetRangeSection performs a linear search over list of loaded assemblies Jun 23, 2017
@briansull briansull self-assigned this Jul 13, 2017
@briansull
Copy link
Contributor Author

@jkotas FYI, these delegate lookups cost us 3% of execution time in MusicStore (when everything is crossgened) because we have loaded over 140 managed assemblies (native imgaes) and the linear search becomes a problem.

@briansull briansull removed their assignment Nov 15, 2017
@AndyAyersMS
Copy link
Member

It looks like in CoreCLR the first range lookup is actually no longer needed. In desktop this lookup feeds secure delegate creation. That use has been removed in Core, but the lookup remains.

So maybe we can get half of this perf back very cheaply.

    void* pRetAddr = _ReturnAddress();
    pCreatorMethod = ExecutionManager::GetCodeMethodDesc((PCODE)pRetAddr);
    // pCreatorMethod unused after this

AndyAyersMS referenced this issue in AndyAyersMS/coreclr Feb 12, 2018
Remove a range lookup that's no longer needed.

See related issue #12438.
jkotas referenced this issue in dotnet/coreclr Feb 13, 2018
Remove a range lookup that's no longer needed.

See related issue #12438.
@AndyAyersMS
Copy link
Member

Might be more we could do here but the Range data structure is fundamental so it would require careful thought. Moving to future.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost
Copy link

ghost commented Oct 27, 2022

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@ghost ghost added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Oct 27, 2022
@jkotas
Copy link
Member

jkotas commented Oct 27, 2022

This should stay open.

@ghost ghost removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Oct 27, 2022
@davidwrighton davidwrighton self-assigned this Nov 9, 2022
@davidwrighton
Copy link
Member

I'm looking at a fix for this.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 10, 2022
davidwrighton added a commit that referenced this issue Mar 4, 2023
…to a MethodDesc (#79021)

Mapping from an instruction pointer to a MethodDesc is a critical component of the runtime used in many places, notably diagnostics, slow path delegate creation, and stackwalking.

This change provides a dramatic improvement in the performance of that logic through several techniques.

1. The mapping from IP to range of interesting memory is done via a tree structure resembling that of a page table. The result of this is that in addition to reducing the locking logic, the cost of looking up in the presence of many different loaded assemblies will be reduced to a nearly constant time.
2. That tree structure is configured so that when accessing memory which will never be freed in the lifetime of the process, the memory can be accessed without taking any locks whatsoever.
3. The code map is enhanced to include not only the code generated by the JIT/R2R code, but also to include the fixup and stub precode stubs.
4. In addition, performance improvement was done to improve the performance of slow path delegate creation in particular, by reordering which checks are done, and by writing a simplified signature parse routine for computing the number of arguments to a function.

Performance of this was tested in both the EH stackwalking scenario as well as the delegate slow path creation scenario, but the delegate logic is what will be most visible when this is checked in. (See PR #79471 for details of the additional changes necessary to take advantage of this work when doing EH) (#8393 describes the potential product impact of just improving the delegate slow path creation code)

For the delegate creation scenario the perf impact was measured to be quite substantial. These numbers are in ms to create a constant number of delegates on each thread used. Smaller is better. The test was run on a 6 core machine.

| Threads | Without fix | With this PR
| ------------- | ------------- | ----- |
| 1  | 840 |  313|
| 2 | 1977 | 316 |
| 3 | 3290 | 325 |
| 4| 4259 | 344 |
| 5 | 5140 | 351 |
| 6 | 5872 | 374|
| 7 | 6463 | 442|
| 8 | 7046 | 499|
| 9 | 7648 | 547|
| 10 | 8241 | 627|
| 11 | 8851 | 749|
| 12 | 9595 | 773|

Fixes #8393
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 4, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-VM-coreclr tenet-performance Performance related issue
Projects
None yet
5 participants