[WIP] Log jit decisions in PMI and add tool to analyze them #213

Merged: 1 commit, Oct 31, 2019

jakobbotsch
Member

This adds two flags that allow PMI to output inlining and tail-call
decisions to stdout.

These events are captured and logged to stdout in-process. Eventually, this should use EventPipe or ETW, but I ran into some problems with duplicate events being emitted when using these mechanisms. Until this is resolved this should not be merged.

@jashook @AndyAyersMS

@jakobbotsch
Member Author

Here are the current results of capturing this information:

win64 all assemblies in Core_Root

2423216 total well-formed events (0 filtered away because they were malformed)
Implicit call sites: 88512/140044 converted
[63.20%] Successfully converted
[23.21%] Local address taken
[04.04%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[03.24%] Has Struct Promoted Param
[03.24%] Will not fastTailCall hasMultiByteStackArgs
[01.01%] Caller is marked as no inline
[00.99%] Return types are not tail call compatible
[00.54%] Has Pinned Vars
[00.42%] Localloc used
[00.09%] Need to copy return buffer
[00.01%] Callee is native
[00.01%] Might turn into an intrinsic
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.00%] GS Security cookie check

Explicit call sites: 0/0 converted
Inlining call sites: 1281304/2283172 converted
[56.12%] Successfully converted
[17.47%] unprofitable inline
[09.79%] noinline per IL/cached result
[09.46%] target not direct
[02.13%] does not return
[01.70%] too many il bytes
[01.09%] within catch region
[00.72%] has exception handling
[00.38%] cannot get method info
[00.32%] has ldstr VM restriction
[00.24%] target not direct managed
[00.10%] too many locals
[00.09%] PInvoke call site with EH
[00.08%] runtime dictionary lookup
[00.08%] too many basic blocks
[00.07%] ldsfld of value class
[00.04%] has switch
[00.03%] delegate invoke
[00.02%] implicit recursive tail call
[00.02%] ldfld needs helper
[00.01%] rarely called, has gc struct
[00.01%] this pointer argument is null
[00.01%] Inlinee requires a security object (or contains StackCrawlMark)
[00.01%] within filter region
[00.01%] complex handle access
[00.00%] generic virtual
[00.00%] recursive
[00.00%] inline exceeds budget
[00.00%] maxstack too big
[00.00%] no return opcode
[00.00%] localloc size unknown
[00.00%] throw with invalid stack
[00.00%] uses stack crawl mark
[00.00%] too many arguments

win64 most F# tests

23821632 total well-formed events (0 filtered away because they were malformed)
Implicit call sites: 599954/896854 converted
[66.90%] Successfully converted
[21.87%] Local address taken
[04.68%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[03.12%] Will not fastTailCall hasMultiByteStackArgs
[02.12%] Has Struct Promoted Param
[00.46%] Localloc used
[00.34%] Caller is marked as no inline
[00.30%] Return types are not tail call compatible
[00.11%] Has Pinned Vars
[00.09%] Need to copy return buffer
[00.02%] Callee might have a StackCrawlMark.LookForMyCaller
[00.01%] Might turn into an intrinsic
[00.00%] Callee is native

Explicit call sites: 458423/503024 converted
[91.13%] Successfully converted
[06.55%] Will not fastTailCall hasMultiByteStackArgs
[02.31%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)

Inlining call sites: 14571068/22421754 converted
[64.99%] Successfully converted
[14.35%] unprofitable inline
[07.57%] noinline per IL/cached result
[04.89%] target not direct
[02.24%] explicit tail prefix
[01.49%] too many il bytes
[01.34%] does not return
[00.96%] too many locals
[00.64%] within catch region
[00.40%] has exception handling
[00.22%] runtime dictionary lookup
[00.20%] explicit tail prefix in callee
[00.20%] cannot get method info
[00.11%] has ldstr VM restriction
[00.08%] target not direct managed
[00.06%] ldfld needs helper
[00.05%] ldsfld of value class
[00.05%] has switch
[00.05%] PInvoke call site with EH
[00.05%] too many basic blocks
[00.02%] rarely called, has gc struct
[00.01%] implicit recursive tail call
[00.00%] maxstack too big
[00.00%] inline exceeds budget
[00.00%] complex handle access
[00.00%] delegate invoke
[00.00%] recursive
[00.00%] Inlinee requires a security object (or contains StackCrawlMark)
[00.00%] within filter region
[00.00%] generic virtual
[00.00%] uses stack crawl mark
[00.00%] too many arguments
[00.00%] no return opcode
[00.00%] compilation error
[00.00%] speculative class init failed
[00.00%] throw with invalid stack
[00.00%] this pointer argument is null
[00.00%] Inlinee is marked as no inline

Linux x64 Core_Root

2403427 total well-formed events (0 filtered away because they were malformed)
Implicit call sites: 100860/146398 converted
[68.89%] Successfully converted
[21.06%] Local address taken
[04.00%] Has Struct Promoted Param
[01.93%] Will not fastTailCall hasMultiByteStackArgs
[01.35%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.93%] Return types are not tail call compatible
[00.79%] Caller is marked as no inline
[00.50%] Has Pinned Vars
[00.39%] Localloc used
[00.06%] Need to copy return buffer
[00.05%] Will not fastTailCall calleeStackSize > 0 && hasTwoSlotSizedStruct
[00.02%] Callee is native
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.01%] Might turn into an intrinsic
[00.00%] GS Security cookie check

Explicit call sites: 0/0 converted
Inlining call sites: 1278054/2257029 converted
[56.63%] Successfully converted
[16.95%] unprofitable inline
[09.55%] noinline per IL/cached result
[09.46%] target not direct
[02.29%] does not return
[01.72%] too many il bytes
[00.99%] within catch region
[00.68%] has exception handling
[00.61%] target not direct managed
[00.33%] cannot get method info
[00.29%] has ldstr VM restriction
[00.10%] too many locals
[00.08%] runtime dictionary lookup
[00.08%] too many basic blocks
[00.07%] ldsfld of value class
[00.04%] has switch
[00.03%] delegate invoke
[00.02%] implicit recursive tail call
[00.02%] ldfld needs helper
[00.02%] PInvoke call site with EH
[00.02%] rarely called, has gc struct
[00.01%] Inlinee requires a security object (or contains StackCrawlMark)
[00.01%] within filter region
[00.01%] complex handle access
[00.00%] inline exceeds budget
[00.00%] generic virtual
[00.00%] recursive
[00.00%] this pointer argument is null
[00.00%] no return opcode
[00.00%] localloc size unknown
[00.00%] maxstack too big
[00.00%] throw with invalid stack
[00.00%] uses stack crawl mark
[00.00%] too many arguments

Linux x64 most F# tests

27466214 total well-formed events (99 filtered away because they were malformed)
Implicit call sites: 895824/1228222 converted
[72.94%] Successfully converted
[20.32%] Local address taken
[02.49%] Has Struct Promoted Param
[01.36%] Will not fastTailCall hasMultiByteStackArgs
[01.00%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.57%] Localloc used
[00.45%] Will not fastTailCall calleeStackSize > 0 && hasTwoSlotSizedStruct
[00.41%] Return types are not tail call compatible
[00.29%] Caller is marked as no inline
[00.10%] Has Pinned Vars
[00.05%] Need to copy return buffer
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.01%] Might turn into an intrinsic
[00.00%] Callee is native

Explicit call sites: 398958/422664 converted
[94.39%] Successfully converted
[03.10%] Will not fastTailCall hasMultiByteStackArgs
[01.83%] Will not fastTailCall calleeStackSize > 0 && hasTwoSlotSizedStruct
[00.67%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)

Inlining call sites: 16289961/25815328 converted
[63.10%] Successfully converted
[14.96%] unprofitable inline
[07.60%] noinline per IL/cached result
[05.60%] target not direct
[01.80%] too many il bytes
[01.74%] does not return
[01.64%] explicit tail prefix
[00.78%] within catch region
[00.71%] too many locals
[00.60%] target not direct managed
[00.51%] has exception handling
[00.24%] cannot get method info
[00.19%] runtime dictionary lookup
[00.14%] explicit tail prefix in callee
[00.12%] has ldstr VM restriction
[00.07%] ldsfld of value class
[00.06%] too many basic blocks
[00.05%] has switch
[00.04%] ldfld needs helper
[00.02%] rarely called, has gc struct
[00.02%] implicit recursive tail call
[00.00%] inline exceeds budget
[00.00%] complex handle access
[00.00%] delegate invoke
[00.00%] PInvoke call site with EH
[00.00%] maxstack too big
[00.00%] recursive
[00.00%] within filter region
[00.00%] Inlinee requires a security object (or contains StackCrawlMark)
[00.00%] generic virtual
[00.00%] no return opcode
[00.00%] compilation error
[00.00%] speculative class init failed
[00.00%] uses stack crawl mark
[00.00%] too many arguments
[00.00%] throw with invalid stack
[00.00%] this pointer argument is null
[00.00%] Inlinee is marked as no inline

Arm64 Core_Root

2216261 total well-formed events (0 filtered away because they were malformed)
Implicit call sites: 95588/136669 converted
[69.94%] Successfully converted
[22.27%] Local address taken
[04.06%] Has Struct Promoted Param
[00.95%] Will not fastTailCall hasMultiByteStackArgs
[00.85%] Return types are not tail call compatible
[00.73%] Caller is marked as no inline
[00.47%] Has Pinned Vars
[00.40%] Localloc used
[00.23%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.03%] Need to copy return buffer
[00.03%] Will not fastTailCall calleeStackSize > callerStackSize
[00.02%] Callee is native
[00.01%] Will not fastTailCall calleeStackSize > 0 && hasTwoSlotSizedStruct
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.01%] Might turn into an intrinsic

Explicit call sites: 0/0 converted
Inlining call sites: 1174659/2079592 converted
[56.49%] Successfully converted
[16.75%] unprofitable inline
[09.56%] noinline per IL/cached result
[09.49%] target not direct
[02.39%] does not return
[01.79%] too many il bytes
[01.07%] within catch region
[00.71%] has exception handling
[00.64%] target not direct managed
[00.31%] cannot get method info
[00.29%] has ldstr VM restriction
[00.09%] too many locals
[00.09%] runtime dictionary lookup
[00.08%] too many basic blocks
[00.07%] ldsfld of value class
[00.04%] has switch
[00.04%] delegate invoke
[00.02%] ldfld needs helper
[00.02%] PInvoke call site with EH
[00.02%] implicit recursive tail call
[00.02%] rarely called, has gc struct
[00.01%] within filter region
[00.01%] complex handle access
[00.00%] inline exceeds budget
[00.00%] generic virtual
[00.00%] Inlinee requires a security object (or contains StackCrawlMark)
[00.00%] recursive
[00.00%] this pointer argument is null
[00.00%] no return opcode
[00.00%] maxstack too big
[00.00%] uses stack crawl mark
[00.00%] throw with invalid stack
[00.00%] localloc size unknown
[00.00%] too many arguments

Arm64 most F# tests

31253825 total well-formed events (0 filtered away because they were malformed)
Implicit call sites: 1031244/1351884 converted
[76.28%] Successfully converted
[19.19%] Local address taken
[02.26%] Has Struct Promoted Param
[00.48%] Will not fastTailCall hasMultiByteStackArgs
[00.47%] Localloc used
[00.34%] Return types are not tail call compatible
[00.33%] Will not fastTailCall calleeStackSize > 0 && hasTwoSlotSizedStruct
[00.28%] Caller is marked as no inline
[00.27%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.08%] Has Pinned Vars
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.01%] Need to copy return buffer
[00.00%] Might turn into an intrinsic
[00.00%] Callee is native

Explicit call sites: 595458/600974 converted
[99.08%] Successfully converted
[00.72%] Will not fastTailCall calleeStackSize > 0 && hasTwoSlotSizedStruct
[00.11%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.08%] Will not fastTailCall hasMultiByteStackArgs
[00.01%] Will not fastTailCall calleeStackSize > 0 && hasHfaArg

Inlining call sites: 18823551/29300967 converted
[64.24%] Successfully converted
[14.42%] unprofitable inline
[07.51%] noinline per IL/cached result
[05.16%] target not direct
[02.06%] explicit tail prefix
[01.60%] too many il bytes
[01.47%] does not return
[00.86%] too many locals
[00.67%] within catch region
[00.64%] target not direct managed
[00.41%] has exception handling
[00.21%] runtime dictionary lookup
[00.18%] cannot get method info
[00.18%] explicit tail prefix in callee
[00.11%] has ldstr VM restriction
[00.06%] ldsfld of value class
[00.06%] ldfld needs helper
[00.05%] has switch
[00.05%] too many basic blocks
[00.02%] rarely called, has gc struct
[00.01%] implicit recursive tail call
[00.00%] maxstack too big
[00.00%] complex handle access
[00.00%] delegate invoke
[00.00%] recursive
[00.00%] PInvoke call site with EH
[00.00%] Inlinee requires a security object (or contains StackCrawlMark)
[00.00%] within filter region
[00.00%] generic virtual
[00.00%] compilation error
[00.00%] too many arguments
[00.00%] uses stack crawl mark
[00.00%] speculative class init failed
[00.00%] throw with invalid stack
[00.00%] this pointer argument is null
[00.00%] no return opcode
[00.00%] inline exceeds budget
[00.00%] Inlinee is marked as no inline

@jashook

jashook commented Jul 18, 2019

/cc @RussKeldorph

@jakobbotsch
Member Author

To get the data above I also made some small changes in CoreCLR to get a more fine-grained reason from fgCanFastTailCall. In current CoreCLR all the reasons prefixed with "Will not fastTailCall" will show up under one bucket as "Opportunistic tail call cannot be dispatched as epilog+jmp".
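
To make the relationship between the fine-grained reasons and the single stock bucket concrete, here is a minimal Python sketch that folds every reason starting with "Will not fastTailCall" back into the one bucket that current CoreCLR reports. The helper name `collapse_fast_tail_call_reasons` and the assumed input format (the `[pp.pp%] reason` summary lines shown above) are illustrative assumptions; this is not the analysis tool's code.

```python
import re
from collections import defaultdict

# Bucket name that stock CoreCLR reports, as quoted in the comment above.
STOCK_BUCKET = "Opportunistic tail call cannot be dispatched as epilog+jmp"
FINE_GRAINED_PREFIX = "Will not fastTailCall"

# Matches summary lines such as "[03.24%] Will not fastTailCall hasMultiByteStackArgs".
LINE_RE = re.compile(r"^\[(\d+\.\d+)%\]\s+(.*)$")


def collapse_fast_tail_call_reasons(lines):
    """Hypothetical helper: sum percentages per reason, merging fine-grained ones."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip headers and blank lines
        percent, reason = float(match.group(1)), match.group(2)
        if reason.startswith(FINE_GRAINED_PREFIX):
            reason = STOCK_BUCKET
        totals[reason] += percent
    return dict(totals)


# A few lines copied from the win64 Core_Root implicit call site results.
sample = [
    "[63.20%] Successfully converted",
    "[23.21%] Local address taken",
    "[04.04%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)",
    "[03.24%] Will not fastTailCall hasMultiByteStackArgs",
]
collapsed = collapse_fast_tail_call_reasons(sample)
for reason, percent in sorted(collapsed.items(), key=lambda kv: -kv[1]):
    print(f"[{percent:05.2f}%] {reason}")
```

Run on the four sample lines, the two "Will not fastTailCall" entries end up in one combined bucket, which is how they would appear without the CoreCLR changes.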

@AndyAyersMS
Member

Very nice. Can you log bugs for the eventing issues you ran into, and cross-ref them here?

Would be helpful to capture inline success reasons too (always, forced, profitable).

It is surprising to see that explicit tail call success rate is better on Linux x64 and Linux arm64 than it is on x64 Windows. I think most people's intuition would be that Windows x64 would be the best.

Does this reflect actual success or just success in fast tail calling?

cc @dotnet/jit-contrib

@jakobbotsch
Member Author

jakobbotsch commented Jul 18, 2019

> Can you log bugs for the eventing issues you ran into, and cross-ref them here?

I believe @jorive is looking into them already (and was gonna open issues for them)

> Would be helpful to capture inline success reasons too (always, forced, profitable).

I'll add this.

> It is surprising to see that explicit tail call success rate is better on Linux x64 and Linux arm64 than it is on x64 Windows. I think most people's intuition would be that Windows x64 would be the best.
>
> Does this reflect actual success or just success in fast tail calling?

For the explicit ones it reflects whether we successfully converted it to a fast tail call. On Windows, a failure here means that we will go through the helper, while on Linux it means that we completely give up on doing a tail call.
@jashook had a good point that hasMultiByteStackArgs means something different on each platform. On Windows this is any struct arg of size > 8, since this has to be passed by ref. On Linux, however, many of these structs are passed in registers and are successfully tail-called. This might explain the difference.
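
As a rough illustration of why the same struct can behave differently per platform, the sketch below encodes only the size-based part of the two x64 calling conventions. It is a deliberate simplification and an assumption about what matters here: the function names are hypothetical, it is not the JIT's `hasMultiByteStackArgs` logic, and the real System V rule also depends on field classification and on how many argument registers are still free.

```python
def win64_passed_by_reference(struct_size: int) -> bool:
    """Windows x64: structs whose size is not 1, 2, 4 or 8 bytes are passed via a pointer."""
    return struct_size not in (1, 2, 4, 8)


def sysv_x64_may_use_registers(struct_size: int) -> bool:
    """System V x64: only structs of at most 16 bytes can be passed in registers.

    Simplification: size is a necessary condition here, not a sufficient one.
    """
    return 0 < struct_size <= 16


# Compare a few struct sizes (in bytes) under both simplified rules.
for size in (8, 12, 16, 24):
    print(size, win64_passed_by_reference(size), sysv_x64_may_use_registers(size))
```

Under these simplified rules a 12-byte or 16-byte struct is passed by reference on Windows x64 but may still travel in registers on Linux x64, which is consistent with the explanation above.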

@jakobbotsch
Member Author

As the data above shows, the most common reason we do not perform an implicit tail call is "local address taken". It turns out that this check is very conservative: it rejects any method containing an "address taking" IL instruction, which virtually every method that does something with struct fields contains:
https://github.com/dotnet/coreclr/blob/5f93d3b1c48ba6916d5f31d79cb7c17d564eecef/src/jit/morph.cpp#L8134-L8138

A comment above explains that there is a phase ordering issue that means we cannot just rely on "address exposed":
https://github.com/dotnet/coreclr/blob/5f93d3b1c48ba6916d5f31d79cb7c17d564eecef/src/jit/morph.cpp#L8106-L8112

However, it seems like this comment is outdated. @JosephTremoulet appears to have fixed this phase-ordering issue in #10453. After removing those lines, tests still pass and we perform roughly 10% more implicit tail calls when PMI'ing Core_Root on Linux x64 (the conversion rate rises from 68.90% to 77.43%):

Before

Implicit call sites: 101041/146651 converted
[68.90%] Successfully converted
[21.04%] Local address taken
[04.03%] Has Struct Promoted Param
[02.00%] Will not fastTailCall hasLargerThanOneStackSlotSizedStruct && calleeStackSize
[01.32%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.94%] Return types are not tail call compatible
[00.80%] Caller is marked as no inline
[00.50%] Has Pinned Vars
[00.39%] Localloc used
[00.04%] Need to copy return buffer
[00.02%] Callee is native
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.01%] Might turn into an intrinsic
[00.00%] GS Security cookie check

After

2409682 total well-formed events (0 filtered away because they were malformed)
Implicit call sites: 113638/146760 converted
[77.43%] Successfully converted
[10.94%] Local address taken
[05.24%] Has Struct Promoted Param
[02.20%] Will not fastTailCall hasLargerThanOneStackSlotSizedStruct && calleeStackSize
[01.50%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[00.93%] Return types are not tail call compatible
[00.79%] Caller is marked as no inline
[00.51%] Has Pinned Vars
[00.39%] Localloc used
[00.04%] Need to copy return buffer
[00.02%] Callee is native
[00.01%] Might turn into an intrinsic
[00.01%] Callee might have a StackCrawlMark.LookForMyCaller
[00.00%] GS Security cookie check
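
The size of the gain can be checked directly from the two "Implicit call sites" lines above. The short Python sketch below recomputes the conversion rates and expresses the improvement both in percentage points and relative to the number of previously converted call sites; the variable names are only for illustration.

```python
def rate(converted, total):
    """Percentage of call sites that were converted."""
    return 100.0 * converted / total


# Counts copied from the "Before" and "After" summaries above.
before_converted, before_total = 101041, 146651
after_converted, after_total = 113638, 146760

# Difference between the two conversion rates, in percentage points.
point_gain = rate(after_converted, after_total) - rate(before_converted, before_total)
# Growth in the number of converted call sites, relative to the baseline count.
relative_gain = 100.0 * (after_converted - before_converted) / before_converted

print(f"before: {rate(before_converted, before_total):.2f}%")
print(f"after:  {rate(after_converted, after_total):.2f}%")
print(f"gain:   {point_gain:.2f} points, {relative_gain:.2f}% more converted call sites")
```

Note that the total number of implicit call sites differs slightly between the two runs, so the two ways of measuring the gain are not interchangeable.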

Overall, however, this is a size regression:

Found 83 files with textual diffs.

Summary:
(Lower is better)

Total bytes of diff: 45443 (0.103% of base)
    diff is a regression.

Top file regressions by size (bytes):
       13284 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.405% of base)
        6052 : System.Private.CoreLib.dasm (0.112% of base)
        5574 : Microsoft.CodeAnalysis.CSharp.dasm (0.119% of base)
        2971 : System.Threading.Tasks.Dataflow.dasm (0.313% of base)
        2395 : System.Linq.Parallel.dasm (0.138% of base)

Top file improvements by size (bytes):
         -53 : System.IO.IsolatedStorage.dasm (-0.269% of base)
         -23 : System.IO.FileSystem.dasm (-0.018% of base)

82 total files with size differences (2 improved, 80 regressed), 47 unchanged.

Top method regressions by size (bytes):
         654 (11.510% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.RegexFCD:CalculateFC(int,ref,int):this (3 methods)
         461 (22.335% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.BoundTreeVisitor`2[__Canon,Int64][System.__Canon,System.Int64]:Visit(ref,ref):long:this
         461 (22.335% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.BoundTreeVisitor`2[Int32,Int64][System.Int32,System.Int64]:Visit(ref,int):long:this
         461 (22.335% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.BoundTreeVisitor`2[Int64,Int64][System.Int64,System.Int64]:Visit(ref,long):long:this
         458 (21.655% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.BoundTreeVisitor`2[Byte,Int64][System.Byte,System.Int64]:Visit(ref,ubyte):long:this

Top method improvements by size (bytes):
         -66 (-44.595% of base) : System.Net.Http.dasm - System.Net.Http.HttpConnection:ReadBufferedAsync(struct):struct:this
         -59 (-23.413% of base) : System.Net.Http.dasm - System.Net.Http.HttpConnection:WriteWithoutBufferingAsync(struct):struct:this
         -53 (-33.333% of base) : System.IO.IsolatedStorage.dasm - System.IO.IsolatedStorage.IsolatedStorageFileStream:DisposeAsync():struct:this
         -50 (-0.257% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:TrimEnd(struct,struct):struct (29 methods)
         -44 (-0.223% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:TrimStart(struct,struct):struct (29 methods)

Top method regressions by size (percentage):
         148 (61.667% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata3TemplateHATraceData:PayloadValue(int):ref:this
         136 (58.369% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata4TemplateHATraceData:PayloadValue(int):ref:this
         136 (58.369% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata13TemplateHATraceData:PayloadValue(int):ref:this
         124 (54.867% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata10TemplateHATraceData:PayloadValue(int):ref:this
         124 (54.867% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata7TemplateHATraceData:PayloadValue(int):ref:this

Top method improvements by size (percentage):
         -66 (-44.595% of base) : System.Net.Http.dasm - System.Net.Http.HttpConnection:ReadBufferedAsync(struct):struct:this
         -53 (-33.333% of base) : System.IO.IsolatedStorage.dasm - System.IO.IsolatedStorage.IsolatedStorageFileStream:DisposeAsync():struct:this
         -38 (-27.737% of base) : System.Private.CoreLib.dasm - System.Threading.CancellationTokenRegistration:DisposeAsync():struct:this
         -43 (-27.215% of base) : System.IO.Compression.dasm - System.IO.Compression.GZipStream:DisposeAsync():struct:this
         -59 (-23.413% of base) : System.Net.Http.dasm - System.Net.Http.HttpConnection:WriteWithoutBufferingAsync(struct):struct:this

4151 total methods with size differences (307 improved, 3844 regressed), 243718 unchanged.

I'm not sure how much this matters, but the largest contributors seem to be that we do not share epilogues when tail calling and that we do not use rip-relative jumps even when we could (of course, both of these optimizations could not be applied at the same time). For instance:

@@ -66529,8 +66529,7 @@ G_M61921_IG08:
        mov      rsi, r14
        mov      edx, r15d
        mov      rcx, r12
-       call     CSharpx.EnumerableExtensions:ExpectingCountYieldingImpl(ref,int,ref):ref
-       nop      
+       mov      rax, 0xD1FFAB1E
 
 G_M61921_IG09:
        lea      rsp, [rbp-28H]
@@ -66540,7 +66539,7 @@ G_M61921_IG09:
        pop      r14
        pop      r15
        pop      rbp
-       ret      
+       rex.jmp  rax

which could presumably be a rip-relative jmp instead of going through rax.

@jakobbotsch
Member Author

@jorive should have fixed the duplicate events in microsoft/perfview#972, so I can work on doing this externally.

@jashook

jashook commented Jul 23, 2019

The percentages below are summed across Unix x64 and arm64.

[22.84%] Local address taken

Difficult/not possible to improve

[10.62%] Has Struct Promoted Param

Requires changes to re-construct the outgoing struct parameter before the tail call. Should be able to reuse existing code.

[02.22%] Will not fastTailCall hasLargerThanOneStackSlotSizedStruct && calleeStackSize

Requires work in LowerTailCall, to fix the assumption that each argument has only one slot.

[01.70%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)

Requires inflating the stack to artificially pad the caller's outgoing argspace. There may be more work to fix edge cases.

[01.8%] Return types are not tail call compatible

Needs investigation

[01.39%] Caller is marked as no inline

Cannot be changed; we treat this as a hint to always keep the method in the stack trace.

[01.01%] Has Pinned Vars

Needs investigation

[00.79%] Localloc used

Needs investigation

@erozenfeld
Member

The importer (impImportCall) does the initial filtering and rejects tail calls for various reasons (some of which come from CEEInfo::canTailCall). Do we want to have those reasons incorporated here? Looks like some of those reasons are not fundamentally preventing tail calls, e.g., https://github.com/dotnet/coreclr/blob/a12705bfc76d6f7d7c9f795acffa92a539662b70/src/jit/importer.cpp#L7298

@jakobbotsch
Member Author

> Do we want to have those reasons incorporated here?

It appears those other reasons are already reported as ETW events, e.g., "Caller is marked as no inline" comes from CEEInfo::canTailCall and "Return types are not tail call compatible" comes from impImportCall.

The check guarding whether we report an event in the importer is prefixFlags & PREFIX_TAILCALL, and this flag is set if either

  1. it has an explicit tail prefix, or
  2. impIsImplicitTailCallCandidate returns true, which is the check that the call is in tail position.
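
The gating described above can be sketched as follows. This is a Python model of the logic only, not the JIT's C++ code: the `PrefixFlags` enum, its bit values, and the two function names are hypothetical, and modelling the flag as an explicit bit plus an implicit bit combined into one mask is an assumption made for illustration.

```python
from enum import IntFlag


class PrefixFlags(IntFlag):
    # Bit values are illustrative only; the real constants live in the JIT sources.
    TAILCALL_EXPLICIT = 0x1
    TAILCALL_IMPLICIT = 0x2
    TAILCALL = TAILCALL_EXPLICIT | TAILCALL_IMPLICIT


def compute_prefix_flags(has_explicit_tail_prefix: bool,
                         is_implicit_candidate: bool) -> PrefixFlags:
    """Set a tail-call bit for either of the two conditions listed above."""
    flags = PrefixFlags(0)
    if has_explicit_tail_prefix:
        flags |= PrefixFlags.TAILCALL_EXPLICIT
    if is_implicit_candidate:
        # Stands in for impIsImplicitTailCallCandidate returning true.
        flags |= PrefixFlags.TAILCALL_IMPLICIT
    return flags


def should_report_event(prefix_flags: PrefixFlags) -> bool:
    """Mirrors the `prefixFlags & PREFIX_TAILCALL` guard."""
    return bool(prefix_flags & PrefixFlags.TAILCALL)


# A call in tail position without an explicit prefix still produces an event.
print(should_report_event(compute_prefix_flags(False, True)))
# A call that is neither explicit nor in tail position produces no event.
print(should_report_event(compute_prefix_flags(False, False)))
```

The point of the sketch is that calls which are neither explicitly prefixed nor in tail position never reach the reporting code, so they do not appear in the statistics at all.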

@jashook

jashook commented Oct 31, 2019

@dotnet/jit-contrib ptal

@jashook

jashook commented Oct 31, 2019

The tool is useful as is. This is a WIP for several reasons:

  1. Missing documentation
  2. Better logging/parsing and integration into pmi

I think these things can easily be fixed post-merge, and merging now gives us a tool to use in the meantime.

@jashook

jashook commented Oct 31, 2019

Merging as I have tooling that depends on these changes.

@jashook jashook merged commit 1c797eb into dotnet:master Oct 31, 2019
@BruceForstall
Member

@jashook Can you please submit a PR to add the proper license headers to the .cs files (especially)?
