-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Log jit decisions in PMI and add tool to analyze them #213
Conversation
Here are the current results of capturing this information: win64 all assemblies in Core_Root
win64 most F# tests
Linux x64 Core_Root
Linux x64 most F# tests
Arm64 Core_Root
Arm64 most F# tests
|
/cc @RussKeldorph |
To get the data above I also made some small changes in CoreCLR to get a more fine-grained reason from |
Very nice. Can you log bugs for the evening issues you ran into, and cross-ref them here? Would be helpful to capture inline success reasons too (always, forced, profitable). It is surprising to see that explicit tail call success rate is better on Linux x64 and Linux arm64 than it is on x64 Windows. I think most people's intuition would be that Windows x64 would be the best. Does this reflect actual success or just success in fast tail calling? cc @dotnet/jit-contrib |
I believe @jorive is looking into them already (and was gonna open issues for them)
I'll add this.
For the explicit ones it reflects whether we successfully converted it to a fast tail call. For Windows, failures here means that we will go through the helper, while on Linux it means that we completely give up on doing a tail call. |
As can be seen from the data above the most common reason we neglect to perform an implicit tail-call is because of "local address taken". It turns out that this check is very conservative, rejecting any method with an "address taking" IL instruction (which virtually every method doing something with struct fields contains): A comment above explains that there is a phase ordering issue that means we cannot just rely on "address exposed": However it seems like this comment is outdated. @JosephTremoulet appears to have fixed this phase-ordering issue in #10453. After removing those lines tests still pass and we gain ~10% more implicit tail calls performed when PMI'ing Core_Root on Linux x64: Before
After
Overall, however, this is a size regression:
I'm not sure how much this matters, but the largest reasons seem to be that we do not share epilogues when tail calling, and that we do not use rip-relative jumps, even when we could (of course both these optimizations could not be applied at the same time). For instance: @@ -66529,8 +66529,7 @@ G_M61921_IG08:
mov rsi, r14
mov edx, r15d
mov rcx, r12
- call CSharpx.EnumerableExtensions:ExpectingCountYieldingImpl(ref,int,ref):ref
- nop
+ mov rax, 0xD1FFAB1E
G_M61921_IG09:
lea rsp, [rbp-28H]
@@ -66540,7 +66539,7 @@ G_M61921_IG09:
pop r14
pop r15
pop rbp
- ret
+ rex.jmp rax which could presumably be a rip-relative jmp instead of going through |
@jorive should have fixed the duplicate events in microsoft/perfview#972, so I can work on doing this externally. |
Percentages are added between unix x64 and arm64 [22.84%] Local address taken
[10.62%] Has Struct Promoted Param
[02.22%] Will not fastTailCall hasLargerThanOneStackSlotSizedStruct && calleeStackSize
[01.70%] Will not fastTailCall hasStackArgs && (nCalleeArgs > nCallerArgs)
[01.8%] Return types are not tail call compatible
[01.39%] Caller is marked as no inline
[01.01%] Has Pinned Vars
.[00.79%] Localloc used
|
The importer (impImportCall) does the initial filtering and rejects tail calls for various reasons (some of which come from CEEInfo::canTailCall). Do we want to have those reasons incorporated here? Looks like some of those reasons are not fundamentally preventing tail calls, e.g., https://github.com/dotnet/coreclr/blob/a12705bfc76d6f7d7c9f795acffa92a539662b70/src/jit/importer.cpp#L7298 |
It appears those other reasons already are reported as ETW events, eg. "Caller is marked as no inline" comes from The check guarding whether we report an event in the importer is
|
This adds two flags that allows PMI to output inlining and tail-call decisions to stdout.
2dbdafc
to
589d838
Compare
@dotnet/jit-contrib ptal |
The tool is useful as is. This is a WIP for several reasons:
I think these things can easily be fixed post merge and gives us a tool to use in the meantime. |
Merging as I have tooling that depends on these changes. |
@jashook Can you please submit a PR to add the proper license headers to the .cs files (especially)? |
This adds two flags that allows PMI to output inlining and tail-call
decisions to stdout.
These events are captured and logged to stdout in-process. Eventually, this should use EventPipe or ETW, but I ran into some problems with duplicate events being emitted when using these mechanisms. Until this is resolved this should not be merged.
@jashook @AndyAyersMS