-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RyuJit: Functions with stackalloc won't inline #7109
Comments
@BruceForstall was this fixed by "RyuJIT: improve CQ for amd64 localloc " https://github.com/dotnet/coreclr/issues/6261 (PR "Clean up localloc implementation" dotnet/coreclr#6276) ? Though that might be entirely an non-inlining effecting change? |
Also may have been factor in changing the code path for dotnet/coreclr#6141 from |
There are challenges inlining methods with C/C++compilers have similar limitations inlining methods with We could consider supporting this for aggressive inline methods, perhaps. |
Ahh... so the SP will advance, but there is no way to bring it back? So scoping with SP can only be controlled via function call; rather than say in an So it could get quite messy if a |
Yes, the problematic cases are ones where the call site is in a loop. Even if the The jit could potentially convert small fixed-sized stackallocs into new locals, that might enable inlining the small fixed sized cases. |
Could loop + stackalloc block inline; but without loop its fine? Though I suppose eventually it will hit a loop when all the inlines collapse/fold-in e.g. caller of caller of caller has a loop Trickier than it first appears 😦 |
Maybe for small allocations, yes. The interesting cases for perf likely involve amortizing the cost of small fixed-sized stackallocs in loops, like the example you gave above: davidfowl/Channels#145. If the inlinee does a variable or large stackalloc it presumably will touch most of the storage and so the stackalloc overhead is probably not as significant. I'd like to see the jit convert small fixed stackallocs into regular local vars first. Then we could consider anticipating this in the inliner and allowing such methods to be inlines. |
@AndyAyersMS could you raise an issue for that? I have no idea how to phrase it/get the concept across 😄 |
Sure, dotnet/coreclr#8542. |
Not sure why I didn't auto link to this when I fixed (at least some cases) of this: dotnet/coreclr#14623. Fixed sized stackallocs less than or equal to 32 bytes that are not in loops will no longer block inlining. |
@AndyAyersMS since no initlocals is a thing now (applied to coreclr and some of corefx) and coming as a method |
No init locals helps normal stackalloc roughly equally as well as a stackalloc converted to a fixed buffer. My reluctance to push the value higher (see notes in dotnet/coreclr#14623) was twofold -- not many cases in the 32...512 range, and for larger sizes the jit starts using the larger encoding for most on-frame accesses. There are things the jit could do to address the need for large offsets. It could rearrange the frame and try to put the most frequently accessed locals in places where they can be reached via small offsets, bias the frame pointer so that addressing can take advantage of the full -128...127 range, or introduce a fixed or temporary second frame pointer that is located near the frequently used locals. But the jit does not know these tricks yet. With the recent increase in using span over stackallock we should probably remeasure the frequency stats and maybe that would justify increasing the conversion limit. But we'd need to keep an eye on code size. |
@AndyAyersMS, it would seem like it might be possible to change That is, instead of inlining the stack allocations into the parent method's stack, you would move the stack pointer such that the code generated would be:
I could think of a few places where this might be undesirable, but there are also a number of places where opting into this behavior would be beneficial. |
I'm seeing a jump in what's considered a good cut off for stackalloc due the reduced cost from the use of illink init locals removal: e.g. https://github.com/dotnet/corefx/pull/27204/files Though that method probably isn't a good example of one to inline 😄 |
For non init locals I assume? Else you pull in a bunch of zeroing code as well |
Yeah, that would probably be better 😄 |
Undoing stackalloc has always been something of a pipe dream -- I don't know of any native compilers that do it either. Aside from getting the mechanical aspects right, the undo point creates a barrier to code motion that can be hard for the compiler to model and reason about (similar to what happens with pinning). And there is an EH aspect to consider too... we'd likely need to induce a try/finally to ensure that the undo always happened. |
At which point its probably getting ugly for an inline candidate? |
Yes. When EH gets involved the jit becomes very conservative (as you've seen firsthand with the async cases). |
Looks like private const int CharStackBufferSize = 32; is somewhat popular in corefx formatting, so maybe 64 is now interesting. |
Simple cases of this are now handled (small, fixed sized, not in loops). More general handling is not yet a priority, so will leave this as future. |
Functions using
stackalloc
will fail to inline with reasonunknown opcode
Discovered this as part of this PR davidfowl/Channels#145 (should have probably raised it then)
Example function that will fail to inline even with aggressive inlinining
Changed function that will inline successfully without aggressive inlinining
/cc @AndyAyersMS @JosephTremoulet
category:cq
theme:inlining
skill-level:expert
cost:medium
The text was updated successfully, but these errors were encountered: