RyuJit: Functions with stackalloc won't inline #7109

Open
benaadams opened this issue Dec 8, 2016 · 22 comments
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), enhancement (Product code improvement that does NOT require public API changes/additions), optimization, tenet-performance (Performance related issue)
Milestone
Future

Comments

@benaadams (Member)

Functions using stackalloc will fail to inline, with the reason given as "unknown opcode".

Discovered this as part of PR davidfowl/Channels#145 (I probably should have raised it then).

Example function that will fail to inline, even with aggressive inlining:

public unsafe bool TrySliceTo(byte b1, byte b2, out ReadableBuffer slice, out ReadCursor cursor)
{
   byte* twoBytes = stackalloc byte[2];
   twoBytes[0] = b1;
   twoBytes[1] = b2;
   return TrySliceTo(new Span<byte>(twoBytes, 2), out slice, out cursor);
}

Changed function that will inline successfully, without aggressive inlining:

public unsafe bool TrySliceTo(byte b1, byte b2, out ReadableBuffer slice, out ReadCursor cursor)
{
   // use address of ushort rather than stackalloc as the inliner won't inline functions with stackalloc
   ushort twoBytes;
   byte* byteArray = (byte*)&twoBytes;
   byteArray[0] = b1;
   byteArray[1] = b2;
   return TrySliceTo(new Span<byte>(byteArray, 2), out slice, out cursor);
}

/cc @AndyAyersMS @JosephTremoulet
category:cq
theme:inlining
skill-level:expert
cost:medium

@benaadams (Member, Author) commented Dec 8, 2016

@BruceForstall was this fixed by "RyuJIT: improve CQ for amd64 localloc" https://github.com/dotnet/coreclr/issues/6261 (PR "Clean up localloc implementation", dotnet/coreclr#6276)?

Though that might be entirely a change that doesn't affect inlining?

@benaadams (Member, Author) commented Dec 8, 2016

This may also have been a factor in changing the code path for dotnet/coreclr#6141 from stackalloc to uint, for similar reasons, as discovered in benchmarking by @jamesqo (the localloc issues referenced above may also have contributed).

@AndyAyersMS (Member)

There are challenges inlining methods with stackalloc, primarily because there is no unstackalloc. Inlining such methods can convert cases that do not overflow the stack into cases that do.

C/C++ compilers have similar limitations when inlining methods that use alloca.

We could consider supporting this for aggressive inline methods, perhaps.
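
For illustration, a minimal sketch of the hazard; the Checksum/Run names and sizes here are made up, not taken from the PR above:

static class StackallocInlineHazard
{
    // Callee: reserves 256 bytes of stack per call. As a real call, the space is
    // reclaimed when the method returns.
    static unsafe int Checksum(byte seed)
    {
        byte* buf = stackalloc byte[256];
        for (int i = 0; i < 256; i++)
            buf[i] = (byte)(seed + i);

        int sum = 0;
        for (int i = 0; i < 256; i++)
            sum += buf[i];
        return sum;
    }

    // Caller: if Checksum were naively inlined here, there is no "unstackalloc",
    // so every iteration would reserve another 256 bytes in this frame, and a
    // long-running loop could overflow the stack.
    static int Run(int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
            total += Checksum((byte)i);
        return total;
    }
}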

@benaadams (Member, Author) commented Dec 8, 2016

What about fixed-size/const stackallocs vs. variable-size ones? Maybe with a threshold?

There are quite a lot of fixed-size allocs (also variable ones) in coreclr and corefx.

@benaadams (Member, Author)

primarily because there is no unstackalloc.

Ahh... so the SP will advance, but there is no way to bring it back? So scoping with the SP can only be controlled via function calls, rather than, say, in an if/while block etc.

So it could get quite messy if a stackalloc function was inlined into the top of a while loop, for example?

@AndyAyersMS (Member)

Yes, the problematic cases are ones where the call site is in a loop. Even if the stackalloc is fixed size, we won't know the number of loop iterations during inlining, and so would not be able to bound the potential stack growth.

The jit could potentially convert small fixed-size stackallocs into new locals; that might enable inlining the small fixed-size cases.

@benaadams (Member, Author) commented Dec 8, 2016

Could loop + stackalloc block the inline, but without a loop it's fine?

Though I suppose eventually it will hit a loop when all the inlines collapse/fold in,

e.g. the caller of the caller of the caller has a loop.

Trickier than it first appears 😦

@AndyAyersMS (Member)

Maybe for small allocations, yes.

The interesting cases for perf likely involve amortizing the cost of small fixed-size stackallocs in loops, like the example you gave above: davidfowl/Channels#145. If the inlinee does a variable-size or large stackalloc, it presumably will touch most of the storage, so the stackalloc overhead is probably not as significant.

I'd like to see the jit convert small fixed-size stackallocs into regular local vars first. Then we could consider anticipating this in the inliner and allowing such methods to be inlined.
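
As a rough source-level picture of that conversion (the JIT would do this internally; the Buffer32 struct and FillAndSum method below are illustrative stand-ins, not a real API):

using System;
using System.Runtime.InteropServices;

static class StackallocAsLocal
{
    // 32 bytes of inline storage with a statically known size.
    [StructLayout(LayoutKind.Sequential, Size = 32)]
    struct Buffer32 { }

    static unsafe int FillAndSum()
    {
        // Equivalent storage to `byte* buf = stackalloc byte[32];`, but expressed as
        // an ordinary local: the frame size is fixed at JIT time and there is no
        // localloc to block inlining.
        Buffer32 storage = default;
        var buf = new Span<byte>(&storage, 32);

        for (int i = 0; i < buf.Length; i++)
            buf[i] = (byte)i;

        int sum = 0;
        foreach (byte b in buf)
            sum += b;
        return sum;
    }
}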

@benaadams (Member, Author) commented Dec 8, 2016

@AndyAyersMS could you raise an issue for that? I have no idea how to phrase it/get the concept across 😄

@AndyAyersMS (Member)

Sure, dotnet/coreclr#8542.

@AndyAyersMS (Member)

Not sure why I didn't auto-link to this when I fixed at least some cases of it: dotnet/coreclr#14623.

Fixed-size stackallocs of 32 bytes or less that are not in loops will no longer block inlining.
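
For illustration, assuming the 32-byte cut-off described above (the method names are made up):

static class InlineThresholdExamples
{
    // Fixed-size, 32 bytes, not inside a loop: per the change above, this
    // stackalloc no longer blocks the method from being inlined.
    static unsafe byte FirstOf32(byte seed)
    {
        byte* buf = stackalloc byte[32];
        buf[0] = seed;
        return buf[0];
    }

    // 64 bytes exceeds the stated 32-byte limit, so this still blocks inlining.
    static unsafe byte FirstOf64(byte seed)
    {
        byte* buf = stackalloc byte[64];
        buf[0] = seed;
        return buf[0];
    }
}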

@benaadams (Member, Author)

@AndyAyersMS since skipping init locals is a thing now (applied to coreclr and some of corefx), and is proposed as a method-level SkipLocalsInitAttribute for C# 7.3 (dotnet/roslyn#24723), is it worth considering expanding the fixed-size stackalloc threshold when no zeroing will occur (i.e. no .locals init)?
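
For reference, a usage sketch of that attribute as it eventually shipped (System.Runtime.CompilerServices.SkipLocalsInitAttribute); the method below is illustrative, and applying the attribute requires AllowUnsafeBlocks in the project:

using System.Runtime.CompilerServices;

static class SkipLocalsInitExample
{
    // With [SkipLocalsInit] the compiler omits the `.locals init` flag, so the
    // 64-byte buffer is not zeroed on entry; it must be fully written before
    // it is read.
    [SkipLocalsInit]
    static unsafe int Sum(byte seed)
    {
        byte* buf = stackalloc byte[64];
        for (int i = 0; i < 64; i++)
            buf[i] = (byte)(seed + i);

        int sum = 0;
        for (int i = 0; i < 64; i++)
            sum += buf[i];
        return sum;
    }
}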

@AndyAyersMS (Member)

Skipping init locals helps a normal stackalloc roughly as much as it helps a stackalloc converted to a fixed buffer.

My reluctance to push the value higher (see notes in dotnet/coreclr#14623) was twofold -- not many cases in the 32...512 range, and for larger sizes the jit starts using the larger encoding for most on-frame accesses.

There are things the jit could do to address the need for large offsets. It could rearrange the frame and try to put the most frequently accessed locals in places where they can be reached via small offsets, bias the frame pointer so that addressing can take advantage of the full -128...127 range, or introduce a fixed or temporary second frame pointer that is located near the frequently used locals. But the jit does not know these tricks yet.

With the recent increase in using span over stackalloc we should probably remeasure the frequency stats, and maybe that would justify increasing the conversion limit. But we'd need to keep an eye on code size.

@tannergooding (Member)

@AndyAyersMS, it would seem like it might be possible for stackalloc in inlined methods to also be able to "unalloc".

That is, instead of folding the stack allocation into the parent method's frame, you would move the stack pointer such that the code generated would be:

  • stackalloc
  • inlined code
  • unalloc

I could think of a few places where this might be undesirable, but there are also a number of places where opting into this behavior would be beneficial.

@benaadams (Member, Author)

With the recent increase in using span over stackalloc we should probably remeasure the frequency stats, and maybe that would justify increasing the conversion limit.

I'm seeing a jump in what's considered a good cut-off for stackalloc, due to the reduced cost from the illink init-locals removal: e.g. https://github.com/dotnet/corefx/pull/27204/files

Though that method probably isn't a good example of one to inline 😄

@benaadams (Member, Author)

you would move the stack pointer such that the code generated would be:

For non-init-locals only, I assume? Otherwise you pull in a bunch of zeroing code as well.

@tannergooding (Member)

For non-init-locals only, I assume? Otherwise you pull in a bunch of zeroing code as well.

Yeah, that would probably be better 😄

@AndyAyersMS (Member)

Undoing stackalloc has always been something of a pipe dream -- I don't know of any native compilers that do it either.

Aside from getting the mechanical aspects right, the undo point creates a barrier to code motion that can be hard for the compiler to model and reason about (similar to what happens with pinning). And there is an EH aspect to consider too... we'd likely need to induce a try/finally to ensure that the undo always happened.

@benaadams (Member, Author)

And there is an EH aspect to consider too... we'd likely need to induce a try/finally to ensure that the undo always happened.

At which point it's probably getting ugly for an inline candidate?

@AndyAyersMS (Member)

Yes. When EH gets involved the jit becomes very conservative (as you've seen firsthand with the async cases).

@AndyAyersMS (Member)

Looks like

private const int CharStackBufferSize = 32;

is somewhat popular in corefx formatting (a char[32] buffer is 64 bytes), so maybe 64 is now an interesting limit.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@AndyAyersMS (Member)

Simple cases of this are now handled (small, fixed-size, not in loops).

More general handling is not yet a priority, so this will stay in the Future milestone.

@AndyAyersMS AndyAyersMS removed the JitUntriaged CLR JIT issues needing additional triage label Nov 24, 2020