-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: convert fixed-sized locallocs to locals, enable inlining #14623
Conversation
@briansull PTAL Jit-diffs stats:
|
It seems like the tests aren't happy with this change. |
Ah, the new test case needs |
src/jit/importer.cpp
Outdated
{ | ||
// Get the size threshold for local conversion | ||
ssize_t maxSize = DEFAULT_MAX_LOCALLOC_TO_LOCAL_SIZE; | ||
INDEBUG(maxSize = JitConfig.JitStackAllocToLocalSize();); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this syntax here.
Could you expand it to #ifdef DEBUG form
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
Hmm, still some test issues to sort out. |
Grr, switches with fallthroughs. Not the first time I've been burned. Hopefully that takes care of the test issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good
Still a few issues out there, building x86 locally now to repro. |
In one of the failing cases (jit\jit64\opt\localloc\call04_small_il) there's a method with a localloc and then a explicit tail call, and the localloc address is passed to the callee. The expectation is that the localloc memory will survive the tail call. My first gloss through Ecma-335 says this is an invalid program, in particular III.2.4 says: The tail. prefix shall immediately precede a call, calli, or callvirt instruction. It indicates that the current method’s stack frame is no longer required and thus can be removed before the call instruction is executed. I am going to look at the other failures first before I decide what to do about this one. |
The other x86 failures are jit asserts about the GS cookie offset. So likely my change is breaking some jit internal consistency issue by just setting the locals as unsafe buffers; probably something else has to change too. Will debug. May hit this on arm32 too. The call04_small case doesn't fail on x64 because for that target the jit bails on explicit tail calls in methods that need GS protection. |
SInce the jit bails on explicit tail calls when localloc is used, seems reasonable to do the same if one is used but then optimized. For the assert cases, it looks like we also need to set |
Do you need to also worry about local alloc inside a loop? They should not be converted this way. |
Inlining is restricted to methods with convertible fixed-sized locallocs. If the call site is in a loop, the local buffer introduced into the inlinee by localloc conversion will get reused each iteration (and zeroed first if the inlinee has initlocals). So stack will not grow. We don't allow inlining of large fixed or variable sized locallocs in this change, in part because of worries about stack growth. I left a bunch of notes on the topic over in the linked issue #8542. |
Still have a few x86 issues to sort out, related to whether or not the converted buffer is considered an unsafe buffer. |
This is what I meant: Expected result: 45
|
Thanks. Updated to avoid conversion in such cases. |
Most recent test failures seem unrelated. Still expecting some x86 failures though. @dotnet-bot retest this please |
Ah, should read the failures more closely.. the new test needed a tweak. |
Now marking the buffer as unsafe; we still get some size wins overall.
|
a9bd025
to
bb84711
Compare
@briansull PTAL one more time if you don't mind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good
@@ -22958,6 +22975,7 @@ void Compiler::fgInsertInlineeBlocks(InlineInfo* pInlineInfo) | |||
compLongUsed |= InlineeCompiler->compLongUsed; | |||
compFloatingPointUsed |= InlineeCompiler->compFloatingPointUsed; | |||
compLocallocUsed |= InlineeCompiler->compLocallocUsed; | |||
compLocallocOptimized |= InlineeCompiler->compLocallocOptimized; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future refactoring opportunity:
We should place all of these values into a single struct so that we can perform a single struct assignment
from InlineCompiler to the this pointer to update all of them at once.
Optimize fixed sized locallocs of 32 bytes or less to use local buffers, if the localloc is not in a loop. Also "optimize" the degenerate 0 byte case. Allow inline candidates containing localloc, but fail inlining if any of a candidate's locallocs do not convert to local buffers. The 32 byte size threshold was arrived at empirically; larger values did not enable many more cases and started seeinge size bloat because of larger stack offsets. We can revise this threshold if we are willing to reorder locals and see fixed sized cases larger than 32 bytes. Closes #8542. Also add missing handler for the callsite is in try region, this was an oversight.
bb84711
to
756cde5
Compare
Fixed formatting issue and rebased. |
@dotnet/jit-contrib formatter build seems to be busted, perhaps from some recent change. Any ideas? diff.cs(22,31): error CS8137: Cannot define a class or member that utilizes tuples because the compiler required type 'System.Runtime.CompilerServices.TupleElementNamesAttribute' cannot be found. Are you missing a reference? [D:\j\workspace\x64_windows_n---616f30f2\jitutils\src\jit-diff\jit-diff.csproj |
Looks like tests\scripts\format.py is loading up a 1.0 vintage CLI to build the jitutils tools for the formatting leg. Seems unlikely to work now that we've updated the tools to netcoreapp2.0. |
Hah, it worked. Will PR those parts separately. |
Hmm, some error trying to clean up after the windows formatter ran and didn't find issues:
Not sure if this is stale bits on the test machine or something else. |
Let me see if it happens again. Link to previous failure. @dotnet-bot retest Windows_NT x64 Innerloop Formatting |
Ah, the file name is > 260 characters. |
54db9ec
to
756cde5
Compare
Reverted the experimental commits now that the formatting is fixed. |
Tizen failure likely unrelated. Seems like a lot of these tests are failing intermittently now. @RussKeldorph eyeballing looks like recently about 1 in 3 test runs fail. Haven't drilled in but would guess these are mostly random failures. Can we up the priority of the retry logic that was discussed in #6298? |
Optimize fixed sized locallocs of 32 bytes or less to use local buffers.
Also "optimize" the degenerate 0 byte case.
Allow inline candidates containing localloc, but fail inlining if any
of a candidate's locallocs do not convert to local buffers.
The 32 byte size threshold was arrived at empirically; larger values did
not enable many more cases and started seeinge size bloat because of
larger stack offsets.
We can revise this threshold if we are willing to reorder locals and see
fixed sized cases larger than 32 bytes.
Closes #8542.
Also add missing handler for the callsite is in try region, this was
an oversight.