CompileDeepTree_NoStackOverflowFast fails StackOverflowException with debug build #21374

gkhanna79 · 2017-04-26T05:23:17Z

@seanshpark commented on Thu Apr 20 2017

CoreFX System.Linq.Expressions.Tests teminates with StackOverflowException

Discovering: System.Linq.Expressions.Tests
Discovered:  System.Linq.Expressions.Tests
Starting:    System.Linq.Expressions.Tests
Process is terminating due to StackOverflowException.
Aborted (core dumped)

Inside, CompileDeepTree_NoStackOverflowFast test, it tests the compiler with 100 constant expression with 128KB stack size. I'm running the test with debug version of corerun (release is not stable yet).
When I change the stack size to 512K, this method passes.

I'm not sure how to think about this problem.
Should I test this with Release version only or is there any other ways to fix this?

Test code in CoreFX

[Theory, ClassData(typeof(CompilationTypes))]
public static void CompileDeepTree_NoStackOverflowFast(bool useInterpreter)
{
    Expression e = Expression.Constant(0);

    int n = 100;

    for (int i = 0; i < n; i++)
        e = Expression.Add(e, Expression.Constant(1));

    Func<int> f = null;
    // Request a stack size of 128KiB to get very small stack.
    // This reduces the size of tree needed to risk a stack overflow.
    // This though will only risk overflow once, so the outerloop test
    // above is still needed.
    Thread t = new Thread(() => f = Expression.Lambda<Func<int>>(e).Compile(useInterpreter), 128 * 1024);
    t.Start();
    t.Join();

    Assert.Equal(n, f());
}

@seanshpark commented on Thu Apr 20 2017

@parjong, do you have any comments to add?

@seanshpark commented on Thu Apr 20 2017

@jkotas , could you please help or invite who could be a help?

@parjong commented on Thu Apr 20 2017

I do not have much to add.

@seanshpark commented on Thu Apr 20 2017

With a same method(without the Assert.Equal(n, f()); line) run in a console app, gdb stack shows like this.
this=0x0 in Compiler::compStressCompile as seems that stack has ran out.

(gdb) bt
#0  0xb17e3ab8 in Compiler::compStressCompile (this=0x0, stressArea=Compiler::STRESS_NONE, weight=0) at /home/maxwell/netcore/coreclr/src/jit/compiler.cpp:3529
dotnet/corefx#1  0xb19879ec in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80ba2a8, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14681
dotnet/corefx#2  0xb199c320 in Compiler::fgMorphSmpOp (this=0x80b3a20, tree=0x80ba368, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:11434
dotnet/corefx#3  0xb1987d61 in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80ba368, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14767
dotnet/corefx#4  0xb199c320 in Compiler::fgMorphSmpOp (this=0x80b3a20, tree=0x80ba428, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:11434
...
dotnet/corefx#61 0xb1987d61 in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80bb8c8, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14767
dotnet/runtime#13867 0xb19b168d in Compiler::fgMorphStmts (this=0x80b3a20, block=0x80b6c34, mult=0xb4460227, lnot=0xb4460226, loadw=0xb4460225) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:15579
dotnet/runtime#13868 0xb19b2388 in Compiler::fgMorphBlocks (this=0x80b3a20) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:15864
dotnet/corefx#64 0xb19b612f in Compiler::fgMorph (this=0x80b3a20) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:17062
dotnet/corefx#65 0xb17e8769 in Compiler::compCompile (this=0x80b3a20, methodCodePtr=0xb4460b0c, methodCodeSize=0xb4460f20, compileFlags=0xb4460b20) at /home/maxwell/netcore/coreclr/src/jit/compiler.cpp:4299
...
dotnet/corefx#75 0xb6cc4036 in CallCompileMethodWithSEHWrapper(EEJitManager*, CEEInfo*, CORINFO_METHOD_INFO*, CORJIT_FLAGS, unsigned char**, unsigned int*, MethodDesc*)::$_4::operator()(CallCompileMethodWithSEHWrapper(EEJitManager*, CEEInfo*, CORINFO_METHOD_INFO*, CORJIT_FLAGS, unsigned char**, unsigned int*, MethodDesc*)::Param*) const (this=0xb4460d10, pParam=0xb4460d18) at /home/maxwell/netcore/coreclr/src/vm/jitinterface.cpp:12126
dotnet/corefx#76 0xb6cc3ea9 in CallCompileMethodWithSEHWrapper (jitMgr=0x8084cb8, comp=0xb4461050, info=0xb4460fa0, flags=..., nativeEntry=0xb4460f24, nativeSizeOfCode=0xb4460f20, ftn=0xb175a1a0)
    at /home/maxwell/netcore/coreclr/src/vm/jitinterface.cpp:12169
dotnet/runtime#13873 0xb6cc608e in UnsafeJitFunction (ftn=0xb175a1a0, ILHeader=0x0, flags=..., pSizeOfCode=0xb4461f44) at /home/maxwell/netcore/coreclr/src/vm/jitinterface.cpp:12813
dotnet/corefx#78 0xb720ff41 in MethodDesc::MakeJitWorker (this=0xb175a1a0, ILHeader=0x0, flags=...) at /home/maxwell/netcore/coreclr/src/vm/prestub.cpp:490
dotnet/corefx#79 0xb7214505 in MethodDesc::DoPrestub (this=0xb175a1a0, pDispatchingMT=0x0) at /home/maxwell/netcore/coreclr/src/vm/prestub.cpp:1515
dotnet/corefx#80 0xb6fd2d75 in ReflectionInvocation::CompileMethod (pMD=0xb175a1a0) at /home/maxwell/netcore/coreclr/src/vm/reflectioninvocation.cpp:2006
dotnet/corefx#81 0xb1778a84 in ?? ()
...
...

@jkotas commented on Fri Apr 21 2017

This test was added by @JonHanna in dotnet/corefx#14444

@JonHanna commented on Fri Apr 21 2017

Has something changed in TryEnsureSufficientExecutionStack in debug that maybe the test is catching, or is this a test configuration that is new?

When I change the stack size to 512K, this method passes

From https://github.com/dotnet/corefx/issues/17550#issuecomment-289816350 it would seem that it'll pass with 512K because it won't test what it's trying to test. What happens with 192K?

We could up both the stacksize and the size of the tree compiled to keep it testing what it is there to test without, though that increases execution time and the point is to do a similar job to the slower CompileDeepTree_NoStackOverflow without spending so long that it impacts heavily on how long the test suite takes. After a certain point the test no longer earns its keep, especially as there is that slower test being run on outerloop.

@parjong commented on Fri Apr 21 2017

@JonHanna Yes, this issue comes from a completely new configuration (x86/Linux Debug). CLR has been ported to x86/Linux. Recent CLR unittest result shows that debug build is stable, and thus we attempted to run FX unittest as the next step (checked/release build is under bring up).

@JonHanna commented on Fri Apr 21 2017

It would be worth temporarily taking the OuterLoop tag off this test's outerloop cousin and seeing if it passes. If that one also fails then the problem is in TryEnsureSufficientExecutionStack on that configuration and the tests have found a bug. If that one fails then the innerloop test needs tweaking or disabling.

@parjong commented on Fri Apr 21 2017

@JonHanna We could run System.Linq.Expressions.Tests without stack overflow if this case is excluded (although there are several failures).

@JonHanna commented on Fri Apr 21 2017

If CompileDeepTree_NoStackOverflow passes (but note that it's outerloop and won't be included in most runs, I'm beginning to think that this test is more trouble than it's worth; anything it catches will still be caught by the outerloop tests, albeit not on most test runs.

@janvorli commented on Fri Apr 21 2017

Taking a look at how the safe execution stack limit is determined, I can see that it checks for 128kB of remaining stack space on 64 bit architectures, but only for 64kB on 32 bit architectures. Could that be causing the problem?

@parjong commented on Fri Apr 21 2017

@JonHanna Oops. I mean all the innerloop tests. Sorry for confusion. I will let you know the result when the result is ready (I guess that I could let you know the result in the next Monday).

@seanshpark commented on Sun Apr 23 2017

it would seem that it'll pass with 512K because it won't test what it's trying to test. What happens with 192K?

Thank you for the explanation. 192K also fails and 384K also but didn't check the number that was working. We'll check the cousin in the outter loop.

@parjong commented on Sun Apr 23 2017

@JonHanna It turns out that the corresponding outerloop test also fails with the same reason (stack overflow) if debug binary is used. This issue does not happen for checked binary.

Is it possible to add some trait that allows us to exclude this test for debug build? We could not use checked binary as checked binary is not stable yet.

@seanshpark commented on Sun Apr 23 2017

I've also tested and the results are

Debug CoreCLR + CompileDeepTree_NoStackOverflow : Pass
Checked CoreCLR + CompileDeepTree_NoStackOverflow : Pass
Debug CoreCLR + CompileDeepTree_NoStackOverflowFast : Fail
Checked CoreCLR + CompileDeepTree_NoStackOverflowFast : Pass

@JonHanna commented on Mon Apr 24 2017

It turns out that the corresponding outerloop test also fails with the same reason (stack overflow) if debug binary is used.

If the outerloop test is failing, I'm rather worried that it might be an actual bug in TryEnsureSufficientExecutionStack or that it isn't sensitive enough an we can expect other uses of it to have a similar problem. Though @seanshpark found differently to you, which is curious in itself.

Is it possible to add some trait that allows us to exclude this test for debug build?

We could of course wrap it in #if !DEBUG, but someone else may know of a trait that's applicable.

@parjong commented on Mon Apr 24 2017

@JonHanna Could you let me know why this test uses 128kB as a limit? As @janvorli mentioned in https://github.com/dotnet/coreclr/issues/11122#issuecomment-296169837, TryEnsureSufficientExecutionStack returns true if the remaining stack size is 64kB for 32-bit architecture instead of 128kB.

@seanshpark commented on Mon Apr 24 2017

Sorry for the wrong report. I'v retested with debug version from latest master and both fails.

Debug CoreCLR + CompileDeepTree_NoStackOverflow : Fail
Debug CoreCLR + CompileDeepTree_NoStackOverflowFast : Fail

@JonHanna commented on Mon Apr 24 2017

Well the idea of the "fast" one is to have a small stack so that it quickly risks stack overflow, but TryEnsureSufficientExecutionStack saves the day and it doesn't happen. Clearly not the actual result. The "slow" one does the same thing with a more normal stack. If only the fast failed I might give it up as overly artificial, but the slow is triggering conditions that were causing users problems with real code, so I'm much more worried about that. Is there a how-to on the type of build you are doing so I can take a look myself?

CC @bartdesmet as the first author of the stackguard code and test.

@parjong commented on Mon Apr 24 2017

@JonHanna https://github.com/dotnet/coreclr/issues/9265#issuecomment-280521257 provides a brief step-by-step build instruction (you first need to create a root filesystem via running build-rootfs.sh under cross with x86 option).

The text was updated successfully, but these errors were encountered:

karelz · 2017-04-26T22:36:46Z

@gkhanna79 why CoreFX repo? Can you please suggest area to start with?

gkhanna79 · 2017-04-26T22:39:43Z

It is a CoreFX test failure and hence, I moved it here to track it closer to home.

VSadov · 2017-05-08T18:21:16Z

We should use conditional compilation and make Debug use bigger stack - like 512. It is known that Debug needs more stack.

VSadov · 2017-05-08T20:09:35Z

Actually, looking in more details at the bug reports, it seems that it is a problem with TryEnsureSufficientExecutionStack, since a similar test with much larger stack (CompileDeepTree_NoStackOverflow) also fails.

In such case there is no point in increasing stack, since the probing for remaining stack is not working.

VSadov · 2017-05-08T20:13:20Z

@gkhanna79 - since this looks like an issue with behavior of TryEnsureSufficientExecutionStack in debug, where should this go?

gkhanna79 · 2017-05-08T20:23:43Z

What is the issue with API? It was designed (with a basic heuristic) for basic CER scenarios and not for determining (dynamically) available stack space.

VSadov · 2017-05-08T20:48:27Z

@gkhanna79 the TryEnsureSufficientExecutionStack API is used as a general way of probing for the available stack space. For example in the async machinery - https://github.com/dotnet/coreclr/blob/0623eb3c85078d84284e5c75eab1ce978ede62f7/src/mscorlib/src/System/Threading/Tasks/Task.cs#L6477

Are we not supposed to use it in this way?

CC; @stephentoub

jkotas · 2017-05-08T21:01:55Z

TryEnsureSufficientExecutionStack ensures probes for available stack space for typical function. The test is not using it for typical function (100x nested Add is not typical), and that's why it does not work well.

VSadov · 2017-05-08T21:20:37Z

@jkotas compiler compiles expressions recursively and will probe for sufficient stack for every Add.

100x is just the way to cause the recursion in the test, but not impossible in the actual source. It could be hit by compiling something like "x1 + x2 + x3 + x4 +...+ x100"

What we have here is the same stack-probing pattern as in async. If we have problems here we might also have problems with other uses of this API.

jkotas · 2017-05-08T21:32:48Z

@jkotas compiler compiles expressions recursively and will probe for sufficient stack for every Add.

It is not what happens once this hits the JIT. Check the failing stacktrace above. It has the following two frames nested many times:

0xb19879ec in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80ba2a8, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14681
0xb199c320 in Compiler::fgMorphSmpOp (this=0x80b3a20, tree=0x80ba368, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:11434

VSadov · 2017-05-08T23:29:19Z

@jkotas - aaah, I see. It was not the crash while ET compiler compiled the lambda into IL. Likely because the SO-avoidance actually worked. It was the JIT compiler, which is also recursive, while JIT-ting the method body.

I thought the JIT-ting will not happen until we actually execute the delegate, which should be happening on a regular thread with regular stack.
Does it actually JIT when the delegate is created?

If that is the case we may need to rethink the testing strategy.,

VSadov · 2017-05-09T21:42:25Z

@seanshpark - it appears that the test equally stresses the Expression Tree Compiler and the JIT compiler.

Normally this is not a problem since JIT compiler uses much less resources and 128Kb of stack is more than enough to JIT this method.
That does not seem to be the case when running on Debug.

Considering that running on Debug is not common, is it possible to exclude the test from debug runs?

jkotas · 2017-05-09T21:58:16Z

We do not have a good reliable way to detect that you are running on debug runtime or debug build of the JIT. It would be really useful if all inner loop corefx tests work on debug builds of the runtime or JIT, on all platforms.

If this is a corner case test that does not work on debug runtime, I would suggest moving it to outer loop.

VSadov · 2017-05-09T22:01:30Z

Will move to the outer loop then.

seanshpark · 2017-05-12T00:30:32Z

@VSadov , moving to Outerloop is OK. Thank you!

VSadov self-assigned this May 8, 2017

stephentoub closed this as completed in dotnet/corefx#19563 May 9, 2017

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.0.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CompileDeepTree_NoStackOverflowFast fails StackOverflowException with debug build #21374

CompileDeepTree_NoStackOverflowFast fails StackOverflowException with debug build #21374

gkhanna79 commented Apr 26, 2017

karelz commented Apr 26, 2017

gkhanna79 commented Apr 26, 2017

VSadov commented May 8, 2017

VSadov commented May 8, 2017

VSadov commented May 8, 2017

gkhanna79 commented May 8, 2017

VSadov commented May 8, 2017

jkotas commented May 8, 2017

VSadov commented May 8, 2017

jkotas commented May 8, 2017

VSadov commented May 8, 2017

VSadov commented May 9, 2017

jkotas commented May 9, 2017

VSadov commented May 9, 2017

seanshpark commented May 12, 2017

CompileDeepTree_NoStackOverflowFast fails StackOverflowException with debug build #21374

CompileDeepTree_NoStackOverflowFast fails StackOverflowException with debug build #21374

Comments

gkhanna79 commented Apr 26, 2017

karelz commented Apr 26, 2017

gkhanna79 commented Apr 26, 2017

VSadov commented May 8, 2017

VSadov commented May 8, 2017

VSadov commented May 8, 2017

gkhanna79 commented May 8, 2017

VSadov commented May 8, 2017

jkotas commented May 8, 2017

VSadov commented May 8, 2017

jkotas commented May 8, 2017

VSadov commented May 8, 2017

VSadov commented May 9, 2017

jkotas commented May 9, 2017

VSadov commented May 9, 2017

seanshpark commented May 12, 2017