CompileDeepTree_NoStackOverflowFast fails with StackOverflowException on debug build #21374
Comments
@gkhanna79, why the CoreFX repo? Can you please suggest an area to start with?
It is a CoreFX test failure, so I moved it here to track it closer to home.
We should use conditional compilation and make Debug use a bigger stack, like 512 KB. It is known that Debug needs more stack.
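A minimal sketch of that suggestion, assuming the test assembly is compiled with the DEBUG symbol defined in its debug configuration. Note that DEBUG only reflects how the test assembly itself was built, not whether the runtime or JIT underneath is a debug build, which is exactly the limitation raised later in this thread. The class and member names are hypothetical; 512 KB is the size mentioned in this thread and 128 KB is the fast test's stack.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper, not part of the CoreFX test suite.
public static class SmallStackRunner
{
#if DEBUG
    // Debug configuration of the test assembly: allow a larger 512 KB stack.
    public const int MaxStackSize = 512 * 1024;
#else
    // Other configurations: keep the deliberately small 128 KB stack.
    public const int MaxStackSize = 128 * 1024;
#endif

    // Runs 'action' on a new thread limited to MaxStackSize bytes of stack and
    // rethrows any ordinary exception it raised on the calling thread.
    // A genuine StackOverflowException cannot be caught and still ends the process.
    public static void Run(Action action)
    {
        ExceptionDispatchInfo failure = null;
        var thread = new Thread(() =>
        {
            try { action(); }
            catch (Exception ex) { failure = ExceptionDispatchInfo.Capture(ex); }
        }, MaxStackSize);
        thread.Start();
        thread.Join();
        failure?.Throw();
    }

    public static void Main()
    {
        int result = 0;
        Run(() => result = 6 * 7);
        Console.WriteLine($"Stack limit {MaxStackSize / 1024} KB, result {result}");
    }
}
```

Because the symbol is fixed at compile time, this only helps when the debug runtime is always paired with a debug-built test assembly.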
Actually, looking in more detail at the bug reports, it seems that the problem is with TryEnsureSufficientExecutionStack, since a similar test with a much larger stack (CompileDeepTree_NoStackOverflow) also fails. In that case there is no point in increasing the stack, since the probing for remaining stack is not working.
@gkhanna79, since this looks like an issue with the behavior of
What is the issue with the API? It was designed (with a basic heuristic) for basic CER (constrained execution region) scenarios, not for dynamically determining available stack space.
@gkhanna79, the … Are we not supposed to use it in this way? CC @stephentoub
TryEnsureSufficientExecutionStack probes for enough available stack space for a typical function. The test is not using it for a typical function (100x nested Add is not typical), and that's why it does not work well.
@jkotas, the compiler compiles expressions recursively and probes for sufficient stack on every Add. 100x is just the way the test causes the recursion, but it is not impossible in actual source: it could be hit by compiling something like "x1 + x2 + x3 + x4 + ... + x100". What we have here is the same stack-probing pattern as in async. If we have problems here, we might also have problems with other uses of this API.
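The probing pattern described here can be sketched as a recursive walk that checks the stack before each step and continues on a new thread when the check fails. This is a hypothetical, simplified model (the Node type and method names are invented for illustration), not the Expression Tree compiler's actual code; RuntimeHelpers.TryEnsureSufficientExecutionStack is the real API under discussion. As the following comments point out, such a probe only protects the walker's own recursion: anything that later recurses deeply on the same small stack without probing, such as the JIT compiling the generated method, is not covered.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical tree node standing in for a nested Add expression:
// a leaf holds a value, an inner node holds two children.
public sealed class Node
{
    public readonly int Value;
    public readonly Node Left, Right;
    public Node(int value) { Value = value; }
    public Node(Node left, Node right) { Left = left; Right = right; }
}

public static class GuardedWalker
{
    // Recursively sums the tree, probing the stack before each step. When the
    // runtime reports too little headroom, the rest of the walk continues on a
    // fresh thread (default stack size) instead of risking an overflow here.
    public static int Sum(Node node)
    {
        if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
            return OnNewThread(() => Sum(node));

        if (node.Left == null)
            return node.Value;
        return Sum(node.Left) + Sum(node.Right);
    }

    static int OnNewThread(Func<int> work)
    {
        int result = 0;
        ExceptionDispatchInfo failure = null;
        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { failure = ExceptionDispatchInfo.Capture(ex); }
        });
        thread.Start();
        thread.Join();
        failure?.Throw();
        return result;
    }

    // Builds the left-deep shape of x1 + x2 + ... + xN: 'additions' inner
    // nodes over additions + 1 leaves, each leaf holding 1.
    public static Node BuildLeftDeep(int additions)
    {
        var tree = new Node(1);
        for (int i = 0; i < additions; i++)
            tree = new Node(tree, new Node(1));
        return tree;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(BuildLeftDeep(100))); // 101
    }
}
```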
That is not what happens once this hits the JIT. Check the failing stack trace above. It has the following two frames nested many times:
0xb19879ec in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80ba2a8, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14681
@jkotas, ah, I see. The crash did not happen while the ET (expression tree) compiler was compiling the lambda into IL, likely because the stack-overflow avoidance actually worked. It happened in the JIT compiler, which is also recursive, while JIT-compiling the method body. I thought JIT compilation would not happen until we actually executed the delegate, which should happen on a regular thread with a regular stack. If that is the case, we may need to rethink the testing strategy.
@seanshpark, it appears that the test stresses the Expression Tree compiler and the JIT compiler equally. Normally this is not a problem, since the JIT compiler uses much less stack and 128 KB is more than enough to JIT this method. Considering that running on Debug is not common, is it possible to exclude the test from debug runs?
We do not have a good, reliable way to detect that you are running on a debug runtime or a debug build of the JIT. It would be really useful if all inner-loop CoreFX tests worked on debug builds of the runtime or JIT, on all platforms. If this is a corner-case test that does not work on a debug runtime, I would suggest moving it to outer loop.
Will move to the outer loop then.
@VSadov, moving to Outerloop is OK. Thank you!
@seanshpark commented on Thu Apr 20 2017
CoreFX System.Linq.Expressions.Tests terminates with StackOverflowException
Inside the CompileDeepTree_NoStackOverflowFast test, the compiler is tested with 100 constant expressions on a 128 KB stack. I'm running the test with the debug version of corerun (release is not stable yet). When I change the stack size to 512 KB, this method passes.
I'm not sure how to think about this problem.
Should I test this with the Release version only, or is there another way to fix this?
Test code in CoreFX
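For readers without the linked source at hand, here is an approximation of what the test does, reconstructed from the description in this thread (100 nested Add nodes compiled on a 128 KB thread, then checked with Assert.Equal(n, f())). It is a sketch, not the CoreFX source, and the class and method names are hypothetical; Expression.Constant, Expression.Add, Expression.Lambda and Compile are the real System.Linq.Expressions APIs.

```csharp
using System;
using System.Linq.Expressions;
using System.Threading;

// Approximation of CompileDeepTree_NoStackOverflowFast as described in this
// issue; not the CoreFX source.
public static class DeepTreeRepro
{
    // Builds 0 + 1 + 1 + ... + 1 as a left-deep chain of 'n' Add nodes,
    // so the compiled delegate returns n.
    public static Expression<Func<int>> BuildDeepAdd(int n)
    {
        Expression body = Expression.Constant(0);
        for (int i = 0; i < n; i++)
            body = Expression.Add(body, Expression.Constant(1));
        return Expression.Lambda<Func<int>>(body);
    }

    // Compiles the tree on a thread whose stack is limited to 'stackSize' bytes.
    // Per the discussion in this issue, on the x86/Linux debug runtime this step
    // apparently overflowed the 128 KB stack inside the JIT.
    public static Func<int> CompileOnSmallStack(int n, int stackSize)
    {
        Func<int> f = null;
        var thread = new Thread(() => f = BuildDeepAdd(n).Compile(), stackSize);
        thread.Start();
        thread.Join();
        return f;
    }

    public static void Main()
    {
        const int n = 100;                // tree depth reported in this issue
        const int stackSize = 128 * 1024; // 128 KB, as in the fast test
        Func<int> f = CompileOnSmallStack(n, stackSize);
        Console.WriteLine(f() == n ? "passed" : "failed");
    }
}
```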
@seanshpark commented on Thu Apr 20 2017
@parjong, do you have any comments to add?
@seanshpark commented on Thu Apr 20 2017
@jkotas, could you please help, or invite someone who could?
@parjong commented on Thu Apr 20 2017
I do not have much to add.
@seanshpark commented on Thu Apr 20 2017
With the same method (without the Assert.Equal(n, f()); line) run in a console app, the gdb stack shows this=0x0 in Compiler::compStressCompile, so it seems that the stack has run out.
@jkotas commented on Fri Apr 21 2017
This test was added by @JonHanna in dotnet/corefx#14444
@JonHanna commented on Fri Apr 21 2017
Has something changed in TryEnsureSufficientExecutionStack in debug that the test may be catching, or is this a new test configuration?
From https://github.com/dotnet/corefx/issues/17550#issuecomment-289816350 it would seem that it'll pass with 512K because it then won't test what it's trying to test. What happens with 192K?
We could increase both the stack size and the size of the compiled tree so that it keeps testing what it is there to test, though that increases execution time. The point is to do a similar job to the slower CompileDeepTree_NoStackOverflow without taking so long that it significantly lengthens the test suite. After a certain point the test no longer earns its keep, especially as the slower test is run on outerloop.
@parjong commented on Fri Apr 21 2017
@JonHanna Yes, this issue comes from a completely new configuration (x86/Linux Debug). CLR has been ported to x86/Linux. Recent CLR unit test results show that the debug build is stable, so we attempted to run the FX unit tests as the next step (checked/release builds are still being brought up).
@JonHanna commented on Fri Apr 21 2017
It would be worth temporarily taking the OuterLoop tag off this test's outerloop cousin and seeing if it passes. If that one also fails, then the problem is in TryEnsureSufficientExecutionStack on that configuration, and the tests have found a bug. If that one passes, then the innerloop test needs tweaking or disabling.
@parjong commented on Fri Apr 21 2017
@JonHanna, we could run System.Linq.Expressions.Tests without a stack overflow if this case is excluded (although there are several failures).
@JonHanna commented on Fri Apr 21 2017
If CompileDeepTree_NoStackOverflow passes (but note that it's outerloop and won't be included in most runs), I'm beginning to think that this test is more trouble than it's worth; anything it catches will still be caught by the outerloop tests, albeit not on most test runs.
@janvorli commented on Fri Apr 21 2017
Looking at how the safe execution stack limit is determined, I can see that it checks for 128 KB of remaining stack space on 64-bit architectures, but only for 64 KB on 32-bit architectures. Could that be causing the problem?
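To make that arithmetic concrete, here is a small model of the two thresholds, assuming the probe succeeds exactly when the remaining stack is at least the threshold. It does not call or reproduce the runtime's implementation, and guard pages or other margins are ignored; all names are hypothetical. Under that assumption, a 128 KB thread on 64-bit fails the probe almost immediately, while on 32-bit about 64 KB can be consumed first, leaving only about 64 KB of headroom for whatever else runs on that thread.

```csharp
using System;

// Hypothetical model of the thresholds described above; it does not call or
// reproduce the runtime's implementation.
public static class StackProbeModel
{
    public const long Threshold64Bit = 128 * 1024; // 128 KB on 64-bit
    public const long Threshold32Bit = 64 * 1024;  // 64 KB on 32-bit

    public static long RequiredHeadroom(bool is64Bit) =>
        is64Bit ? Threshold64Bit : Threshold32Bit;

    // Bytes a thread with 'stackSize' bytes can consume before the modeled
    // probe starts failing.
    public static long UsableBeforeProbeFails(long stackSize, bool is64Bit) =>
        Math.Max(0, stackSize - RequiredHeadroom(is64Bit));

    public static void Main()
    {
        const long fastTestStack = 128 * 1024;
        foreach (bool is64Bit in new[] { true, false })
        {
            Console.WriteLine(
                $"{(is64Bit ? 64 : 32)}-bit: probe needs {RequiredHeadroom(is64Bit) / 1024} KB; " +
                $"a 128 KB thread can use {UsableBeforeProbeFails(fastTestStack, is64Bit) / 1024} KB first.");
        }
        Console.WriteLine($"This process is {(Environment.Is64BitProcess ? 64 : 32)}-bit.");
    }
}
```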
@parjong commented on Fri Apr 21 2017
@JonHanna Oops, I meant all the innerloop tests. Sorry for the confusion. I will let you know when the result is ready (probably next Monday).
@seanshpark commented on Sun Apr 23 2017
Thank you for the explanation. 192K also fails, and so does 384K, but I didn't check which size works. We'll check the cousin in the outer loop.
@parjong commented on Sun Apr 23 2017
@JonHanna It turns out that the corresponding outerloop test also fails for the same reason (stack overflow) when the debug binary is used. This issue does not happen with the checked binary.
Is it possible to add some trait that allows us to exclude this test for debug builds? We cannot use the checked binary, as it is not stable yet.
@seanshpark commented on Sun Apr 23 2017
I've also tested and the results are
@JonHanna commented on Mon Apr 24 2017
If the outerloop test is failing, I'm rather worried that it might be an actual bug in TryEnsureSufficientExecutionStack, or that it isn't sensitive enough and we can expect other uses of it to have a similar problem. That said, @seanshpark found differently from you, which is curious in itself.
We could of course wrap it in #if !DEBUG, but someone else may know of a trait that's applicable.
@parjong commented on Mon Apr 24 2017
@JonHanna Could you let me know why this test uses 128 KB as a limit? As @janvorli mentioned in https://github.com/dotnet/coreclr/issues/11122#issuecomment-296169837, on 32-bit architectures TryEnsureSufficientExecutionStack returns true as long as 64 KB of stack remains, rather than 128 KB.
@seanshpark commented on Mon Apr 24 2017
Sorry for the wrong report. I've retested with the debug version from the latest master, and both fail.
@JonHanna commented on Mon Apr 24 2017
Well, the idea of the "fast" one is to have a small stack so that it quickly risks a stack overflow, but TryEnsureSufficientExecutionStack saves the day and the overflow doesn't happen. Clearly, that is not what actually happened. The "slow" one does the same thing with a more normal stack. If only the fast one failed, I might give it up as overly artificial, but the slow one triggers conditions that were causing users problems with real code, so I'm much more worried about that. Is there a how-to for the type of build you are doing so I can take a look myself?
CC @bartdesmet as the first author of the stackguard code and test.
@parjong commented on Mon Apr 24 2017
@JonHanna https://github.com/dotnet/coreclr/issues/9265#issuecomment-280521257 provides brief step-by-step build instructions (you first need to create a root filesystem by running build-rootfs.sh under cross with the x86 option).