Segfault on fac test with OCaml 4.12.0 #175
Comments
One thing I know is that stack overflow detection is dirty business; you are trying to catch an OS-level failure and turn it back into a catchable in-language exception. Stack overflow detection is known to be flaky on some platforms; I don't know much about this part of the runtime, and in particular I don't know how stack-overflow detection is implemented on amd64. My vague recollection was that it should be reliable, except for functions with an artificially large stack size. But here you seem to observe flakiness (you manage to turn stack overflows back into segfaults some of the time), and this may be a 4.12 regression. cc @stedolan: stack overflow detection becoming flaky on amd64 in 4.12, would that by chance ring a bell?
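To make "turn it back into an in-language exception" concrete, here is a minimal sketch of what the catchable case looks like from user code. The names `count` and `try_count` are made up for illustration; nothing here is taken from QCheck or from the runtime.

```ocaml
(* Deliberately non-tail-recursive, so each call consumes stack. *)
let rec count n = if n = 0 then 0 else 1 + count (n - 1)

(* [Some result] on normal return, [None] if the runtime managed to turn the
   overflow into a Stack_overflow exception. *)
let try_count n =
  match count n with
  | v -> Some v
  | exception Stack_overflow -> None
```

`try_count 1_000` returns `Some 1_000`. A much larger argument is meant to exercise the `None` branch; the failure mode discussed in this issue is the process dying with a segfault instead of reaching it.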
I can't think of any particular reason why stack overflow would be less reliable in 4.12. I can reproduce this, so I'm digging in a bit. So far it's looking like a bug in 4.12. (Incidentally, on my machine only the
Here is another variant also triggering a segfault, now over lists (this one doesn't use an explicit
open QCheck
let list_equal_dupl =
  Test.make
    (make
       (*~shrink:(fun x -> Printf.printf ".%!"; Shrink.list_spine x)*)
       ~shrink:Shrink.list_spine
       (Gen.list_size (Gen.return 650_000) (Gen.return 0)))
    (fun xs ->
       (*Printf.printf ".%!";*)
       xs = xs @ xs)
;;
Test.check_exn list_equal_dupl
(* or use: *)
(* QCheck_base_runner.run_tests ~verbose:true ~rand [list_equal_dupl] *)
The symptoms are the same: segfault on 4.12.0 regardless of QCheck version, not reproducible on 4.11.2,
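For comparison, a sketch of the same computation without the deep recursion, assuming (as in the OCaml versions discussed here) that `@` is not tail-recursive and is what consumes the stack on a 650_000-element list. `append_tailrec` is a hypothetical helper, not part of QCheck or the standard library. It is a way to sidestep the overflow path, not a fix for the runtime behaviour.

```ocaml
(* Stack-safe append: reverse [xs], then reverse it back onto [ys].
   Both List.rev and List.rev_append are tail-recursive. *)
let append_tailrec xs ys = List.rev_append (List.rev xs) ys

let () =
  (* Same size as the generator above, built directly without QCheck. *)
  let xs = List.init 650_000 (fun _ -> 0) in
  let doubled = append_tailrec xs xs in
  Printf.printf "length = %d, equal = %b\n" (List.length doubled) (xs = doubled)
```

The property is false for any non-empty list, so the interesting part of the original test is that the failure should be reported (and shrunk) rather than crash the process.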
@stedolan wild ideas: stack overflow handling uses signals, 4.12 has your EINTR-based signal work (ocaml#9722), that work changes stuff in input/output channel primitives, and printing to stdout seems involved in the issue here.
The bug seems to be that the runtime doesn't restore the GC state correctly when recovering from a stack overflow. In particular, young allocations made before the overflow was caught can sometimes be incorrectly overwritten by future allocations. There's a fix in ocaml/ocaml#10633 (I'm now in the "how did this ever work?!" phase of debugging: the dodgy code seems to be older than 4.12)
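The shape of program that this description affects can be sketched roughly as follows. This is an illustration under assumptions, not the testcase from ocaml/ocaml#10633: `deep`, `probe` and the `depth` parameter are hypothetical names, and whether a large depth yields `Stack_overflow` or a crash depends on platform, backend and compiler version.

```ocaml
(* Non-tail-recursive on purpose: each call keeps a stack frame alive. *)
let rec deep n = if n = 0 then 0 else 1 + deep (n - 1)

(* Returns (overflowed, intact). A small [depth] returns normally; a very
   large one is meant to exhaust the stack. *)
let probe ~depth =
  (* Fresh, small values that start out in the minor (young) heap. *)
  let before = List.init 1_000 string_of_int in
  let overflowed =
    match deep depth with
    | _ -> false
    | exception Stack_overflow -> true
  in
  (* Allocate again after the (possible) recovery... *)
  let after = List.init 1_000 (fun i -> string_of_int (-i)) in
  ignore (Sys.opaque_identity after);
  (* ...then check that the earlier values were not overwritten. *)
  let expected = List.init 1_000 Fun.id in
  let intact =
    List.for_all2 (fun s i -> s = string_of_int i) before expected
  in
  (overflowed, intact)
```

`probe ~depth:1_000` returns `(false, true)`. With a correct runtime, `intact` should stay `true` even when `overflowed` is `true`.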
Ah, it turns out this was broken in 4.11 as well, but this particular case didn't segfault. The testcase in ocaml/ocaml#10633 is broken even in 4.11; I haven't checked how far back the bug goes.
Closing as this was an OCaml issue (with a merged fix).
I'm having problems with test_fac_issue59 from test/core/QCheck_expect_test.ml and ... issue #59.
Stripping away the expect tests I'm down to this:
I've also cut away dune to reduce the number of moving parts:
Compiling with debug information and running under gdb pinpoints the fac function and can show a long stack trace of fac frames. I have not been able to get the debug-executable to print a stack trace with OCAMLRUNPARAM=b (or variants).
The segfault disappears and the test behaves as expected if I Printf.printf or
I get this behaviour with OCaml 4.12.0 across QCheck versions 0.18, 0.17, 0.15, 0.9 (I just tried a selection).
I cannot reproduce it with OCaml 4.11.2 with any of these QCheck versions.
I don't experience the problem if I instead compile and run with the bytecode backend.
I'd be grateful if others could confirm this behaviour to help understand whether OS and hardware play a role.
The above could indicate an OCaml issue - but it could affect QCheck when we include and trigger the test in our CI.
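On the OCAMLRUNPARAM=b attempt mentioned above: backtrace printing is driven by OCaml exceptions, so a process killed by a segfault likely never reaches it. A small sketch of the programmatic counterpart, using only `Printexc` from the standard library; `Demo` and `countdown` are hypothetical names for illustration.

```ocaml
(* Programmatic counterpart of OCAMLRUNPARAM=b. *)
let () = Printexc.record_backtrace true

exception Demo (* hypothetical exception, for illustration only *)

let rec countdown n = if n = 0 then raise Demo else 1 + countdown (n - 1)

let () =
  try ignore (countdown 5)
  with Demo ->
    (* Prints recorded frames when compiled with -g; may be empty otherwise. *)
    prerr_string (Printexc.get_backtrace ())
```

This only helps when the overflow actually surfaces as an exception; for the segfaulting runs, gdb remains the way to see the fac frames.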
This is on a Linux machine, kernel 5.4.0-81, hardware is a Thinkpad with a 64-bit, dual-core Intel i5 CPU.
ulimit -s reports a stack limit of 8192.
I sometimes experience the bug as somewhat flaky (caveat: may be coffee/sleep underflow on my part... 😄)
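For scale, the 8192 reported by ulimit -s is in units of 1024 bytes, i.e. 8 MiB. A back-of-the-envelope sketch of how many nested calls that allows; the 32-byte frame size is an assumed, illustrative figure, not a measurement of fac.

```ocaml
(* ulimit -s reports the limit in units of 1024 bytes. *)
let stack_bytes ~ulimit_kib = ulimit_kib * 1024

(* Rough upper bound on nested calls; [frame_bytes] is an assumed,
   illustrative per-call frame size, not a measured one. *)
let max_frames ~ulimit_kib ~frame_bytes = stack_bytes ~ulimit_kib / frame_bytes

let () =
  Printf.printf "%d bytes, about %d frames at 32 bytes each\n"
    (stack_bytes ~ulimit_kib:8192)
    (max_frames ~ulimit_kib:8192 ~frame_bytes:32)
```

Under that assumption a few hundred thousand nested non-tail calls are enough to reach the limit.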
A variant (probably increasing the required stack height) just repeats the test twice:
I have sometimes experienced this variant to segfault when the first one stopped doing so.
(I'll tag @gasche as he has had his hands in both QCheck and the OCaml compiler...)