Segmentation fault when stackprof is left running during VM shutdown #157

ivoanjo · 2021-05-28T09:22:08Z

Howdy! 👋

First the disclaimer (I guess it may be relevant): I work at @DataDog on profiling for the ddtrace gem, although we don't use rb_profile_frames (tradeoffs, tradeoffs 🤣 ).

While doing a few experiments with stackprof (0.2.17), I noticed that I got VM crashes when the process was finishing up with stackprof still enabled. I can reproduce this on both Linux and macOS, on Ruby 3.0.1, 2.7.3 and 2.6.7.

The following script is enough to reproduce it every time for me:

require 'stackprof'

StackProf.start(mode: :wall, raw: true, aggregate: false, ignore_gc: true, interval: 100)
sleep 0.5

I went through a few core dumps (let me know if you'd like me to share them here) and the pattern seems to be that stackprof keeps sampling even though the VM is already "closing shop": GET_EC() is null, so rb_postponed_job_register_one blows up as it tries to dereference it.

Here's one such example:

# ruby stackprof-crash.rb
[BUG] Segmentation fault at 0x0000000000000038
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]

Segmentation fault (core dumped)
# gdb `which ruby` core
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/ruby...done.
[New LWP 723]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/ruby stackprof-crash.rb'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fcfbabf09cf in rb_vm_bugreport (ctx=ctx@entry=0x5652a8b5bfc0) at vm_core.h:1789
1789	vm_core.h: No such file or directory.
(gdb) bt
#0  0x00007fcfbabf09cf in rb_vm_bugreport (ctx=ctx@entry=0x5652a8b5bfc0) at vm_core.h:1789
#1  0x00007fcfbaa303c7 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x5652a8b5bfc0, fmt=fmt@entry=0x7fcfbac33fab "Segmentation fault at %p") at error.c:658
#2  0x00007fcfbab5c9fb in sigsegv (sig=11, info=0x5652a8b5c0f0, ctx=0x5652a8b5bfc0) at signal.c:946
#3  <signal handler called>
#4  rb_ec_vm_ptr (ec=0x0) at vm_core.h:1777
#5  rb_postponed_job_register_one (flags=flags@entry=0, func=func@entry=0x7fcfb69b6440 <stackprof_job_handler>, data=data@entry=0x0) at vm_trace.c:1611
#6  0x00007fcfb69b50c8 in stackprof_signal_handler (sig=<optimized out>, sinfo=<optimized out>, ucontext=<optimized out>) at stackprof.c:642
#7  stackprof_signal_handler (sig=<optimized out>, sinfo=<optimized out>, ucontext=<optimized out>) at stackprof.c:621
#8  <signal handler called>
#9  0x00007fcfba38edf6 in malloc_consolidate (av=av@entry=0x7fcfba4cac40 <main_arena>) at malloc.c:4494
#10 0x00007fcfba39079a in _int_free (av=0x7fcfba4cac40 <main_arena>, p=0x5652a9003c90, have_lock=<optimized out>) at malloc.c:4392
#11 0x00007fcfbaa517f1 in objspace_xfree (old_size=<optimized out>, ptr=0x5652a9003ca0, objspace=0x5652a8a52320) at gc.c:10052
#12 objspace_xfree (old_size=<optimized out>, ptr=0x5652a9003ca0, objspace=0x5652a8a52320) at gc.c:9985
#13 ruby_sized_xfree (size=0, x=0x5652a9003ca0) at gc.c:10145
#14 ruby_sized_xfree (size=0, x=0x5652a9003ca0) at gc.c:10142
#15 ruby_xfree (x=0x5652a9003ca0) at gc.c:10152
#16 0x00007fcfbab66bd2 in rb_st_free_table (tab=0x5652a8a97eb0) at st.c:712
#17 0x00007fcfbabda154 in ruby_vm_destruct (vm=0x5652a8a50d60) at vm.c:2349
#18 0x00007fcfbaa39e4a in rb_ec_cleanup (ec=ec@entry=0x5652a8a527b0, ex=<optimized out>) at eval.c:261
#19 0x00007fcfbaa3a033 in ruby_run_node (n=0x5652a957e3e0) at eval.c:335
#20 0x00005652a840510b in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb)

I'm guessing a possible solution can be to check if GET_EC() is sane in the signal handler, and give up if it's not.

Of course, calling StackProf.stop also "fixes" the issue, so it may be just a case of documenting that bad things will happen to your Ruby if you forget to call it (or even automatically add an at_exit for it?).
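As a rough sketch of the at_exit idea, something along these lines should stop the profiler before the VM starts tearing itself down. The helper name stop_profiler_if_running is hypothetical (it is not part of stackprof), and the LoadError guard is only there so the snippet still loads on a machine without the gem installed:

```ruby
begin
  require 'stackprof'
rescue LoadError
  # stackprof is not installed; the helper below is still defined,
  # but no profiling is started.
end

# Hypothetical helper (not part of stackprof): stops the given profiler
# if it is still sampling. Returns true if it stopped it, false otherwise.
def stop_profiler_if_running(profiler)
  return false unless profiler.running?

  profiler.stop
  true
end

if defined?(StackProf)
  # at_exit blocks run before the VM is destroyed, so the sampling
  # timer should be disarmed before GET_EC() can become null.
  at_exit { stop_profiler_if_running(StackProf) }

  # Same reproduction script as above, now with the safety net in place.
  StackProf.start(mode: :wall, raw: true, aggregate: false, ignore_gc: true, interval: 100)
  sleep 0.5
end
```

This only papers over the problem from the Ruby side; a process that is killed without running its at_exit handlers would not benefit from it.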

Thanks for your awesome work in making Ruby code faster, btw! 😊

ivoanjo mentioned this issue Mar 16, 2023

Ensure VM is running in signal handler #200

Merged
