Segmentation fault when stackprof is left running during VM shutdown #157

Open
ivoanjo opened this issue May 28, 2021 · 0 comments
ivoanjo commented May 28, 2021

Howdy! 👋

First the disclaimer (I guess it may be relevant): I work at @DataDog on profiling for the ddtrace gem, although we don't use rb_profile_frames (tradeoffs, tradeoffs 🤣 ).

While doing a few experiments with stackprof (0.2.17), I noticed that I got VM crashes when the process was finishing up with stackprof still enabled. I can reproduce this on both Linux and macOS, on Ruby 3.0.1, 2.7.3 and 2.6.7.

The following script is enough to reproduce it every time for me:

require 'stackprof'

StackProf.start(mode: :wall, raw: true, aggregate: false, ignore_gc: true, interval: 100)
sleep 0.5

I went through a few core dumps (let me know if you'd like me to share them here), and the pattern seems to be that stackprof keeps sampling even though the VM is already "closing shop". In particular, GET_EC() returns null, so rb_postponed_job_register_one blows up when it tries to dereference it.

Here's one such example:

# ruby stackprof-crash.rb
[BUG] Segmentation fault at 0x0000000000000038
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]

Segmentation fault (core dumped)
# gdb `which ruby` core
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/ruby...done.
[New LWP 723]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/ruby stackprof-crash.rb'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fcfbabf09cf in rb_vm_bugreport (ctx=ctx@entry=0x5652a8b5bfc0) at vm_core.h:1789
1789	vm_core.h: No such file or directory.
(gdb) bt
#0  0x00007fcfbabf09cf in rb_vm_bugreport (ctx=ctx@entry=0x5652a8b5bfc0) at vm_core.h:1789
#1  0x00007fcfbaa303c7 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x5652a8b5bfc0, fmt=fmt@entry=0x7fcfbac33fab "Segmentation fault at %p") at error.c:658
#2  0x00007fcfbab5c9fb in sigsegv (sig=11, info=0x5652a8b5c0f0, ctx=0x5652a8b5bfc0) at signal.c:946
#3  <signal handler called>
#4  rb_ec_vm_ptr (ec=0x0) at vm_core.h:1777
#5  rb_postponed_job_register_one (flags=flags@entry=0, func=func@entry=0x7fcfb69b6440 <stackprof_job_handler>, data=data@entry=0x0) at vm_trace.c:1611
#6  0x00007fcfb69b50c8 in stackprof_signal_handler (sig=<optimized out>, sinfo=<optimized out>, ucontext=<optimized out>) at stackprof.c:642
#7  stackprof_signal_handler (sig=<optimized out>, sinfo=<optimized out>, ucontext=<optimized out>) at stackprof.c:621
#8  <signal handler called>
#9  0x00007fcfba38edf6 in malloc_consolidate (av=av@entry=0x7fcfba4cac40 <main_arena>) at malloc.c:4494
#10 0x00007fcfba39079a in _int_free (av=0x7fcfba4cac40 <main_arena>, p=0x5652a9003c90, have_lock=<optimized out>) at malloc.c:4392
#11 0x00007fcfbaa517f1 in objspace_xfree (old_size=<optimized out>, ptr=0x5652a9003ca0, objspace=0x5652a8a52320) at gc.c:10052
#12 objspace_xfree (old_size=<optimized out>, ptr=0x5652a9003ca0, objspace=0x5652a8a52320) at gc.c:9985
#13 ruby_sized_xfree (size=0, x=0x5652a9003ca0) at gc.c:10145
#14 ruby_sized_xfree (size=0, x=0x5652a9003ca0) at gc.c:10142
#15 ruby_xfree (x=0x5652a9003ca0) at gc.c:10152
#16 0x00007fcfbab66bd2 in rb_st_free_table (tab=0x5652a8a97eb0) at st.c:712
#17 0x00007fcfbabda154 in ruby_vm_destruct (vm=0x5652a8a50d60) at vm.c:2349
#18 0x00007fcfbaa39e4a in rb_ec_cleanup (ec=ec@entry=0x5652a8a527b0, ex=<optimized out>) at eval.c:261
#19 0x00007fcfbaa3a033 in ruby_run_node (n=0x5652a957e3e0) at eval.c:335
#20 0x00005652a840510b in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb)

I'm guessing a possible solution can be to check if GET_EC() is sane in the signal handler, and give up if it's not.

Of course, calling StackProf.stop also "fixes" the issue, so it may be just a case of documenting that bad things will happen to your Ruby if you forget to call it (or even automatically add an at_exit for it?).
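As a rough illustration of the at_exit idea, here is a minimal sketch of what a caller-side workaround might look like. `stop_if_running` is a hypothetical helper, not part of stackprof; it only relies on `StackProf.running?` and `StackProf.stop`. It assumes that at_exit hooks run before the VM starts tearing itself down, which matches the backtrace above (the crash happens inside ruby_vm_destruct), but I have not verified this on every Ruby version.

```ruby
# Hypothetical helper (not part of stackprof): stop the profiler only if it is
# still sampling, so the hook is harmless when StackProf.stop was already called.
# Returns true if it stopped a running profiler, false otherwise.
def stop_if_running(profiler)
  return false unless profiler.running?

  profiler.stop
  true
end

# Register the hook only when stackprof has been loaded (require 'stackprof').
# Assumption: at_exit blocks run before the VM is destroyed, so no further
# profiling signals are delivered during shutdown.
at_exit { stop_if_running(StackProf) } if defined?(StackProf)
```

With this in place after `require 'stackprof'`, the reproduction script should no longer leave the sampling timer armed at exit. It does not address the underlying problem in the signal handler itself.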

Thanks for your awesome work in making Ruby code faster, btw! 😊
