First the disclaimer (I guess it may be relevant): I work at @DataDog on profiling for the ddtrace gem, although we don't use rb_profile_frames (tradeoffs, tradeoffs 🤣 ).
While doing a few experiments with stackprof (0.2.17), I noticed that I got VM crashes when the process was finishing up with stackprof still enabled. I can reproduce this on both Linux and macOS, on Ruby 3.0.1, 2.7.3 and 2.6.7.
The following script is enough to reproduce it every time for me:
I went through a few core dumps (let me know if you'd like me to share them here) and the pattern seems to be that stackprof keeps sampling even though the VM is already "closing shop". In particular, GET_EC() is null at that point, so rb_postponed_job_register_one blows up when it tries to dereference it.
Here's one such example:
# ruby stackprof-crash.rb
[BUG] Segmentation fault at 0x0000000000000038
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
Segmentation fault (core dumped)
# gdb `which ruby` core
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/ruby...done.
[New LWP 723]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/ruby stackprof-crash.rb'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fcfbabf09cf in rb_vm_bugreport (ctx=ctx@entry=0x5652a8b5bfc0) at vm_core.h:1789
1789 vm_core.h: No such file or directory.
(gdb) bt
#0 0x00007fcfbabf09cf in rb_vm_bugreport (ctx=ctx@entry=0x5652a8b5bfc0) at vm_core.h:1789
#1 0x00007fcfbaa303c7 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x5652a8b5bfc0, fmt=fmt@entry=0x7fcfbac33fab "Segmentation fault at %p") at error.c:658
#2 0x00007fcfbab5c9fb in sigsegv (sig=11, info=0x5652a8b5c0f0, ctx=0x5652a8b5bfc0) at signal.c:946
#3 <signal handler called>
#4 rb_ec_vm_ptr (ec=0x0) at vm_core.h:1777
#5 rb_postponed_job_register_one (flags=flags@entry=0, func=func@entry=0x7fcfb69b6440 <stackprof_job_handler>, data=data@entry=0x0) at vm_trace.c:1611
#6 0x00007fcfb69b50c8 in stackprof_signal_handler (sig=<optimized out>, sinfo=<optimized out>, ucontext=<optimized out>) at stackprof.c:642
#7 stackprof_signal_handler (sig=<optimized out>, sinfo=<optimized out>, ucontext=<optimized out>) at stackprof.c:621
#8 <signal handler called>
#9 0x00007fcfba38edf6 in malloc_consolidate (av=av@entry=0x7fcfba4cac40 <main_arena>) at malloc.c:4494
#10 0x00007fcfba39079a in _int_free (av=0x7fcfba4cac40 <main_arena>, p=0x5652a9003c90, have_lock=<optimized out>) at malloc.c:4392
#11 0x00007fcfbaa517f1 in objspace_xfree (old_size=<optimized out>, ptr=0x5652a9003ca0, objspace=0x5652a8a52320) at gc.c:10052
#12 objspace_xfree (old_size=<optimized out>, ptr=0x5652a9003ca0, objspace=0x5652a8a52320) at gc.c:9985
#13 ruby_sized_xfree (size=0, x=0x5652a9003ca0) at gc.c:10145
#14 ruby_sized_xfree (size=0, x=0x5652a9003ca0) at gc.c:10142
#15 ruby_xfree (x=0x5652a9003ca0) at gc.c:10152
#16 0x00007fcfbab66bd2 in rb_st_free_table (tab=0x5652a8a97eb0) at st.c:712
#17 0x00007fcfbabda154 in ruby_vm_destruct (vm=0x5652a8a50d60) at vm.c:2349
#18 0x00007fcfbaa39e4a in rb_ec_cleanup (ec=ec@entry=0x5652a8a527b0, ex=<optimized out>) at eval.c:261
#19 0x00007fcfbaa3a033 in ruby_run_node (n=0x5652a957e3e0) at eval.c:335
#20 0x00005652a840510b in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:50
(gdb)
I'm guessing a possible solution would be to check in the signal handler whether GET_EC() returns something sane (i.e. non-null), and skip the sample if it doesn't.
Of course, calling StackProf.stop also "fixes" the issue, so it may be just a case of documenting that bad things will happen to your Ruby if you forget to call it (or even automatically add an at_exit for it?).
Thanks for your awesome work in making Ruby code faster, btw! 😊