Fix memory leaks found by ruby_memcheck #105

Merged
merged 2 commits into from
Sep 22, 2023

@mudge (Owner) commented Sep 22, 2023

See #104

Specifically:

  • Ensure we delete the previous input inside an RE2::Scanner before replacing it
  • Check whether inputs are strings as early as possible to avoid raising an exception after allocating memory
  • Make sure the error string populated by RE2::Set::Add() goes out of scope before calling rb_raise

@mudge mentioned this pull request Sep 22, 2023
@mudge requested a review from stanhu September 22, 2023 20:17

See #104

Specifically, ensure we delete the previous input inside an RE2::Scanner
before replacing it and check whether inputs are strings as early as
possible to avoid raising an exception after allocating memory.

Thanks to @peterzhu2118 for both authoring ruby_memcheck and helping
find the source of these leaks.
@mudge force-pushed the plug-memory-leaks branch 3 times, most recently from 064d49a to b37b07a on September 22, 2023 20:25
Review thread on ext/re2/re2.cc (outdated, resolved)

@mudge (Owner, Author) commented Sep 22, 2023

@stanhu can you please see if you can run this locally and if it also gives a clean ruby_memcheck run?

See #104

When we raise an exception in re2_set_add, the memory used by the
std::string used to store the error message is never freed. Fix this by
ensuring it goes out of scope before we call rb_raise.

However, we also need a copy of the message to return to the user, so we
first copy its contents into a C string. The longest error message in RE2
is currently 35 characters long, so a 100-character buffer gives us some
headroom in case new releases of RE2 add longer messages.

Thanks to @peterzhu2118 for both authoring ruby_memcheck and helping
find the source of these leaks.
@stanhu (Collaborator) commented Sep 22, 2023

I'm getting a clean ruby_memcheck, but for some reason I'm getting a core dump after running bundle exec rspec.

@mudge (Owner, Author) commented Sep 22, 2023

That doesn’t sound good. CI seems happy so far; can you reproduce it after clobbering, etc.?

@stanhu (Collaborator) commented Sep 22, 2023

It doesn't segfault on this branch, but when I apply these local changes I seem to be getting segfaults:

diff --git a/Rakefile b/Rakefile
index 4712797..42517e6 100644
--- a/Rakefile
+++ b/Rakefile
@@ -6,6 +6,8 @@ require 'rake_compiler_dock'
 require 'yaml'

 require_relative 'ext/re2/recipes'
+require "ruby_memcheck"
+require "ruby_memcheck/rspec/rake_task"

 CLEAN.include FileList['**/*{.o,.so,.dylib,.bundle}'],
               FileList['**/extconf.h'],
@@ -129,4 +131,9 @@ task gem_build_path do
   add_vendored_libraries
 end

+RSpec::Core::RakeTask.new(spec: :compile)
+namespace :spec do
+  RubyMemcheck::RSpec::RakeTask.new(valgrind: :compile)
+end
+
 task default: [:compile, :spec]
diff --git a/ext/re2/extconf.rb b/ext/re2/extconf.rb
index f8fd706..c8c6681 100644
--- a/ext/re2/extconf.rb
+++ b/ext/re2/extconf.rb
@@ -119,8 +119,8 @@ end

 def build_extension(static_p = false)
   # Enable optional warnings but disable deprecated register warning for Ruby 2.6 support
-  $CFLAGS << " -Wall -Wextra -funroll-loops"
-  $CPPFLAGS << " -Wno-register"
+  $CFLAGS << " -Wall -Wextra -funroll-loops -ggdb3 -g"
+  $CPPFLAGS << " -Wno-register -ggdb3"

   # Pass -x c++ to force gcc to compile the test program
   # as C++ (as it will end in .c by default).
@@ -395,11 +395,11 @@ def build_with_vendored_libraries
   abseil_recipe, re2_recipe = load_recipes

   process_recipe(abseil_recipe) do |recipe|
-    recipe.configure_options += ['-DABSL_PROPAGATE_CXX_STD=ON', '-DCMAKE_CXX_VISIBILITY_PRESET=hidden']
+    recipe.configure_options += ['-DABSL_PROPAGATE_CXX_STD=ON', '-DCMAKE_CXX_VISIBILITY_PRESET=hidden', '-DCMAKE_CXXFLAGS=-ggdb3']
   end

   process_recipe(re2_recipe) do |recipe|
-    recipe.configure_options += ["-DCMAKE_PREFIX_PATH=#{abseil_recipe.path}", '-DCMAKE_CXX_FLAGS=-DNDEBUG',
+    recipe.configure_options += ["-DCMAKE_PREFIX_PATH=#{abseil_recipe.path}", '-DCMAKE_CXX_FLAGS=-DNDEBUG', '-DCMAKE_CXX_FLAGS=-ggdb3',
                                  '-DCMAKE_CXX_VISIBILITY_PRESET=hidden']
   end

diff --git a/re2.gemspec b/re2.gemspec
index dbb64de..32e7a55 100644
--- a/re2.gemspec
+++ b/re2.gemspec
@@ -40,5 +40,6 @@ Gem::Specification.new do |s|
   s.add_development_dependency("rake-compiler", "~> 1.2.1")
   s.add_development_dependency("rake-compiler-dock", "~> 1.3.0")
   s.add_development_dependency("rspec", "~> 3.2")
+  s.add_development_dependency("ruby_memcheck")
   s.add_runtime_dependency("mini_portile2", "~> 2.8.4") # keep version in sync with extconf.rb
 end

@stanhu (Collaborator) commented Sep 22, 2023

gdb shows this backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7745859 in __GI_abort () at abort.c:79
#2  0x00007ffff3b3e76d in LogMessageFatal::~LogMessageFatal (this=0x7fffffffcbd0, __in_chrg=<optimized out>) at /tmp/d20230922-1835305-njhoyv/tmp/x86_64-pc-linux-gnu/ports/libre2/2023-09-01/re2-2023-09-01/util/logging.h:98
#3  0x00007ffff3b54134 in re2::RE2::Set::Match (this=0x555555e9d540, text="", v=0x7fffffffce30, error_info=0x0) at /tmp/d20230922-1835305-njhoyv/tmp/x86_64-pc-linux-gnu/ports/libre2/2023-09-01/re2-2023-09-01/re2/set.cc:131
#4  0x00007ffff3b54075 in re2::RE2::Set::Match (this=0x555555e9d540, text="", v=0x7fffffffce30) at /tmp/d20230922-1835305-njhoyv/tmp/x86_64-pc-linux-gnu/ports/libre2/2023-09-01/re2-2023-09-01/re2/set.cc:123
#5  0x00007ffff3b346fb in re2_set_match (argc=2, argv=0x7ffff7308520, self=140737278313720) at ../../../../ext/re2/re2.cc:1731
#6  0x00007ffff7e370e8 in vm_call_cfunc_with_frame (ec=0x55555555dcc0, reg_cfp=0x7ffff7407810, calling=<optimized out>) at vm_insnhelper.c:3037
#7  0x00007ffff7e4c199 in vm_call_method_each_type (ec=0x55555555dcc0, cfp=0x7ffff7407810, calling=0x7fffffffd090) at vm_insnhelper.c:3639
#8  0x00007ffff7e4ca64 in vm_call_method (ec=0x55555555dcc0, cfp=0x7ffff7407810, calling=<optimized out>) at vm_insnhelper.c:3750
#9  0x00007ffff7e45392 in vm_sendish (block_handler=<optimized out>, method_explorer=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_callinfo.h:349
#10 vm_exec_core (ec=0x55555555dcc0, initial=140737488341280) at insns.def:778
#11 0x00007ffff7e4af73 in rb_vm_exec (ec=0x55555555dcc0, mjit_enable_p=true) at vm.c:2211
#12 0x00007ffff7e517c1 in invoke_block (captured=<optimized out>, opt_pc=<optimized out>, type=<optimized out>, cref=0x7ffff37b0598, self=140737278316600, iseq=0x7ffff38daba8, ec=0x55555555dcc0) at vm.c:1316
#13 invoke_iseq_block_from_c (me=0x0, is_lambda=<optimized out>, cref=0x7ffff37b0598, passed_block_handler=0, kw_splat=0, argv=<optimized out>, argc=1, self=140737278316600, captured=<optimized out>, ec=0x55555555dcc0) at vm.c:1372
#14 invoke_block_from_c_bh (force_blockarg=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>, passed_block_handler=<optimized out>, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, block_handler=<optimized out>, ec=<optimized out>) at vm.c:1390
#15 vm_yield_with_cref (is_lambda=<optimized out>, cref=0x7ffff37b0598, kw_splat=0, argv=<optimized out>, argc=1, ec=0x55555555dcc0) at vm.c:1427
#16 yield_under (self=<optimized out>, singleton=<optimized out>, argc=1, argv=<optimized out>, kw_splat=0) at vm_eval.c:1969
#17 0x00007ffff7e370e8 in vm_call_cfunc_with_frame (ec=0x55555555dcc0, reg_cfp=0x7ffff7407910, calling=<optimized out>) at vm_insnhelper.c:3037
#18 0x00007ffff7e4c199 in vm_call_method_each_type (ec=0x55555555dcc0, cfp=0x7ffff7407910, calling=0x7fffffffd520) at vm_insnhelper.c:3639
#19 0x00007ffff7e4ca64 in vm_call_method (ec=0x55555555dcc0, cfp=0x7ffff7407910, calling=<optimized out>) at vm_insnhelper.c:3750
#20 0x00007ffff7e454a3 in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_callinfo.h:349
#21 vm_exec_core (ec=0x55555555dcc0, initial=140737488341280) at insns.def:759
#22 0x00007ffff7e4b498 in rb_vm_exec (ec=0x55555555dcc0, mjit_enable_p=true) at vm.c:2220
#23 0x00007ffff7e50486 in invoke_block (captured=<optimized out>, opt_pc=<optimized out>, type=<optimized out>, cref=<optimized out>, self=<optimized out>, iseq=<optimized out>, ec=<optimized out>) at vm.c:1316
#24 invoke_iseq_block_from_c (me=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>, passed_block_handler=<optimized out>, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, self=<optimized out>, captured=<optimized out>, ec=<optimized out>) at vm.c:1372
#25 invoke_block_from_c_bh (argc=<optimized out>, kw_splat=<optimized out>, passed_block_handler=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>, force_blockarg=<optimized out>, argv=<optimized out>, block_handler=<optimized out>, ec=<optimized out>) at vm.c:1390
#26 vm_yield_with_cref (argc=<optimized out>, kw_splat=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>, argv=<optimized out>, ec=<optimized out>) at vm.c:1427
#27 vm_yield (kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, ec=<optimized out>) at vm.c:1435
#28 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:1347
#29 rb_yield (val=<optimized out>) at vm_eval.c:1363
#30 0x00007ffff7bc6a9c in rb_ary_collect (ary=140737278370640) at ./include/ruby/internal/core/rarray.h:372
#31 0x00007ffff7e370e8 in vm_call_cfunc_with_frame (ec=0x55555555dcc0, reg_cfp=0x7ffff7407bd0, calling=<optimized out>) at vm_insnhelper.c:3037
#32 0x00007ffff7e454a3 in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_callinfo.h:349
#33 vm_exec_core (ec=0x55555555dcc0, initial=140737488341280) at insns.def:759
#34 0x00007ffff7e4af73 in rb_vm_exec (ec=0x55555555dcc0, mjit_enable_p=true) at vm.c:2211
#35 0x00007ffff7e50486 in invoke_block (captured=<optimized out>, opt_pc=<optimized out>, type=<optimized out>, cref=<optimized out>, self=<optimized out>, iseq=<optimized out>, ec=<optimized out>) at vm.c:1316
#36 invoke_iseq_block_from_c (me=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>, passed_block_handler=<optimized out>, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, self=<optimized out>, captured=<optimized out>, ec=<optimized out>) at vm.c:1372
#37 invoke_block_from_c_bh (argc=<optimized out>, kw_splat=<optimized out>, passed_block_handler=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>, force_blockarg=<optimized out>, argv=<optimized out>, block_handler=<optimized out>, ec=<optimized out>) at vm.c:1390
#38 vm_yield_with_cref (argc=<optimized out>, kw_splat=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>, argv=<optimized out>, ec=<optimized out>) at vm.c:1427
#39 vm_yield (kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, ec=<optimized out>) at vm.c:1435
#40 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:1347
#41 rb_yield (val=<optimized out>) at vm_eval.c:1363
#42 0x00007ffff7bc6a9c in rb_ary_collect (ary=140737278416320) at ./include/ruby/internal/core/rarray.h:372
#43 0x00007ffff7e370e8 in vm_call_cfunc_with_frame (ec=0x55555555dcc0, reg_cfp=0x7ffff7407cd0, calling=<optimized out>) at vm_insnhelper.c:3037
#44 0x00007ffff7e454a3 in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_callinfo.h:349
#45 vm_exec_core (ec=0x55555555dcc0, initial=140737488341280) at insns.def:759
#46 0x00007ffff7e4af73 in rb_vm_exec (ec=0x55555555dcc0, mjit_enable_p=true) at vm.c:2211
#47 0x00007ffff7e50486 in invoke_block (captured=<optimized out>, opt_pc=<optimized out>, type=<optimized out>, cref=<optimized out>, self=<optimized out>, iseq=<optimized out>, ec=<optimized out>) at vm.c:1316
#48 invoke_iseq_block_from_c (me=<optimized out>, is_lambda=<optimized out>, cref=<optimized out>, passed_block_handler=<optimized out>, kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, self=<optimized out>, captured=<optimized out>, ec=<optimized out>) at vm.c:1372
#49 invoke_block_from_c_bh (argc=<optimized out>, kw_splat=<optimized out>, passed_block_handler=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>, force_blockarg=<optimized out>, argv=<optimized out>, block_handler=<optimized out>, ec=<optimized out>) at vm.c:1390
#50 vm_yield_with_cref (argc=<optimized out>, kw_splat=<optimized out>, cref=<optimized out>, is_lambda=<optimized out>, argv=<optimized out>, ec=<optimized out>) at vm.c:1427
#51 vm_yield (kw_splat=<optimized out>, argv=<optimized out>, argc=<optimized out>, ec=<optimized out>) at vm.c:1435
#52 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:1347
#53 rb_yield (val=<optimized out>) at vm_eval.c:1363
#54 0x00007ffff7bc6a9c in rb_ary_collect (ary=140737283577800) at ./include/ruby/internal/core/rarray.h:372
#55 0x00007ffff7e370e8 in vm_call_cfunc_with_frame (ec=0x55555555dcc0, reg_cfp=0x7ffff7407d90, calling=<optimized out>) at vm_insnhelper.c:3037
#56 0x00007ffff7e454a3 in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_callinfo.h:349
#57 vm_exec_core (ec=0x55555555dcc0, initial=140737488341280) at insns.def:759
#58 0x00007ffff7e4b498 in rb_vm_exec (ec=0x55555555dcc0, mjit_enable_p=true) at vm.c:2220
#59 0x00007ffff7c5e5cb in rb_ec_exec_node (ec=ec@entry=0x55555555dcc0, n=n@entry=0x7ffff3df6c40) at eval.c:280
#60 0x00007ffff7c64a0a in ruby_run_node (n=0x7ffff3df6c40) at eval.c:321
#61 0x000055555555517f in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:47

@stanhu (Collaborator) commented Sep 22, 2023

Oh, right, I think my patch overrides the -DNDEBUG, so libre2 aborts on a check that is only fatal in debug builds. I've seen this before.

@stanhu (Collaborator) commented Sep 22, 2023

All working now. Nice fixes here. I was too quick to think these were false positives.

@mudge (Owner, Author) commented Sep 22, 2023

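The (slightly terrifying) lesson here is the havoc that rb_raise (and therefore anything that can call it, such as Check_Type, StringValue, StringValuePtr, etc.) can cause if we’re relying on destructors to free memory automatically: rb_raise leaves the function via longjmp, so C++ destructors in the skipped frames never run.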

@mudge (Owner, Author) commented Sep 22, 2023

A quick skim of other places to investigate that might be lacking test coverage: creating an RE2::Set with options (therefore creating an RE2::Options object internally) but also an invalid anchor (which will raise).

@stanhu merged commit de634de into main Sep 22, 2023

{
std::string err;
index = s->set->Add(regex, &err);
strncpy(msg, err.c_str(), sizeof(msg));
@mudge (Owner, Author) commented:

We should also guard against the error message being truncated (and therefore not null-terminated) with something like:

strncpy(msg, err.c_str(), sizeof(msg) - 1);
msg[sizeof(msg) - 1] = '\0';


@mudge (Owner, Author) commented Sep 23, 2023

> A quick skim of other places to investigate that might be lacking test coverage: creating an RE2::Set with options (therefore creating an RE2::Options object internally) but also an invalid anchor (which will raise).

I tested this but there’s no leak.

@mudge deleted the plug-memory-leaks branch May 10, 2024 12:29

2 participants