
valgrind inspired fixes #1693

Merged
4 commits merged into bitshares:develop on Apr 10, 2019

Conversation

crypto-ape
Contributor

Hey Monkeys!

I had some trouble, so I ran witness_node under valgrind (the memcheck tool). As a result, I'm providing some minor fixes here.

Ape out!

@oxarbitrage
Member

very nice, i wanted to run valgrind a while back and was unsuccessful. care to explain a bit about the errors you were having and how the changes fix them? thanks.

@oxarbitrage
Member

Related bitshares/bitshares-fc#118

Some valgrind output added to this pull request would help. I also think we might be able to get better, or at least different, profiling results than with gprof.

A wiki page on how you are doing this would be very helpful IMHO.

@crypto-ape
Contributor Author

crypto-ape commented Apr 2, 2019

fix referencing local stack variable in async thread

Valgrind reported both invalid reads and invalid writes with traces like this:

Invalid write of size 8
   at 0x3267A14: graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector::starting_execution() (node_impl.hxx:133)
   by 0x32670D7: graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector::actual_execution_measurement_helper::actual_execution_measurement_helper(graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector&) (node_impl.hxx:92)
   by 0x324E5F9: graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}::operator()() const (node.cpp:4999)
   by 0x32622F8: fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>::run(void*, fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>) (task.hpp:77)
   by 0x301BC0B: fc::task_base::run_impl() (task.cpp:43)
   by 0x301BB8B: fc::task_base::run() (task.cpp:32)
   by 0x300E85D: fc::thread_d::run_next_task() (thread_d.hpp:513)
   by 0x300EBC2: fc::thread_d::process_tasks() (thread_d.hpp:562)
   by 0x300E30A: fc::thread_d::start_process_tasks(boost::context::detail::transfer_t) (thread_d.hpp:493)
   by 0x33CB9AE: make_fcontext (in /home/crypto-ape/work/valgrind/witness_node)
 Address 0x2b6447e8 is 2,095,016 bytes inside a block of size 2,097,152 alloc'd
   at 0x5022B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

The original code created a local variable on the stack and deferred a lambda that captured it by reference into an asynchronous thread. By the time the lambda ran, it was using already-invalidated memory. Worse, it could overwrite a stack that was in use again.

I have put the object into a shared_ptr and capture it in the lambda by copy. After the last commit I tested again and compared the valgrind logs; the issue is definitely fixed.
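
For illustration, a minimal sketch of the bug pattern and the fix; the names here (call_stats, schedule_buggy, schedule_fixed) are made up and not the actual graphene::net code:

    #include <future>
    #include <memory>

    struct call_stats { long start = 0; };

    // Buggy pattern: the lambda captures 'stats' by reference, but the task
    // runs on another thread after this stack frame is gone, so the write
    // lands in invalidated (and possibly reused) stack memory.
    std::future<void> schedule_buggy() {
       call_stats stats;
       return std::async(std::launch::async, [&]() {
          stats.start = 42;   // dangling reference
       });
    }

    // Same idea as the fix in this PR: keep the object alive on the heap via
    // shared_ptr and capture the shared_ptr by copy.
    std::future<void> schedule_fixed() {
       auto stats = std::make_shared<call_stats>();
       return std::async(std::launch::async, [stats]() {
          stats->start = 42;  // the object lives as long as the lambda does
       });
    }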

explicitly cleanup external library facilities

Valgrind detects various memory leaks where the malloc call originates from curl_easy_init. One example:

256 bytes in 1 blocks are indirectly lost in loss record 41 of 67
   at 0x5022B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5879171: ??? (in /usr/lib/x86_64-linux-gnu/libcurl.so.4.5.0)
   by 0x5886143: curl_easy_init (in /usr/lib/x86_64-linux-gnu/libcurl.so.4.5.0)
   by 0x2A5699A: graphene::elasticsearch::detail::elasticsearch_plugin_impl::elasticsearch_plugin_impl(graphene::elasticsearch::elasticsearch_plugin&) (elasticsearch_plugin.cpp:42)
   by 0x2A51150: graphene::elasticsearch::elasticsearch_plugin::elasticsearch_plugin() (elasticsearch_plugin.cpp:401)
   by 0x223F9E7: void __gnu_cxx::new_allocator<graphene::elasticsearch::elasticsearch_plugin>::construct<graphene::elasticsearch::elasticsearch_plugin>(graphene::elasticsearch::elasticsearch_plugin*) (new_allocator.h:136)
   by 0x223E87A: void std::allocator_traits<std::allocator<graphene::elasticsearch::elasticsearch_plugin> >::construct<graphene::elasticsearch::elasticsearch_plugin>(std::allocator<graphene::elasticsearch::elasticsearch_plugin>&, graphene::elasticsearch::elasticsearch_plugin*) (alloc_traits.h:475)
   by 0x223D05D: std::_Sp_counted_ptr_inplace<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<>(std::allocator<graphene::elasticsearch::elasticsearch_plugin>) (shared_ptr_base.h:526)
   by 0x223B08C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, graphene::elasticsearch::elasticsearch_plugin*, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr_base.h:637)
   by 0x2239266: std::__shared_ptr<graphene::elasticsearch::elasticsearch_plugin, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr_base.h:1295)
   by 0x22374B5: std::shared_ptr<graphene::elasticsearch::elasticsearch_plugin>::shared_ptr<std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr.h:344)
   by 0x2234EEB: std::shared_ptr<graphene::elasticsearch::elasticsearch_plugin> std::allocate_shared<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr.h:691)

The proposed explicit cleanup reduces the number of reported memory leaks, though some of them remain.
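
A minimal sketch of the kind of explicit cleanup this refers to, assuming a class that owns a curl easy handle (the class and member names are illustrative, not the actual elasticsearch_plugin_impl code):

    #include <curl/curl.h>

    class es_client {
    public:
       es_client() {
          _curl = curl_easy_init();     // allocates the state valgrind flags
       }
       ~es_client() {
          if (_curl) {
             curl_easy_cleanup(_curl);  // releases what curl_easy_init allocated
             _curl = nullptr;
          }
       }
    private:
       CURL* _curl = nullptr;
    };

libcurl also keeps process-wide state that curl_easy_cleanup does not touch; that is normally released by a matching curl_global_cleanup call for each curl_global_init at program shutdown.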

@crypto-ape
Contributor Author

I will comment on the commits in bitshares/bitshares-fc#118 later, probably tomorrow.

@ryanRfox
Contributor

ryanRfox commented Apr 2, 2019

Does this seem like it can make it into 3.1.0? If yes, please add it to the Project Board and Milestone, else to the Backlog. I don't want to lose track of it.

@oxarbitrage added this to the 3.1.0 - Feature Release milestone on Apr 2, 2019
@oxarbitrage
Member

I will comment on the commits in bitshares/bitshares-fc#118 later, probably tomorrow.

Thank you.

Does this seem like it can make it into 3.1.0? If yes, please add it to the Project Board and Milestone, else to the Backlog. I don't want to lose track of it.

I think it is possible. Added.

@@ -4919,7 +4920,7 @@ namespace graphene { namespace net { namespace detail {
         return _node_delegate->method_name(__VA_ARGS__); \
       } \
       else \
-        return _thread->async([&](){ \
+        return _thread->async([&, statistics_collector](){ \

Contributor

Please remove the & and capture this instead. Same below.

Contributor Author

I am sorry, but this won't work.

Strictly speaking, it would be enough to capture only this->_node_delegate by copy. But without the implicit by-reference capture (i.e. &), the __VA_ARGS__ arguments won't be captured.

The simplest alternative would be implicit capture by copy (i.e. =), but that won't compile, because the functions used with the macro take their parameters by reference (see the node_delegate interface class), and at least one of them uses such a parameter as an additional output parameter.

Furthermore, it is not obvious that the original caller always provides these references to non-stack-allocated entities. Thus, to be completely bug-proof, all of these parameters would have to be passed as shared_ptr.

From the valgrind output I have produced, that does not seem to be the case here, though. I suggest accepting the proposed modification as is (it provably solves an existing problem) and, if a strictly complete solution is deemed necessary, opening a specific issue for the required refactor.
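
To make the capture trade-off concrete, a reduced sketch with hand-written stand-ins (stats, handle, make_task; not the real node_delegate code): with [=] the copied argument is const inside the lambda and cannot bind to the non-const reference parameter, while [&, statistics_collector] copies only the shared_ptr and keeps the arguments as references.

    #include <functional>
    #include <memory>
    #include <vector>

    struct stats { long start = 0; };

    // Stand-in for a delegate method that takes an output parameter
    // by non-const reference.
    void handle(std::vector<int>& out) { out.push_back(1); }

    std::function<void()> make_task(std::vector<int>& out) {
       auto statistics_collector = std::make_shared<stats>();

       // With [=], 'out' would be captured as a const copy, so handle(out)
       // fails to compile (and would modify a copy even if it did).
       // [&, statistics_collector] keeps 'out' as a reference to the
       // caller-owned object and copies only the shared_ptr, so the
       // collector outlives this stack frame.
       return [&, statistics_collector]() {
          statistics_collector->start = 42;
          handle(out);
       };
    }

The by-reference capture of the arguments is still only safe if the caller-owned objects outlive the deferred call, which is exactly the residual concern discussed below.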

Contributor

Missed the __VA_ARGS__, sorry.

handle_message, for example, uses a stack-allocated message (originating from message_oriented_connection_impl::read_loop). Normally this lives longer than the handle_message call should take; however, it is re-used when reading the next message (which could lead to data corruption), and it is destroyed when reading fails (e.g. when the connection breaks down). This could even be related to #1256, @jmjatlanta?

I agree that this should be handled in another ticket.
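
A reduced sketch of the re-use hazard described above, with made-up types (this connection class is not the actual message_oriented_connection_impl): one buffer is handed by reference to a deferred handler and then overwritten by the next read.

    #include <functional>
    #include <string>
    #include <vector>

    struct connection {
       std::string current_message;                 // single, reused buffer
       std::vector<std::function<void()>> pending;  // deferred handler calls

       void handle(const std::string& msg) { /* process msg */ }

       void read_loop() {
          for (int i = 0; i < 3; ++i) {
             current_message = "message " + std::to_string(i);
             // The handler is deferred and still refers to current_message...
             pending.push_back([this]() { handle(current_message); });
             // ...but the next iteration overwrites the buffer before the
             // deferred call runs, so the handler sees the wrong (or, once
             // the connection is destroyed, freed) data.
          }
       }
    };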

@pmconrad
Contributor

pmconrad commented Apr 3, 2019

Nice catch, thanks!

@pmconrad merged commit 421a2dd into bitshares:develop on Apr 10, 2019
@ryanRfox mentioned this pull request on Nov 18, 2019