
valgrind inspired fixes #1693

Merged
4 commits merged into bitshares:develop on Apr 10, 2019

Conversation

crypto-ape
Contributor

Hey Monkeys!

I had some trouble, so I ran witness_node under valgrind (the memcheck tool). As a result, I'm providing some minor fixes here.

Ape out!

@oxarbitrage
Member

very nice, i wanted to run valgrind a while back and was unsuccessful. care to explain a bit about the errors you were having and how the changes fix them? thanks.

@oxarbitrage
Member

Related bitshares/bitshares-fc#118

Some valgrind output added to this pull request would help. I also think we might be able to get better, or at least different, profiling results than with gprof.

A wiki page on how you are doing this would be very helpful IMHO.

@crypto-ape
Contributor Author

crypto-ape commented Apr 2, 2019

fix referencing local stack variable in async thread

Valgrind reported both invalid reads and invalid writes with traces like this:

Invalid write of size 8
   at 0x3267A14: graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector::starting_execution() (node_impl.hxx:133)
   by 0x32670D7: graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector::actual_execution_measurement_helper::actual_execution_measurement_helper(graphene::net::detail::statistics_gathering_node_delegate_wrapper::call_statistics_collector&) (node_impl.hxx:92)
   by 0x324E5F9: graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}::operator()() const (node.cpp:4999)
   by 0x32622F8: fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>::run(void*, fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>) (task.hpp:77)
   by 0x301BC0B: fc::task_base::run_impl() (task.cpp:43)
   by 0x301BB8B: fc::task_base::run() (task.cpp:32)
   by 0x300E85D: fc::thread_d::run_next_task() (thread_d.hpp:513)
   by 0x300EBC2: fc::thread_d::process_tasks() (thread_d.hpp:562)
   by 0x300E30A: fc::thread_d::start_process_tasks(boost::context::detail::transfer_t) (thread_d.hpp:493)
   by 0x33CB9AE: make_fcontext (in /home/crypto-ape/work/valgrind/witness_node)
 Address 0x2b6447e8 is 2,095,016 bytes inside a block of size 2,097,152 alloc'd
   at 0x5022B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

The original code created a local variable on the stack and deferred a lambda that captured it by reference into an asynchronous thread. By the time the lambda ran, it was using already-invalidated memory. Worse, it could overwrite a stack that was in use again.

I have put the object into a shared_ptr and capture it in the lambda by copy. After the last commit I tested again and compared the valgrind logs; the issue is definitely fixed.
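
For illustration, a minimal sketch of the bug pattern and the fix; the names here (call_stats, schedule_buggy, schedule_fixed) are made up and not the actual graphene::net code:

    #include <future>
    #include <memory>

    struct call_stats { long start = 0; };

    // Buggy pattern: the lambda captures 'stats' by reference, but the task
    // runs on another thread after this stack frame is gone, so the write
    // lands in invalidated (and possibly reused) stack memory.
    std::future<void> schedule_buggy() {
       call_stats stats;
       return std::async(std::launch::async, [&]() {
          stats.start = 42;   // dangling reference
       });
    }

    // Same idea as the fix in this PR: keep the object alive on the heap via
    // shared_ptr and capture the shared_ptr by copy.
    std::future<void> schedule_fixed() {
       auto stats = std::make_shared<call_stats>();
       return std::async(std::launch::async, [stats]() {
          stats->start = 42;  // the object lives as long as the lambda does
       });
    }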

explicitly cleanup external library facilities

Valgrind detects various memory leaks where the malloc call originates from curl_easy_init. One example:

256 bytes in 1 blocks are indirectly lost in loss record 41 of 67
   at 0x5022B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5879171: ??? (in /usr/lib/x86_64-linux-gnu/libcurl.so.4.5.0)
   by 0x5886143: curl_easy_init (in /usr/lib/x86_64-linux-gnu/libcurl.so.4.5.0)
   by 0x2A5699A: graphene::elasticsearch::detail::elasticsearch_plugin_impl::elasticsearch_plugin_impl(graphene::elasticsearch::elasticsearch_plugin&) (elasticsearch_plugin.cpp:42)
   by 0x2A51150: graphene::elasticsearch::elasticsearch_plugin::elasticsearch_plugin() (elasticsearch_plugin.cpp:401)
   by 0x223F9E7: void __gnu_cxx::new_allocator<graphene::elasticsearch::elasticsearch_plugin>::construct<graphene::elasticsearch::elasticsearch_plugin>(graphene::elasticsearch::elasticsearch_plugin*) (new_allocator.h:136)
   by 0x223E87A: void std::allocator_traits<std::allocator<graphene::elasticsearch::elasticsearch_plugin> >::construct<graphene::elasticsearch::elasticsearch_plugin>(std::allocator<graphene::elasticsearch::elasticsearch_plugin>&, graphene::elasticsearch::elasticsearch_plugin*) (alloc_traits.h:475)
   by 0x223D05D: std::_Sp_counted_ptr_inplace<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<>(std::allocator<graphene::elasticsearch::elasticsearch_plugin>) (shared_ptr_base.h:526)
   by 0x223B08C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, graphene::elasticsearch::elasticsearch_plugin*, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr_base.h:637)
   by 0x2239266: std::__shared_ptr<graphene::elasticsearch::elasticsearch_plugin, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr_base.h:1295)
   by 0x22374B5: std::shared_ptr<graphene::elasticsearch::elasticsearch_plugin>::shared_ptr<std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::_Sp_make_shared_tag, std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr.h:344)
   by 0x2234EEB: std::shared_ptr<graphene::elasticsearch::elasticsearch_plugin> std::allocate_shared<graphene::elasticsearch::elasticsearch_plugin, std::allocator<graphene::elasticsearch::elasticsearch_plugin>>(std::allocator<graphene::elasticsearch::elasticsearch_plugin> const&) (shared_ptr.h:691)

The proposed explicit cleanup reduces the number of reported memory leaks, though some of them remain.
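
A minimal sketch of the kind of explicit cleanup this refers to, assuming a class that owns a curl easy handle (the class and member names are illustrative, not the actual elasticsearch_plugin_impl code):

    #include <curl/curl.h>

    class es_client {
    public:
       es_client() {
          _curl = curl_easy_init();     // allocates the state valgrind flags
       }
       ~es_client() {
          if (_curl) {
             curl_easy_cleanup(_curl);  // releases what curl_easy_init allocated
             _curl = nullptr;
          }
       }
    private:
       CURL* _curl = nullptr;
    };

libcurl also keeps process-wide state that curl_easy_cleanup does not touch; that is normally released by a matching curl_global_cleanup call for each curl_global_init at program shutdown.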

@crypto-ape
Contributor Author

I will comment on the commits in bitshares/bitshares-fc#118 later, probably tomorrow.

@ryanRfox
Contributor

ryanRfox commented Apr 2, 2019

Does this seem like it can make it into 3.1.0? If yes, please add it to the Project Board and Milestone, else to the Backlog. I don't want to lose track of it.

@oxarbitrage added this to the 3.1.0 - Feature Release milestone on Apr 2, 2019
@oxarbitrage
Member

I will comment on the commits in bitshares/bitshares-fc#118 later, probably tomorrow.

Thank you.

Does this seem like it can make it into 3.1.0? If yes, please add it to the Project Board and Milestone, else to the Backlog. I don't want to lose track of it.

I think it is possible. Added.

@@ -4919,7 +4920,7 @@ namespace graphene { namespace net { namespace detail {
         return _node_delegate->method_name(__VA_ARGS__); \
       } \
       else \
-        return _thread->async([&](){ \
+        return _thread->async([&, statistics_collector](){ \

Contributor

Please remove the & and capture this instead. Same below.

Contributor Author

I am sorry, but this won't work.

Strictly speaking, it would be enough to capture only this->_node_delegate by copy. But without the implicit by-reference capture (i.e. &), the __VA_ARGS__ arguments won't be captured.

The simplest alternative would be implicit capture by copy (i.e. =), but that won't compile, because the functions used with the macro take their parameters by reference (see the node_delegate interface class), and at least one of them uses such a parameter as an additional output parameter.

Furthermore, it is not obvious that the original caller always provides these references to non-stack-allocated entities. Thus, to be completely bug-proof, all of these parameters would have to be passed as shared_ptr.

From the valgrind output I have produced, that does not seem to be the case here, though. I suggest accepting the proposed modification as is (it provably solves an existing problem) and, if a strictly complete solution is deemed necessary, opening a specific issue for the required refactor.
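
To make the capture trade-off concrete, a reduced sketch with hand-written stand-ins (stats, handle, make_task; not the real node_delegate code): with [=] the copied argument is const inside the lambda and cannot bind to the non-const reference parameter, while [&, statistics_collector] copies only the shared_ptr and keeps the arguments as references.

    #include <functional>
    #include <memory>
    #include <vector>

    struct stats { long start = 0; };

    // Stand-in for a delegate method that takes an output parameter
    // by non-const reference.
    void handle(std::vector<int>& out) { out.push_back(1); }

    std::function<void()> make_task(std::vector<int>& out) {
       auto statistics_collector = std::make_shared<stats>();

       // With [=], 'out' would be captured as a const copy, so handle(out)
       // fails to compile (and would modify a copy even if it did).
       // [&, statistics_collector] keeps 'out' as a reference to the
       // caller-owned object and copies only the shared_ptr, so the
       // collector outlives this stack frame.
       return [&, statistics_collector]() {
          statistics_collector->start = 42;
          handle(out);
       };
    }

The by-reference capture of the arguments is still only safe if the caller-owned objects outlive the deferred call, which is exactly the residual concern discussed below.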

Contributor

Missed the __VA_ARGS__, sorry.

handle_message, for example, uses a stack-allocated message (originating from message_oriented_connection_impl::read_loop). Normally this lives longer than the handle_message call should take; however, it is re-used when reading the next message (which could lead to data corruption), and it is destroyed when reading fails (e.g. when the connection breaks down). This could even be related to #1256, @jmjatlanta?

I agree that this should be handled in another ticket.
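
A reduced sketch of the re-use hazard described above, with made-up types (this connection class is not the actual message_oriented_connection_impl): one buffer is handed by reference to a deferred handler and then overwritten by the next read.

    #include <functional>
    #include <string>
    #include <vector>

    struct connection {
       std::string current_message;                 // single, reused buffer
       std::vector<std::function<void()>> pending;  // deferred handler calls

       void handle(const std::string& msg) { /* process msg */ }

       void read_loop() {
          for (int i = 0; i < 3; ++i) {
             current_message = "message " + std::to_string(i);
             // The handler is deferred and still refers to current_message...
             pending.push_back([this]() { handle(current_message); });
             // ...but the next iteration overwrites the buffer before the
             // deferred call runs, so the handler sees the wrong (or, once
             // the connection is destroyed, freed) data.
          }
       }
    };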

@pmconrad
Contributor

pmconrad commented Apr 3, 2019

Nice catch, thanks!

@pmconrad merged commit 421a2dd into bitshares:develop on Apr 10, 2019
@ryanRfox mentioned this pull request on Nov 18, 2019