Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash from manual runtime scheduling of Received CAM messages #181

Open
sivasr1 opened this issue Jan 27, 2023 · 5 comments
Open

Crash from manual runtime scheduling of Received CAM messages #181

sivasr1 opened this issue Jan 27, 2023 · 5 comments

Comments

@sivasr1
Copy link

sivasr1 commented Jan 27, 2023

Hello everyone,

While we are trying to reproduce the socktap binary with CAM application for DEMO purpose we always get a crash over time from
the function "ManualRuntime::trigger(Clock::time_point tp)" defined in "vanetza/common/manual_runtime.cpp"
This crash (mainly on reception side and rarely on transmission side) happens only after successful processing of many received CAM messages.

From the debugging the main reason for this crash seems to be because of the simultaneous calls to function
"void TimeTrigger::schedule()" defined in "tools/socktap/time_trigger.cpp"

NOTE :: The crash issue is always reproducible , but crash happens quicker if the rate of transmitted messages is high ( example :: 100 milliseconds/1 CAM message). So basically receiving 10 messages in a second runs about 40-60 minutes without any issues but then this crash happens. Once we reduce the rate of the receival of the messages from 10 to 1 in a second then actually our system runs for multiple hours before having the same crash.

Please refer commands below for more information related to GDB back trace and socktap commands usage.

Core issue::

socktap: vanetza/common/manual_runtime.cpp:62: void vanetza::ManualRuntime::trigger(vanetza::Clock::time_point): Assertion `tp >= m_now' failed.
8772 64
Aborted (core dumped)

On Transmission side::

./socktap -a ca -p static --latitude 40.8810 --longitude 29.3433 --pos_confidence 5 --print-tx-cam --cam-interval 100 -i lo --mac-address 00:00:00:00:00:11

On Reception Side::

./socktap -a ca -p static --latitude 40.8810 --longitude 29.3433 --pos_confidence 5 --print-rx-cam -i lo --mac-address 00:00:00:00:00:22

GDB backtrace::

> (gdb) bt
 #0  0x0000ffff94f789a8 in raise () from /lib/libc.so.6
 #1  0x0000ffff94f65f54 in abort () from /lib/libc.so.6
 #2  0x0000ffff94f72148 in ?? () from /lib/libc.so.6
 #3  0x0000ffff94f721ac in __assert_fail () from /lib/libc.so.6
 #4  0x0000aaaac557a868 in vanetza::ManualRuntime::trigger (this=this@entry=0xffffee3a61e0, tp=...)
     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/bits/locale_facets.h:877
> #5  0x0000aaaac54cee64 in TimeTrigger::update_runtime (this=this@entry=0xffffee3a6160)
>     at tools/socktap/time_trigger.cpp:48
> #6  0x0000aaaac54cef48 in TimeTrigger::schedule (this=0xffffee3a6160) at tools/socktap/time_trigger.cpp:25
> #7  0x0000aaaac54d2098 in std::__invoke_impl<void, void (TimeTrigger::*&)(boost::system::error_code const&), TimeTrigger*&, boost::system::error_code const&> (__f=<optimized out>, __f=<optimized out>, __t=<optimized out>)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/bits/invoke.h:89
> #8  std::__invoke<void (TimeTrigger::*&)(boost::system::error_code const&), TimeTrigger*&, boost::system::error_code const&> (__fn=
>     @0xffffee3a5bd0: (void (TimeTrigger::*)(TimeTrigger * const, const boost::system::error_code &)) 0xaaaac54cf1d0 <TimeTrigger::on_timeout(boost::system::error_code const&)>) at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/bits/invoke.h:95
> #9  std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>::__call<void, boost::system::error_code const&, 0ul, 1ul>(std::tuple<boost::system::error_code const&>&&, std::_Index_tuple<0ul, 1ul>) (__args=..., this=0xffffee3a5bd0)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/functional:400
> #10 std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>::operator()<boost::system::error_code const&, void>(boost::system::error_code const&) (this=0xffffee3a5bd0)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/functional:484
> #11 boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code>::operator()() (this=0xffffee3a5bd0)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/bind_handler.hpp:171
> #12 boost::asio::asio_handler_invoke<boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code> >(boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code>&, ...) (function=...)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/handler_invoke_hook.hpp:88
> #13 boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code>, std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)> >(boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code>&, std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>&) (context=..., function=...)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/handler_invoke_helpers.hpp:54
> #14 boost::asio::detail::handler_work<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>,
>  boost::asio::any_io_executor, void>::complete<boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code> >(boost::asio::detail::binder1<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::system::error_code>&, std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>&) (handler=..., function=..., this=0xffffee3a5c30)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/handler_work.hpp:520
> #15 boost::asio::detail::wait_handler<std::_Bind<void (TimeTrigger::*(TimeTrigger*, std::_Placeholder<1>))(boost::system::error_code const&)>, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) (owner=0xaaaae0940320, base=<optimized out>)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/wait_handler.hpp:76
> #16 0x0000aaaac549b08c in boost::asio::detail::scheduler_operation::complete (bytes_transferred=<optimized out>, ec=..., 
>     owner=0xaaaae0940320, this=<optimized out>)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/scheduler_operation.hpp:40
> #17  boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=..., this=0xaaaae0940320)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/impl/scheduler.ipp:492
> #18 boost::asio::detail::scheduler::run (this=0xaaaae0940320, ec=...)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/detail/impl/scheduler.ipp:210
> #19 0x0000aaaac549041c in boost::asio::io_context::run (this=0xffffee3a5ec8)
>     at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/impl/io_context.ipp:63
> #20 main (argc=<optimized out>, argv=<optimized out>) at tools/socktap/main.cpp:205
> (gdb)

Thanks
Siva

@riebl
Copy link
Owner

riebl commented Feb 2, 2023

Hi @sivasr1,

can you please elaborate on your changes to the socktap code? You mention "simultaneous calls" to TimeTrigger::schedule, but originally socktap is a purely single-threaded application.
What are the values of tp and m_now when the assertion fails?

@sivasr1
Copy link
Author

sivasr1 commented Feb 3, 2023

Hi Reibl,

Thanks for your reply. we made some changes in link layer part (raw_socket_link.cpp , where a new thread is instantiated which calls function "do_receive") to use quectel service. I have attached a couple of documents with the logs( includes current and previous time stamps at which functions are called , and start of functions ..etc).

Thanks
Siva
old_Rx_log.txt
Crash_Feb03_Rx.txt

@riebl
Copy link
Owner

riebl commented Feb 7, 2023

Okay, multi-threading explains this behaviour: Vanetza is not a thread-safe library, i.e. you can use Vanetza with threads, but you need to add thread synchronization on your own.
In your particular case, a (fatal) race condition occurs because the main thread running socktap (the io_service.run() in main.cpp) and your link layer thread may concurrently update the internal timestamp. Your link layer invokes the indication callback which finally invokes RouterContext::indicate. Thus, the whole RouterContext object and all of its members are prone to concurrent access by two threads – a classic race condition.

Instead of adding mutexes everywhere, I suggest having a look at boost::asio::post. With this feature, you can queue the execution of the indication callback invocation to the main thread where the io_service is executed. Calling boost::asio::post is thread-safe.

@sivasr1
Copy link
Author

sivasr1 commented Feb 21, 2023

Hi Riebl,

Sorry for the late reply , I am a novice to boost::asio , not sure if i understood and modified code correctly,
i modified code as attached , which is leading to a crash.

In short modified as below:

// Get io_context here and execute [io.context.post](http://io.context.post/) with do_receive
auto ex = socket_.get_executor();//boost::asio::get_associated_executor()
boost::asio::io_context * io_context = ex.target<boost::asio::io_context>();
io_context->post(std::bind(&RawSocketLink::do_receive , this )); // boost::bind not working //error: 'bind' is not a member of 'boost'

GDB BackTrace::

Thread 1 "socktap" received signal SIGSEGV, Segmentation fault.
boost::asio::io_context::initiate_post::operator()<std::_Bind<void (RawSocketLink::*(RawSocketLink*))()> >(std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>&&, boost::asio::io_context*) const (self=<optimized out>,
    handler=..., this=<optimized out>)
    at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/bits/move.h:74
74    forward(typename std::remove_reference<_Tp>::type& __t) noexcept
(gdb) bt
#0  boost::asio::io_context::initiate_post::operator()<std::_Bind<void (RawSocketLink::*(RawSocketLink*))()> >(std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>&&, boost::asio::io_context*) const (self=<optimized out>,
    handler=..., this=<optimized out>)
    at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/c++/9.2.0/bits/move.h:74
#1  boost::asio::detail::completion_handler_async_result<std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>, void ()>::initiate<boost::asio::io_context::initiate_post, std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>, boost::asio::io_context*>(boost::asio::io_context::initiate_post&&, std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>&&, boost::asio::io_context*&&) (token=..., initiation=...)
    at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/async_result.hpp:482
#2  boost::asio::async_initiate<std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>, void (), boost::asio::io_context::initiate_post, boost::asio::io_context*>(boost::asio::io_context::initiate_post&&, std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>&, boost::asio::io_context*&&) (token=..., initiation=...)
    at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/async_result.hpp:896
#3  boost::asio::io_context::post<std::_Bind<void (RawSocketLink::*(RawSocketLink*))()> >(std::_Bind<void (RawSocketLink::*(RawSocketLink*))()>&&) (handler=..., this=<optimized out>)
    at /opt/fsl-imx-wayland/5.4-zeus/sysroots/aarch64-poky-linux/usr/include/boost/asio/impl/io_context.hpp:206
#4  RawSocketLink::RawSocketLink (this=0xaaaaaacdee90, socket=...)
    at /home/siva/newWorkSpace/vanetsoar/tools/socktap/raw_socket_link_soar.cpp:168
#5  0x0000aaaaaab6f308 in create_link_layer (io_context=..., device=..., name=...)
    at /home/siva/newWorkSpace/vanetsoar/tools/socktap/link_layer.cpp:27
#6  0x0000aaaaaab578e8 in main (argc=18, argv=<optimized out>)
    at /home/siva/newWorkSpace/vanetsoar/tools/socktap/main.cpp:104
(gdb) 

Could you please check if my implementation of boost::asio::post is correct ? or would you suggest any other better alternative ?
Thanks for your help.

Thanks
Siva

@riebl
Copy link
Owner

riebl commented Feb 24, 2023

It is hard to "debug" your code with little information. You can fetch the Boos.Asio execution context from socket_.get_executor().context() and pass it to boost::asio::post:

auto& ec = socket_.get_executor().context();
boost::asio::post(ec, std::bind(&RawSocketLink::do_receive, this));

If binding do_receive makes sense depends a lot on the rest of your implementation. In the end, callback_ needs to get invoked by the execution context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants