Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add improved flatbuffer building API #6

Closed
jkuszmaul opened this issue Mar 13, 2022 · 1 comment
Closed

Add improved flatbuffer building API #6

jkuszmaul opened this issue Mar 13, 2022 · 1 comment

Comments

@jkuszmaul
Copy link
Contributor

The current API is cumbersome, confusing, and prone to human error.

@bsilver8192
Copy link
Contributor

Oops, I opened #8 for this halfway through reading my notifications (before seeing this). Lots more detail over there.

frc971-automation pushed a commit that referenced this issue Aug 17, 2023
This patch came out of trying to make `message_bridge_test` less
flaky. The test currently fails occasionally without an error message.
It just stops.

When run under `gdb`, we see the following:

    ...
    [New Thread 0x7fffeda496c0 (LWP 59)]
    I20230806 21:23:50.636907    59 message_bridge_test_lib.cc:28] Started pong
    [New Thread 0x7fffed3e06c0 (LWP 60)]
    I20230806 21:23:50.648432    60 message_bridge_test_lib.cc:28] Started ping
    [New Thread 0x7fffecd776c0 (LWP 61)]
    I20230806 21:23:50.658921    61 message_bridge_test_lib.cc:28] Started pi1_message_bridge_server
    I20230806 21:23:50.702045    61 shm_event_loop.cc:1026] Setting priority to 1
    [New Thread 0x7fffec70e6c0 (LWP 62)]
    I20230806 21:23:50.705500    62 message_bridge_test_lib.cc:28] Started pi1_message_bridge_client
    I20230806 21:23:50.718152    62 shm_event_loop.cc:1026] Setting priority to 1
    [New Thread 0x7fffec0a56c0 (LWP 63)]
    I20230806 21:23:50.724790    63 message_bridge_test_lib.cc:28] Started pi2_message_bridge_client
    I20230806 21:23:50.735529    63 shm_event_loop.cc:1026] Setting priority to 1
    [New Thread 0x7fffeba3c6c0 (LWP 64)]
    I20230806 21:23:50.739696    64 message_bridge_test_lib.cc:28] Started pi2_message_bridge_server
    I20230806 21:23:50.751094    64 shm_event_loop.cc:1026] Setting priority to 1

    Thread 3 "ping" received signal SIG36, Real-time event 36.
    [Switching to Thread 0x7fffed3e06c0 (LWP 60)]
    __GI___pthread_sigmask (how=1, newmask=<optimized out>,
        oldmask=0x7fffed3de2f8) at ./nptl/pthread_sigmask.c:43
    43      ./nptl/pthread_sigmask.c: No such file or directory.
    #0  __GI___pthread_sigmask (how=1, newmask=<optimized out>,
        oldmask=0x7fffed3de2f8) at ./nptl/pthread_sigmask.c:43
    #1  0x00007ffff530c503 in aos::ipc_lib::(anonymous namespace)::wrapped_pthread_sigmask (how=1, set=0xb2cf38, oldset=0x7fffed3de2f8)
        at aos/ipc_lib/signalfd.cc:65
    #2  0x00007ffff530c558 in aos::ipc_lib::SignalFd::~SignalFd (this=0xb2cf30)
        at aos/ipc_lib/signalfd.cc:113
    #3  0x00007ffff58b2c9b in std::default_delete<aos::ipc_lib::SignalFd>::operator() (this=0x7fffffffd4e0, __ptr=0xb2cf30)
        at external/amd64_debian_sysroot//usr/include/c++/10/bits/unique_ptr.h:85
    #4  0x00007ffff58b473c in std::__uniq_ptr_impl<aos::ipc_lib::SignalFd, std::default_delete<aos::ipc_lib::SignalFd> >::reset (this=0x7fffffffd4e0, __p=0x0)
        at external/amd64_debian_sysroot//usr/include/c++/10/bits/unique_ptr.h:182
    #5  0x00007ffff58a535d in std::unique_ptr<aos::ipc_lib::SignalFd, std::default_delete<aos::ipc_lib::SignalFd> >::reset (this=0x7fffffffd4e0, __p=0x0)
        at external/amd64_debian_sysroot//usr/include/c++/10/bits/unique_ptr.h:456
    #6  0x00007ffff58a213e in aos::ShmEventLoop::Run (this=0x7fffffffd200)
        at aos/events/shm_event_loop.cc:1063
    #7  0x00007ffff7eb3980 in aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0::operator()() const (
        this=0xaef998) at aos/network/message_bridge_test_lib.cc:30
    #8  0x00007ffff7eb382d in std::__invoke_impl<void, aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0>(std::__invoke_other, aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0&&) (__f=...)
        at external/amd64_debian_sysroot//usr/include/c++/10/bits/invoke.h:60
    #9  0x00007ffff7eb37bd in std::__invoke<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0>(aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0&&) (__fn=...)
        at external/amd64_debian_sysroot//usr/include/c++/10/bits/invoke.h:95
    #10 0x00007ffff7eb3795 in std::thread::_Invoker<std::tuple<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0xaef998)
        at external/amd64_debian_sysroot//usr/include/c++/10/thread:264
    #11 0x00007ffff7eb3765 in std::thread::_Invoker<std::tuple<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0> >::operator()() (this=0xaef998)
        at external/amd64_debian_sysroot//usr/include/c++/10/thread:271
    #12 0x00007ffff7eb36c9 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0> > >::_M_run() (this=0xaef990)
        at external/amd64_debian_sysroot//usr/include/c++/10/thread:215
    #13 0x00007ffff7cd44a3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    #14 0x00007ffff48a8044 in start_thread (arg=<optimized out>)
        at ./nptl/pthread_create.c:442
    #15 0x00007ffff49285fc in clone3 ()
        at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Sometimes, as soon as the `ping` thread starts exiting, a wakeup
signal arrives. That causes the default signal handler to be called.
Which for SIGRT2 is to terminate the program.

For most applications that's probably fine in the grand scheme of
things, but it's a bit annoying for tests. Especially for
`message_bridge_test` which has multiple threads with watchers.

I am not 100% convinced that the issue is happening because of a race
condition in `lockless_queue.cc`, but it seems the most likely
candidate. Quoted from the `LocklessQueueWakeUpper::Wakeup` function:

    // Grab a copy so it won't change out from underneath us, and we can sort it
    // nicely in C++.
    // Do note that there is still a window where the process can die *after* we
    // read everything.  We will still PI boost and send a signal to the thread in
    // question.  There is no way without pidfd's to close this window, and
    // creating a pidfd is likely not RT.

The way I'm reading this is that there's a window between the sender
grabbing a copy of the watchers and the watcher unblocking the signal.
That window allows for the receiver to be killed before it's finished
tearing down.

Rough sequence of events:

1. The sender reads the list of watchers from SHMEM.
2. The reader shuts down and unblocks the wakeup signal.
3. The sender sends the wakeup signal to all watchers.
4. The reader invokes the default signal handler for the wakeup
   signal. By default this causes the process to exit uncleanly
   immediately.

Before this patch, I saw 20-30 tests out of 1000 fail due to this
error. After this patch, there are no failures of this kind.

Change-Id: I12a18e2977b50b9cb33334373e131ae9774e7579
Signed-off-by: Philipp Schrader <[email protected]>
frc971-automation pushed a commit that referenced this issue Feb 17, 2024
We're currently seeing the following undefined behaviour in the code
that converts `realtime_clock::time_point` values into human-readable
strings. The issue is that the computation overflows what we can
represent in 64 bits when stringifying `realtime_clock::min_time`.

    $ bazel test --config=ubsan @aos//aos/events/logging:logfile_utils_test --test_filter=PartsMessageReaderDeathTest.TooFarOutOfOrder --test_output=streamed   --test_env=UBSAN_SYMBOLIZER_PATH="$(readlink -f external/llvm_k8/bin/llvm-symbolizer)"
    ...
    [ RUN      ] PartsMessageReaderDeathTest.TooFarOutOfOrder
    external/aos/aos/events/logging/logfile_utils_test.cc:169: Failure
    Death test: { reader.ReadMessage(); }
        Result: died but not with expected error.
      Expected: contains regular expression "-0.000000001sec vs. 0.000000000sec"
    Actual msg:
    [  DEATH   ] external/llvm_k8/include/c++/v1/__chrono/duration.h:102:59: runtime error: signed integer overflow: -9223372037 * 1000000000 cannot be represented in type '_Ct' (aka 'long long')
    ...
    [  DEATH   ]     #4 0x7f6788f54d9a in aos::operator<<(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, std::__1::chrono::time_point<aos::realtime_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) external/aos/aos/time/time.cc:189:55
    [  DEATH   ]     #5 0x7f67938da571 in aos::logger::operator<<(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, aos::logger::LogParts const&) external/aos/aos/events/logging/logfile_sorting.cc:2163:50
    [  DEATH   ]     #6 0x7f6793aeec53 in aos::logger::PartsMessageReader::ReadMessage() external/aos/aos/events/logging/logfile_utils.cc:1679:32
    [  DEATH   ]     #7 0x55ed45105cb0 in aos::logger::testing::PartsMessageReaderDeathTest_TooFarOutOfOrder_Test::TestBody() external/aos/aos/events/logging/logfile_utils_test.cc:169:3
    ...
    [  DEATH   ]
    [  DEATH   ] SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior external/llvm_k8/include/c++/v1/__chrono/duration.h:102:59 in
    [  DEATH   ]
    [  FAILED  ] PartsMessageReaderDeathTest.TooFarOutOfOrder (200 ms)
    [----------] 1 test from PartsMessageReaderDeathTest (200 ms total)

This patch fixes the issue by detecting potential overflow and return
a mostly non-human-readable string version of the time stamp.

Change-Id: I4bff6272ac6f4d48c646face2c13fbc7dd3bb0ed
Signed-off-by: James Kuszmaul <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants