Boost 1.68 serialization breaks some Bayeux tests #39

Closed
fmauger opened this issue Apr 30, 2019 · 5 comments

Comments

@fmauger
Member

fmauger commented Apr 30, 2019

While exploring #36, I ran into a serious problem with some of the Bayeux test programs that involve Boost-based serialization (Ubuntu 18.04, GCC 7.3).
The (de)serialization itself works correctly after a few fixes made necessary by some changes in
the management of XML archives. However, when the programs end, a segfault occurs after main() has returned. This is the list of failed tests:

The following tests FAILED:
	 41 - datatools-test_serialization (SEGFAULT)
	 48 - datatools-test_things_1 (SEGFAULT)
	 49 - datatools-test_things_2 (SEGFAULT)
	 50 - datatools-test_things_3 (SEGFAULT)
	 51 - datatools-test_things (SEGFAULT)
	 89 - datatools-test_backward_things (SEGFAULT)
	237 - geomtools-test_serializable_2 (SEGFAULT)
	238 - geomtools-test_serializable_3 (SEGFAULT)
	328 - mctools-test_simulated_data_1 (SEGFAULT)
Errors while running CTest

Investigating the crash, we have the following stack trace:

41: ===========================================================
41: There was a crash.
41: This is the entire stack trace of all threads:
41: ===========================================================
41: #0  0x00007fc1e442b687 in __GI___waitpid (pid=18715, stat_loc=stat_loc
41: entry=0x7ffc9d601068, options=options
41: entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
41: #1  0x00007fc1e4396067 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
41: #2  0x00007fc1e2d94f83 in TUnixSystem::StackTrace() () from /scratch/ubuntu18.04/BxInstall/root-6.16.00/lib/root/libCore.so.6.16
41: #3  0x00007fc1e2d97974 in TUnixSystem::DispatchSignals(ESignals) () from /scratch/ubuntu18.04/BxInstall/root-6.16.00/lib/root/libCore.so.6.16
41: #4  <signal handler called>
41: #5  0x00007fc1e49fa462 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
41: #6  0x00007fc1e4ef63e7 in boost::archive::detail::basic_serializer_map::erase(boost::archive::detail::basic_serializer const*) () from /scratch/ubuntu18.04/BxInstall/boost-1.68.0/lib/libboost_serialization.so.1.68.0
41: #7  0x00007fc1e67cefcc in boost::serialization::singleton<boost::archive::detail::pointer_iserializer<boost::archive::text_iarchive, mctools::signal::base_signal> >::get_instance()::singleton_wrapper::~singleton_wrapper() () from /home/mauger/Documents/Private/Software/BxCppDev/Bayeux/Bayeux.git/_build.d/develop_b168/BuildProducts/lib/libBayeux.so.3
41: #8  0x00007fc1e438a615 in __cxa_finalize (d=0x7fc1e6f2b720) at cxa_finalize.c:83
41: #9  0x00007fc1e5cc8e33 in __do_global_dtors_aux () from /home/mauger/Documents/Private/Software/BxCppDev/Bayeux/Bayeux.git/_build.d/develop_b168/BuildProducts/lib/libBayeux.so.3
41: #10 0x00007ffc9d603cd0 in ?? ()
41: #11 0x00007fc1e6f66b73 in _dl_fini () at dl-fini.c:138

Other tests give similar output.
This looks like a problem with the order of destruction of static objects provided by the library.

@fmauger fmauger changed the title Boost 1.68 serialization breaks some Bayeux test Boost 1.68 serialization breaks some Bayeux tests Apr 30, 2019
@fmauger
Member Author

fmauger commented Apr 30, 2019

With Boost 1.69, the problem vanishes.
I wonder whether this issue is related to boostorg/serialization#131.

@fmauger
Member Author

fmauger commented May 2, 2019

Also note that PR #37 does not fix the serialization issue, because the seekg technique breaks the multiarchive mode for XML archives.

@fmauger
Member Author

fmauger commented May 2, 2019

So the best thing to do is to test with various architectures and compilers to make sure Boost 1.69 is OK.
For now, I chose to make the Bayeux configure step fail when Boost 1.68 is detected.

@fmauger
Member Author

fmauger commented May 21, 2019

I can also reproduce the bug with Boost 1.65.1 and GCC 7.3 on Ubuntu 18.04.
Does this mean the bug has been there for a while but went unnoticed until now?
I am pretty sure this is a problem with the order in which the destructors of some (possibly nested) static singletons are invoked. This would break the rule governing the order of destruction of static objects within a single binary unit.
I observe this rather arbitrarily with Bayeux test programs, each a single executable linked to libBayeux.so.
However, I cannot prove that we have only one unit. Maybe there are subtle effects between the executable code
and the shared lib.
Note that switching from GCC 7.3 to GCC 6.5 does not change anything, but when Bayeux is built with Boost 1.69, the problem disappears.
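
To make the suspected mechanism concrete, here is a minimal sketch of the pattern seen in the stack trace: an object whose destructor erases itself from a singleton-owned std::set. All names (Registry, Entry, is_destroyed) are hypothetical and this is not Boost's implementation; it only illustrates one common defensive approach, where the singleton records its own destruction so that late destructors can skip the erase instead of touching a dead container.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical singleton owning a set of registered names.
class Registry {
public:
    static Registry& instance() {
        static Registry registry;  // function-local static, destroyed at exit
        assert(!is_destroyed() && "Registry used after its destruction");
        return registry;
    }

    // True once ~Registry() has run.
    static bool is_destroyed() { return destroyed_flag(); }

    void add(const std::string& name) { names_.insert(name); }
    void remove(const std::string& name) { names_.erase(name); }
    std::size_t size() const { return names_.size(); }

private:
    Registry() { destroyed_flag() = false; }
    ~Registry() { destroyed_flag() = true; }

    // A plain bool has no destructor, so it stays readable during
    // the whole static destruction phase.
    static bool& destroyed_flag() {
        static bool flag = false;
        return flag;
    }

    std::set<std::string> names_;
};

// Hypothetical client that registers itself on construction and
// unregisters itself on destruction.
class Entry {
public:
    explicit Entry(std::string name) : name_(std::move(name)) {
        Registry::instance().add(name_);
    }
    Entry(const Entry&) = delete;
    Entry& operator=(const Entry&) = delete;

    ~Entry() {
        // Erasing from an already-destroyed std::set is undefined
        // behaviour, so skip the erase if the registry is gone.
        if (!Registry::is_destroyed()) {
            Registry::instance().remove(name_);
        }
    }

private:
    std::string name_;
};
```

Within a single binary unit the guard should never trigger, because the Registry finishes construction inside the Entry constructor and is therefore destroyed after the Entry. It only matters if that ordering guarantee does not hold, which is the situation suspected here between the executable and the shared lib.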

@fmauger
Member Author

fmauger commented May 24, 2019

It seems the cause has been identified in boostorg/serialization#104
and fixed in boostorg/serialization#131.

After many tests with several versions of Boost: 1.63, 1.65.1 (default on Ubuntu 18.04), 1.68 and finally 1.69,
I understand that Boost versions > 1.64 and < 1.69 have a broken singleton implementation: with GCC under Linux, the destructors of static objects are called in the wrong order in shared libs that use Boost/Serialization. I used GCC 6.5 and 7.3 and reproduced the same crash at program termination,
as expected and described by experts.

Passing "-Bsymbolic -Bsymbolic-functions" to the linker should fix the Boost/Serialization crash but I did not test it.

So I consider that we should not use Boost 1.65 to 1.68 for Bayeux and should bump directly to 1.69, which seems to solve the problem, as mentioned in the discussions in boostorg/serialization#104 and boostorg/serialization#131.
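
As an illustration of the version range above, here is a small compile-time sketch. It is not what Bayeux does (the check there is at the configure step); the helper names are hypothetical. The only fact taken from Boost is the BOOST_VERSION encoding defined in <boost/version.hpp>: major * 100000 + minor * 100 + patch.

```cpp
// BOOST_VERSION encoding, as defined in <boost/version.hpp>:
//   major * 100000 + minor * 100 + patch   (e.g. 106800 for 1.68.0)
constexpr int boost_major(int version) { return version / 100000; }
constexpr int boost_minor(int version) { return version / 100 % 1000; }

// Hypothetical helper: true for the Boost 1.65 to 1.68 range that
// this issue concludes should be avoided.
constexpr bool is_affected_boost(int version) {
    return boost_major(version) == 1 &&
           boost_minor(version) >= 65 &&
           boost_minor(version) <= 68;
}

// Versions tested in this issue.
static_assert(!is_affected_boost(106300), "1.63.0 is outside the range");
static_assert(is_affected_boost(106501), "1.65.1 is inside the range");
static_assert(is_affected_boost(106800), "1.68.0 is inside the range");
static_assert(!is_affected_boost(106900), "1.69.0 is outside the range");

// Only active if <boost/version.hpp> was included before this point.
#ifdef BOOST_VERSION
static_assert(!is_affected_boost(BOOST_VERSION),
              "Boost 1.65 to 1.68: broken serialization singleton, use 1.69");
#endif
```

A guard like this would stop the build early if someone bypasses the configure-time check, at the cost of duplicating the version logic in two places.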

Of course, this issue has no effect on the consistency of the data serialized through the Bayeux I/O tools.

@fmauger fmauger closed this as completed May 31, 2019