Singleton destruction check completely broken leading to memory leaks or crashes #104
And to echo it here a reproducer in text:
The problem is very complicated to reproduce due to the unknown initialization/finalization order. I assume: So one problem is the usage of the
Test case to demonstrate this: https://gist.github.com/Flamefire/80af5fcdc0f2a6787858738ce07b25bf Run with Boost 1.64 -> 1.67:
As you can easily see, 1.64 is working; 1.65.0 introduced the bug that
Some more analysis: The problem is with the latter as
After some more analysis: I wonder what really happened.
Apparently (maybe a bug in the compiler?) … To summarize: a crash or a Valgrind-detected memory error can occur IF the compiler destroys … The function-local static member is explained here: https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2. However, I did not find anything about destructor order for this case.
According to Microsoft (https://msdn.microsoft.com/en-us/library/xwec471e.aspx), MSVC creates and destroys static objects in the order they appear in the code. In my experience, this means that it depends on the order in which the files are passed to cl.exe.
This does not apply here. We don't have static objects; we have function-local static objects inside member functions, which is different. According to this (https://stackoverflow.com/questions/335369/finding-c-static-initialization-order-problems#335746), it should work. The isocpp FAQ mentions this too: https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 However, I did not find anything about destructor order for this case.
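A minimal sketch of the construct-on-first-use idiom from the isocpp FAQ, using hypothetical `Registry`/`User` types. The C++ standard destroys objects with static storage duration in reverse order of the completion of their constructors, so because `Registry` finishes constructing inside `User`'s constructor, `User` should be destroyed first. That guarantee is stated for a single program; the later comments in this thread suggest it does not hold in practice once the singletons are spread across several shared libraries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical construction log, used only to observe ordering.
std::vector<std::string>& construction_log() {
    static std::vector<std::string> log;  // constructed on first call
    return log;
}

struct Registry {
    Registry() { construction_log().push_back("Registry"); }
};

// Construct-on-first-use: the object is created the first time the
// function runs (thread-safe since C++11) and destroyed at program exit.
Registry& registry() {
    static Registry instance;
    return instance;
}

struct User {
    User() {
        registry();  // Registry's constructor completes before User's does,
        construction_log().push_back("User");
    }
    // so, within one program, ~User() is expected to run before ~Registry().
};

User& user() {
    static User instance;
    return instance;
}
```

Calling `user()` twice constructs each object exactly once, with `Registry` logged before `User`. The destruction order itself can only be observed at program exit, which is exactly where the multi-library case in this thread goes wrong.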
The problem occurs when using multiple shared libraries. It seems like this messes up the destruction order. Reproduction: https://gist.github.com/Flamefire/286e9e0e501731a04f10786450d3e711 Some debug info by hooking the ctor/dtor of the singletons:
Note how the 2 maps are destroyed together, which leads to the wrong ordering for the 2nd pair.
OK - I'm working on this now. I'm presuming that https://gist.github.com/Flamefire/286e9e0e501731a04f10786450d3e711 contains the definitive test case. I can see that it was a bitch to figure this out. I would like to figure out a way to include it in the test suite, but there doesn't seem to be a way to do this. Making two DLLs etc. ...
We have two test cases here. Are they both useful? Or is it only the second one?
I found a way to include it in the test suite and opened PR #105 for that and the fix. The 2 test cases test different things: the 2nd (https://gist.github.com/Flamefire/286e9e0e501731a04f10786450d3e711) tests the library from a higher level (just minimal code to break it); the other one tests the implementation of the singleton thoroughly (e.g. checking that the singletons are in fact destroyed and set the flag).
I have run into this issue recently. My code has memory leaks. Is there a quick fix for this problem? I am using VS2017 + Boost 1.66.
You can test my PR #105, either as a whole or just the fix (last 2 commits). Note that you can download patches by appending ".diff" to the commit URL on GitHub: https://github.com/boostorg/serialization/commit/17c952b7634b9c0ab8f257c679451587b5c7f280.diff and https://github.com/boostorg/serialization/commit/d63b8535f5e615c1edc8ab50078508e4d56816cf.diff
FWIW, the reason why
Small addendum to previous comment:
FYI - I spent a little time reviewing the test matrix for test_dll_export to try to find some commonality. It seems that failures occur only on Linux platforms. Failures seem to be across all combinations of compiler and standard library implementations. I'm still looking at this, so my assessment could change. It's a little tedious to do with our tools.
This matches my experiments with the crash as described here and shown in #111:
I think I found the culprit: By default, the Linux dynamic linker binds dynamic symbols in a shared object to any already existing symbols in the global namespace. Some
It's possible to fix the issue by fiddling with
Changes, cleaned up, and CI results: res2k#10
Thanks! Did you also test #111? It would be interesting to know if that fixes it too. And finally: of course the singleton bug has to be fixed anyway: #110
I am running into crashes with boost::serialization::singleton in 1.68, probably related to this issue. Example valgrind dump:
Yes, this is exactly this issue. You can manually apply #105 to fix that.
I made further changes related to ELF visibility (still at res2k#10); the test cases from #110 and #111 now run successfully. @robertramey: Your feedback would be greatly appreciated.
I'm currently bogged down in other stuff I can't set aside. My experience is that once I get into something like this, it ends up consuming much more time than one would think. But I've been following this discussion and appreciate the contributions and efforts of all parties. As I've said before, this seems to me to be something related to unstandardized and differently implemented features of the C/C++ programming surface: dynamic loading and symbol visibility. It seems you (everyone) are making progress. I think it would be good if these results could be boiled down to a couple of tests which I could add to the permanent test suite. Then we would see the results across the whole test matrix. The test matrix is not as diverse as I would hope, and it doesn't make it easy to review results by attributes like 32 vs 64 bit, std version, compiler version, OS version, etc. But it's the best we have in a situation like this. Soooo ... I would like to see this boiled down to one or two tests. Before doing this, please look carefully at the current tests. They are set up so that, if appropriate, the same test can be run with different archive types. They use "lightweight test" macros and follow the same general organization. This makes it easier for me and others to understand. This might not seem important, but it is. Keep in mind that the serialization library testing is designed to be a running history of all the failures we've ever found. Once a failure is found, we add it to the permanent test suite so that the failure cannot creep back into the code. It also avoids playing "whack-a-mole", where fixing problem1 creates problem2 and vice versa, resulting in an infinite "ping-pong". After 15 years, Boost serialization is better than it has ever been and is still getting better. This is in large part due to the efforts of all of you and others. It's much appreciated.
I already did that: #110 and #111. #105 consists of those 2 PRs plus 2 commits fixing the issue. I did it this way to decompose this (admittedly difficult) issue into its pieces so that understanding the tests and changes is as easy as possible.
Correct. This is, IMO, the only commit that requires a bit of discussion, as the rest is straightforward. My approach was: "We cannot fix the underlying issue as it is (C++) runtime related" (you seem to have fixed that though, but the number of additional macros and their different use cases scare me... Maybe something was forgotten?)
So the assert isn't really required. And the check makes sense: what this code does is unregister things that the calling class has registered. If the registry is already destroyed, then there is no point in trying to unregister anything. If those visibility macros work and are accepted, then this commit can be reverted, though the checks below should be kept. This way the assert is only a development aid to catch missing visibility annotations, and the runtime check catches those unexpected cases so they don't crash the application. An explanation should also be added to the assert so no one will go and remove the check.
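The defensive check described in this comment can be sketched as follows. `Registry`, `Registrant`, and `is_destroyed()` are hypothetical stand-ins, not Boost.Serialization's actual classes. The point is that the destructor of a registering object consults a flag, which is a trivially destructible `bool` and therefore still readable during static destruction, before touching the registry.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical registry standing in for the library's internal maps.
class Registry {
public:
    static Registry& instance() {
        static Registry r;  // construct-on-first-use
        return r;
    }
    // Becomes true once the static Registry has been destroyed.
    static bool is_destroyed() { return destroyed_; }

    void add(const void* p) { entries_.insert(p); }
    void remove(const void* p) { entries_.erase(p); }
    std::size_t size() const { return entries_.size(); }

    ~Registry() { destroyed_ = true; }

private:
    Registry() = default;
    std::set<const void*> entries_;
    static bool destroyed_;  // constant-initialized, trivially destructible
};
bool Registry::destroyed_ = false;

// Hypothetical client that registers itself on construction.
class Registrant {
public:
    Registrant() { Registry::instance().add(this); }
    Registrant(const Registrant&) = delete;
    Registrant& operator=(const Registrant&) = delete;
    ~Registrant() {
        // If the registry is already gone there is nothing to unregister from;
        // calling instance() now would touch a destroyed object.
        if (!Registry::is_destroyed())
            Registry::instance().remove(this);
    }
};
```

Within one program, a static `Registrant` is destroyed before the `Registry` it registered with, so the guarded branch is only skipped when the destruction order is broken, as in the shared-library case discussed in this thread.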
One thing I'm not so sure about is whether the "pointer check" approach plays well with dynamically loaded SOs using Boost.Serialization. The ELF symbol overriding may foul up such dynamic loading and unloading. (There's a test case, but unfortunately it is broken.) Note that "lots of macros" isn't the only approach to rein in ELF symbol overriding. As I tried earlier, the
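The thread does not spell out the exact macros used in res2k#10, so the following is only a generic illustration of how ELF symbol visibility is usually controlled with GCC or Clang. `EXAMPLE_HIDDEN` and `library_local_singleton` are hypothetical names. Hidden symbols are left out of the shared object's dynamic symbol table, so references to them inside the library bind locally instead of to a same-named definition from a library loaded earlier.

```cpp
#include <cassert>

// Hypothetical macro name. GCC and Clang accept the visibility attribute;
// on other compilers it expands to nothing in this sketch.
#if defined(__GNUC__)
#define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#else
#define EXAMPLE_HIDDEN
#endif

// With hidden visibility, the member function and the function-local
// static inside it are not exported from the shared object, so the
// dynamic linker does not bind them to another library's copy.
template <class T>
class EXAMPLE_HIDDEN library_local_singleton {
public:
    static T& get_instance() {
        static T instance;  // one instance per shared object when hidden
        return instance;
    }
};
```

The trade-off is that each shared object then gets its own instance, which is not acceptable for a registry that must really be shared across libraries. That is why such an approach needs careful decisions about which symbols stay exported. The effect only shows up when the same template is instantiated in several shared objects; in a single executable it behaves like an ordinary singleton.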
I doubt that this can be a problem: we check a value in the same class we are going to call. Even if something were mixed up, we would still get the desired behaviour: call the class when it's valid.
- A subclass of T is needed to correctly track the lifetime of the singleton, so is_destroyed works reliably.
- singleton<T> ctor is made protected so it cannot be created accidentally
- Existing comments (mostly typos) are fixed
- Additional comments are added detailing the usage and design choices made for the singleton to avoid people accidentally breaking it (again)
Fixes boostorg#104
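A simplified sketch of the approach described in the commit message above, not Boost's actual code: the object with static storage duration is a private subclass of `T`, and its destructor sets the flag, so `is_destroyed()` tracks the lifetime of the real instance. The payload type `Settings` is hypothetical.

```cpp
#include <cassert>

template <class T>
class singleton {
public:
    static T& get_instance() {
        // The wrapper is the object that actually has static storage duration,
        // so its destructor runs during static destruction and sets the flag.
        static singleton_wrapper instance;
        return instance;
    }
    static bool is_destroyed() { return destroyed_; }

protected:
    // Protected so user code cannot create stray singleton<T> objects;
    // classes deriving from singleton<T> can still be constructed.
    singleton() = default;
    ~singleton() = default;

private:
    // Subclass of T whose lifetime is the singleton's lifetime.
    struct singleton_wrapper : T {
        ~singleton_wrapper() { destroyed_ = true; }
    };
    static bool destroyed_;
};

template <class T>
bool singleton<T>::destroyed_ = false;

// Hypothetical payload type for the usage example.
struct Settings {
    int verbosity = 1;
};
```

Writing `singleton<Settings> s;` in user code no longer compiles, which addresses the "create it exactly once" problem described in the issue. The flag can only be observed as `true` during static destruction (for example from another singleton's destructor), so the test below checks only the live state.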
I was wondering if anyone with enough knowledge about this specific issue could tell me if a crash we're seeing on-exit could be this issue. Specifically our game Factorio is getting 'random' crashes in the AMD graphics driver for AMD GPU users when exiting the game. The stack trace is always identical and goes through boost::serialization::singleton before crashing. A symbolized stack trace (to the best of our abilities) can be seen in this report: https://forums.factorio.com/113613 Some information:
Cemu users are seeing the same in 24.8.1 (Vulkan) and 24.9.1 (Vulkan and OpenGL). I wonder if AMD has unknowingly upgraded core driver components into a broken state.
The commit b0a794d introduced a change that completely breaks the singleton implementation.
The problem is: `singleton<T>` is NOT a singleton! It is NEVER instantiated. Instead it instantiates a class `singleton_wrapper` derived from `T`. That instance is then returned in the `get_instance` method. Because `singleton<T>` is never instantiated, it is also never destroyed. This leads to the `destroyed` flag never being set. It only happens to be false because it is default (or value?) initialized.
This was made even worse with commit 7d216b4, where the singleton instance is not a static variable with automatic, thread-safe construction/destruction but a pointer which is leaked.
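To make the problem concrete, here is a minimal model (with hypothetical names `broken_singleton` and `Foo`, not Boost's exact code) of the design described above. The flag is set in `~broken_singleton()`, but the only object with static storage duration is the `singleton_wrapper`, so nothing sets the flag at shutdown. Creating a stray `broken_singleton<Foo>` object, which the public constructor allows, sets the flag while the real instance is still alive.

```cpp
#include <cassert>

template <class T>
class broken_singleton {
    struct singleton_wrapper : T {};  // the only object get_instance() creates

public:
    static T& get_instance() {
        static singleton_wrapper instance;
        return instance;
    }
    static bool is_destroyed() { return destroyed_; }

    // Only runs if somebody creates a broken_singleton<T> object explicitly,
    // which get_instance() never does.
    ~broken_singleton() { destroyed_ = true; }

private:
    static bool destroyed_;
};

template <class T>
bool broken_singleton<T>::destroyed_ = false;

// Hypothetical payload type for the demonstration.
struct Foo {
    int value = 0;
};
```

In this model the flag reports "destroyed" after the stray object goes out of scope, even though the instance returned by `get_instance()` is still alive and usable.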
The whole implementation is against the singleton pattern: currently it is completely possible to create a `singleton<Foo>` instance. Even worse: currently you would have to do this EXACTLY once to have correct code for EVERY `singleton`. If you don't do this, you get a memory leak or corrupted memory, depending on the version. If you do this more than once, you get a double-free error which cannot be caught.
My suggestion: make the singleton a real singleton (private ctor, no assignment etc.), inherit from `T` directly, and let the instance variable be `singleton` instead of `singleton_wrapper` (get rid of that, but check with http://tinyurl.com/ljdp8 that this does not cause a regression; if it does, put the members (`destroyed` ...) into `singleton_wrapper` like before). Then check the setting of the `destroyed` flag, which must rely on the actual destruction of the "real"(!!!) singleton instance.