-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
singleton: Allocate singleton instance on heap #79
Conversation
This probably tries to address the same issue as #69. Do you have a (small) test case that fails without this patch? |
Compare https://travis-ci.org/boostorg/serialization/builds/305474813 and https://travis-ci.org/boostorg/serialization/builds/301650246: |
Yes, that looks convincing. 😃 |
This looks promising. It would explain why the problem occurs on some environments an not in others. One thing that has slowed us down is that we've been using test_dll_exported to test this because that is where the problem showed up. Looking through the test suite, I see there is no actual test for the singleton. Clearly this would be useful. So if anyone has or can make one, I'd be interested in seeing it and likely incorporating such a test in the set of tests. This would be particularly valuable as apparently some are using the singleton module outside the serialization library. And the singleton has special considerations regarding visibility, inclusion and the like. |
I confirm that the patch in this pull request fixes the crash I describe in #69:
Without the patch:
With the patch:
Cheerz! |
@@ -121,7 +126,8 @@ class singleton : public singleton_module | |||
// our usage/implementation of "locking" and introduce uncertainty into | |||
// the sequence of object initializaition. | |||
use(& m_instance); | |||
return static_cast<T &>(t); | |||
if (!t) t = new singleton_wrapper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This construct does not look thread safe, whereas the original was (at least since C++11). Could this be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just doing static singleton_wrapper *t = new singleton_wrapper;
above should be fine, don't you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm - as far as I'm concerned the original is/was correct. But it seems that with some combinations of compiler/library ...? the order of destruction of the static variable is not the reverse of the construction - as they should be. There is also the question of why we get any kind of memory issue when the program ends. A big motivation for using a static variable was that it would be built into the program image and never touch the heap. But apparently that was wishful thinking. It would qualify this as a work around for unexpected/obscure/incorrect runtime behavior regarding invocation of destructor of static variables. Of course I don't really know. It could be a bug / undefined behavior which no one has seen yet. Given all this uncertainty, I'm putting in the change given that it addresses the failure of one of the tests.
On the other hand, I've since changed the test to make it more correct. It's possible that it was an error in the test that no one sees and that now it's gone. If more information comes to light we can easily change it again. This is not a persistent object.
So, for now, we'll just keep on trucking ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masterleinad Yes I think that would fix the concurrency issue I had in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the heap allocation:
This could be avoided by declaring a byte array of singleton_wrapper size (and alignment) and using placement new.
But you'd need boost::call_once
to ensure thread safety.
While I noticed the problematic order of destruction I have not investigated the reasons in detail, so I unfortunately can't say what exactly goes wrong. But maybe I get around to getting a better explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that doing the assignment right in the declaration of the static
local variable solves the problem. If you don't want to do this, then you need to do double-checking to make things thread-safe, as shown here: https://en.wikipedia.org/wiki/Double-checked_locking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if any of you guys want to make a name for yourself, you could factor out the singleton into it's own boost library/component and the serialization library would just import it. When I made it it seemed simple - no heap allocation, initialization before main is called. Destruction after main is exited etc. A no brainer. But it might be that the compiler handling of static variables throws a monkey wrench into all this.
It seems you also need to rebase. |
Hmmm - I'm not convinced here. One of the main problems that this singleton attempts to address is the the notorious "static variable initialization sequence" issue. The original scheme guaranteed that that items would be be initialized only when first used. This change would break that I believe. The only reason we're using a raw pointer anyway is to work around an issue which presents itself only in some older compilers. Given the complexity of dealing with this, I'd really prefer that someone - maybe you - make a boost singleton library - maybe based on this so that the serialization library could just include it. |
That seems like an undue burden on @masterleinad that might lead to the problem never getting fixed :-( I would hope that it's possible to commit a fix without creating a new boost library. I understand being cautious around these issues. But it's not like everything is alright right now. The situation is already broken, and we know it used to work before. |
Hmmm -I believe it was always broken before - it just showed up on one new test test_dll_exported and only on some combinations of compiler/library/OS. As far as I know, there are no current bugs in the serialization library. A change like this has the potential to cause all sorts of havoc. Has anyone actually ran all the library tests with this change? |
AFAIK it was working before, specifically 1.63 is ok, 1.65.0 is ok, but not 1.65.1, see this issue dealii/dealii#5262 |
Even though the issue is affecting me, I would also urge caution here. Furthermore, it isn't clear to me that the problem is fully understood. It would really help confidence if someone who understands it could provide a more detailed analysis. @robertramey BTW I am also curious as to the reasons for the current complex structure with a function-local static and a static data member. Is this the right forum to ask about that? |
I know the bug was detected in b0a794d But that does not convince me that that it was introduced in that version. In other words, I think the bug is somewhere in the the standard library/compiler implementation and that the change in b0a794d only reveals what was only there. Our "fix" is not really a fix but rather a work around some phenomenon that we don't understand. |
juanchopanza - the "complex" structure is the triumph of implementation over design. That is, it wasn't actually "designed". I started with something simpler but had to make it a little more complex in order to make it work. A simpler design fails to address a number of issues: a) sequence of nested static objects. static object a refers to static object b etc. This means that b has to be created before a. But this is under control of the compiler/library before "main" is called - or not. b) Compilers will optimize code by elimination of code not explicitly referred to. The serialization library creates static objects to contain functions for de-serialization of exported derived pointers. These pointers are accessed through the vtable of the abstract base class so the compiler never sees any reference to them. So some method has to be sure that the compiler doesn't throw them away. In some cases this problem will manifest itself as a failure to link an in others as a runtime error. Note that behavior usually varies between debug and release modes, between compilers and compiler versions and whether the library is deployed as shared library/DLL or static library. Also different tests show different problems. It's not trivial to make this work across all these combinations. If fact the number of combinations is so large, that it's not really possible to test them all. And since we're relying on undefined behavior here, testing is actually the only way to make a guarantee that the library works. c) The above is a result of the design decision to permit deep serialization to objects through pointers. This is a key feature of the Boost Serialization library. There are libraries which don't do this - e.g. Cereal which as a consequence are much, much simpler. In many cases, they might be a better choice. But they are not comprehensive. The blessing/curse of the Boost serialization library is that one can use it serialize virtually any C++ data structure without having to tamper with the structure design itself. That is, one can more or less completely de-couple the serialization from the rest of one's program. This is very convenient for the library user, but comes at a heavy cost of complexity for the library developer/maintainer |
@robertramey: Pragmatically, do you have evidence that the current code is "more correct" than it was before? I do very much sympathize with your desire to come to a well-understood, well-designed code structure. But I'm not sure anyone has the gumption to actually make it happen, and it would be a shame if the perfect stood in the way of improvement. We know that the code used to work and broke with a particular commit (works just before that patch, fails just after it). Furthermore, this commit can be reverted without too much trouble IIRC. I'm not the expert in this area, but the commit message that went along with it ("Trying to get minGW to function for serialization library") at least doesn't explain what was wrong before and why the change made is correct. Do you recall why this particular change was made? |
The change was made to address failure of all tests with mingw. These tests are all passing now. I don't recall for a fact whether it was this change or some subsequent one which made the mingw tests pass again. |
So, moving forward, where do we find someone with a MingW system who could test the proposed patch to see what happens? |
LOL - actually, since I made this change I installed a mingw system just to be able to test this. I don't remember whether I tested this specific change on it. FYI I use a Mac with OSX. I bought Parallels, installed Windows, ubuntu linux and Mingw. It was basically the only way I could track down these kind of problems. Short version - works great for these issues. So I could investigate this some more. But it still takes a lot of time because it likely ends in a game of whack-a-mole. Fix it for mingw and it breaks mdvc, fix that and it breaks gcc, etc. |
meanwhile, @res2k could you please rebase to fix the conflict? |
62cb1dc
to
0be7dd5
Compare
@robertramey -- since you seem to be the only one with a system on which this can be tested, would you mind testing the patch there? I don't know what else to suggest: I'm not a BOOST maintainer, so the decision is not mine, but if this was in the projects I work on, I'd choose reverting a small part of a patch that we know broke something, over waiting for the ultimate patch that is, in all likelihood never going to arrive :-( Short of that, do you have any suggestions how we can proceed with this? |
Just for clarification: The first commit of this PR was merged meanwhile (7d216b4) leaving the fix of the possible thread-safety issue in this PR. |
Ah, thanks for the clarification and for rebasing what's left. The remainder appears to be a pretty straightforward change. |
I too ran into a problem with 1.65.1 where my historic raw archives using boost serialization failed to import. I'm using msvc VS2017 15.3 and 15.4.2. Is there any easy way I can grab a whole file replacement to patch this problem? I would drop back to boost 1.64 but I've already updated my msvc compiler to 15.3 and 1.64 does not recognize the compiler. And to make things worse we use offline installers and no one has VS2017 15.2 anymore! I also just tried boost 1.66 and my problem still remains. |
a2fd454
to
4f11789
Compare
I took another look at the issue. For one, I tried to understand the reason behind the destruction order issue. As a local static variable is initialized lazily, or possibly not at all, it's cleanup function will be registered dynamically as well (See eg https://godbolt.org/g/YkApUv - note the However, other ordering issues appeared with MSVC static builds. There the "solution" is to actually use So, on the one hand, that all appears to work (at least the automated tests look promising, see here: https://travis-ci.org/res2k/serialization/builds/338525872 and here: https://ci.appveyor.com/project/res2k/serialization/build/1.0.31-develop). On the other, distinguishing the cases makes the implementation a bit messy. I'll probably try if I can find a cleaner yet one-size-fits-all implementation. |
You are still making the mistake I described in #104: You have ctor/dtors for
|
The bug was introduced in that revision. See #104 for proof.
This is (currently and for a while) not true. The scheme used makes sure all singletons are created BEFORE |
Actually, is the |
The |
I did a quick test on the branch for this PR: I put |
Then where is it constructed? I'll have another look too and contribute a test for the singleton itself |
Ok found it. There is some trickery done there. Some classes are derived from I think this is intended and should be the way |
OK - I'm going to take another look at this. I wasn't happy with the original "fix", but things need to be fixed. So it looks like we'll have to fix this "fix". I hate this stuff. If some want's to submit a test which detects a memory leak, that might be useful to cut the "whack a mole" cycle and keep us moving forward. |
Hi, I am a user of serialization. Can anyone explain the Singleton issue in
simple terms. Why a "Singleton" is needed? Is there some sort of global
state of the library? Does it define an "environment"? Why would it leak
memory? Should the user do something specific about it?
…On Thu, Apr 26, 2018, 08:45 Robert Ramey ***@***.***> wrote:
OK - I'm going to take another look at this. I wasn't happy with the
original "fix", but things need to be fixed. So it looks like we'll have to
fix this "fix". I hate this stuff. If some want's to submit a test which
detects a memory leak, that might be useful to cut the "whack a mole" cycle
and keep us moving forward.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#79 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACMfYB-8S_OpctCnyzlSB_hkw1NWOG_Bks5tseu1gaJpZM4QmdrV>
.
|
Every type used in serialization registers itself with the serialization library. This is actually 2 singletons: One for the registered type and one for the register itself. During destruction of the registered type it unregisters itself. The problem was, that the register was destructed before the registered type and so unregistering lead to a crash due to the destruction check not working. This was partially fixed by using a heap allocated singleton instance which "fixed" it by leaking the register singleton (and possibly some more) And no the user can't do much about this. Use Boost >= 1.66 and don't use serialization in a shared library that is loaded/unloaded multiple times as this will build up wasted memory. |
On Thu, Apr 26, 2018 at 10:22 AM, Alexander Grund ***@***.***> wrote:
Every type used in serialization registers itself with the serialization
library. This is actually 2 singletons: One for the registered type and one
for the register itself. During destruction of the registered type it
unregisters itself.
Ok, probably is for the "tracking" feature. That sounds like an implicit
environment (global state).
(It reminds me of the registration of 'user derived data_types' in MPI;
there is never a good place to register/commit them and produces insane managing
issues. https://www.mpich.org/static/docs/v3.2/www3/MPI_Type_commit.html .
The only sane solution is that types can only be registered manually in the
main() program because all hopes of automatic registration blow up memory.)
Maybe this registration in Boost.Serialization can be made at the level of
the Archive type? So when the archive is destroyed the registered types go
away?
That will introduce extra state for the Archive type but it is the only
object with a definite lifetime.
The problem was, that the register was destructed before the registered
type and so unregistering lead to a crash due to the destruction check not
working. This was partially fixed by using a heap allocated singleton
instance which "fixed" it by leaking the register singleton (and possibly
some more)
Whatever that singleton was, perhaps it can be made part of the Archive
type. (yes, creating custom Archive types will be more difficult from the
user perspective). Having said that I must admit that when I create my own
custom Archives, I never implement the tracking features because it is too
complicated for my purposes. (I just don't serialize stuff through pointers
automatically because that means that the object shouldn't be serialized "easily" in
the first place, only "value types" should be serialized.)
And no the user can't do much about this. Use Boost >= 1.66 and don't use
serialization in a shared library that is loaded/unloaded multiple times as
this will build up wasted memory.
The loading multiple times affects me. Can the user avoid this problem by
disabling tracking somehow?
Can this be solved by making the library header-only?
#71
Thank you for the explanation.
|
Imo it cannot be avoided. The singleton gets instantiated for every type used (I think) No other solution possible than fixing it. I'm working on it too (look at my fork. I got test cases atm and will fix them in the next days) |
Ok I have my own branch ready. Waiting for the tests at https://ci.appveyor.com/project/Flamefire/serialization and https://travis-ci.org/Flamefire/serialization The branch is https://github.com/Flamefire/serialization/tree/feature/singleton-fix |
I believe this is now addressed. If anyone is unhappy with the resolution, open a new PR |
@robertramey: For completeness, could you say which pull request/commit fixed the issue? This way it's easier for others later on to follow the conclusion of the discussion. |
sorry I can't do this. the initial fix was just reversing the latest commit to fix another problem. Once the this was done, the original problem reared it's ugly head generating a months long saga which only recently has started getting sorted out. You can reconstruct all this yourself if it's important to you, but I personally can't spare the time to do this. |
This has been finally fixed upstream sometime in late 2018 [1], and this runtime check is very expensive. Let us skip it for a safe, good version. I have chosen 1.74 semi randomly because it has been released in 2020 and is the boost version found in Debian 11. [1] boostorg/serialization#79 (comment)
This has been finally fixed upstream sometime in late 2018 [1], and this runtime check is very expensive. Let us skip it for a safe, good version. I have chosen 1.74 semi randomly because it has been released in 2020 and is the boost version found in Debian 11. [1] boostorg/serialization#79 (comment)
This avoids issues at process exit when the inner static singleton_wrapper
happens to be cleaned up before the singleton<> instance.
Makes Travis checks pass on Linux.