-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix and test singleton implementation #105
Conversation
a70657d
to
d63b853
Compare
@robertramey Note how this also fixes the current test failure in |
I'm absolutely in favor for this PR. It essentially removes the changes from #79. Since the destruction sequence is well defined in the standard [basic.start.term] (and nobody else before had this problem), it seems to be a non-standard-conformance of gcc. |
@tobias-loew The description in #79 is simply wrong. The On standard destruction order: Normally it is all good. Just in case of multiple shared libraries the problem occurs. See https://stackoverflow.com/questions/50064617/order-of-destruction-for-static-function-members-in-shared-libraries There I show how the same types are destructed together although the defined destruction order should not allow this. Although this should be blamed on GCC (or whatever) one cannot ignore it. Hence the need for |
@Flamefire I agree that #79 is screwed up. But I'd really like to see a gcc-workaround-macro here and a comment explaining what's going on. "If the completion of the constructor or dynamic initialization of an object with static storage duration And the stackoverflow-example shows that gcc seems to have trouble obeying that. |
The I agree that it is a bug in GCC, but we cannot ignore it. I won't call the From my tests GCC 6.4 and 7.1 are affected. Triggering the bug with a simple test case seems to be quite hard. It requires static boost, but dynamic libraries. But besides that I'm not quite sure. |
I found that Clang is also affected. It seems to be a general problem or dependent on e.g. GLIBC (using 2.12 and 2.23 here) For the @robertramey Could you tell me what you need to get some progress here? Or is there another maintainer who is able to verify this PR? |
5c72a62
to
35329e5
Compare
Just to make sure I understand this correctly: The leaky version only leaks when a shared library is unloaded - correct? Not saying this is ideal, but is this really something you have to worry about? |
This is correct. The heap-allocated instance is never freed and hence leaks on terminate/unload. It is a problem once you have a long running app that loads/unloads the library multiple times. There are people doing that and for serialization related stuff it makes sense. Some user complaint here: #104 (comment) |
This PR also fixes the segmentation fault with 1.68.0 (which was not the case with 1.67.0) we have in a simple test in dealii/dealii#7074 . @masterleinad can probably point to the configure test which I can't find at the moment. |
It's here. |
Thanks for the additional input and the confirmation, that this PR works for you too. |
Yes, 1.67 worked. I wonder what changed since then such that it is crashing again. |
The commit from 1.66 was reverted, changing the memory leak back to a crash: 8ca532a |
it looks like this PR works for everyone, perhaps @robertramey can review and merge it? |
Sorry to crash the party, but I'm really surprised how this PR can fix the crash with gcc-builds. |
it's also |
Wrong: See #110 which shows how
You forgot: On first access (e.g. https://github.com/boostorg/serialization/pull/105/files#diff-c23e78320ecd29888bb77891a5943f41L54)
I'm more concerned about a crash than a memory leak. |
35329e5
to
aac07cf
Compare
I checked #110 with Boost 1.63 and 1.67 with VS 2015 and it runs fine with BOTH versions.
this seems to be a case of "explicitly accessed"
This may be true for the single case where it's either memory-leak or crash. But here we have crashes of some against memory-leaks of the whole Boost-community, and that's a different story. You shouldn't make your code run on the expense of others. |
1.63 is expected to work, 1.65+ is not. If 1.67 worked for you, something is wrong on your side: See appveyor. Did you actually run all the tests? Reconfigured? Started from a clean checkout?
Yes, missed this in your text. So what was your complaint about this?
Dito. You have a memory leak on shutdown or lib unload, we have crashes. So at whose expense should it be?
Nope, its the runtime for ALL (AFAIK) linux runtimes. And the crash is a use-after-free/destroy which can easily be caught by fixing |
You may check the C++ standard: https://stackoverflow.com/a/19293663/1930508. However I added new commits to #111 to make this explicit and not confuse people not knowing the speciality of This test does not simply fail. As explained in that PR it crashes which is what I'm talking about: Thanks for your explanation on
This is wrong. Your patch doesn't include the fix for I will rebase this PR on current dev but will overwrite your changes. What you did is dangerous again: Your This will be my last effort on this unless
What is happening here is not how Open Source projects should work. Thanks @davydden for summarizing it. Exactly my point! |
5d3d714
to
fa0c705
Compare
fa0c705
to
236dab2
Compare
I'd also like to be frank:
The only functional difference I can see, is that the PR-code forces is_destroyed to true when the singleton_wrapper gets destructed. If this is the crucial change then why isn't there any comment in the code? Please, additionally, try to explain the code with code-comments. An a final please: let's not draw red lines - we're all better than that. |
@tobias-loew Thank you. Your points are valuable as they ask real questions about the code at hand.
For this there is #127 which tests release versions on CI. Is this what you meant?
There are none due to the following reason: I acknowledge that there ARE situations where a singleton is destroyed although it shouldn't be. The reasons for this are unclear (although they happen only on linux, and they may be solved, or not... they are hard at least). So I added tests for
Did you mean pre 1.65? If yes, I mostly agree. There the
Is there really a need to explain by comments what
I Did. See the most important parts here:
What needs more explanation? Could you comment this on patch lines?
This is why everything (here) is accompanied by tests showing RED before and GREEN after this PR. Finally:
If the maintainer pushes own changes without even giving anyone the chance to review/comment it, the value of this is reduced. Yes it ultimately comes down to the maintainer what gets in, but if it is only the maintainer and his opinion that matters, then it is no longer "peer-reviewed". |
"The dream" for a library author is for enthusiastic community contributors to step forward and first assist, then take over maintenance for the library in a fashion that is as consistent, good, or better than the level of quality sustained by the original author. This can never happen without cultivating their interest through engagement, mentoring, and being generous with credit. |
I don't doubt your good intentions here. But I think your expectations are unrealistic. you seem to picture the "open source development" process as some sort of cooperative venture among equals. It is not and can never be. The main issue is who takes responsibility for delivering a quality product. Responsibility cannot be assigned or delegated to an amorphous group. When something goes wrong - someone has to be tasked with getting it fixed. If this "someone" is a group of people, there is no guarantee that this "assignment/delegation" can be accomplished. This not a comment on open source development, GitHub or boost, It's a long held and agreed upon feature of effective human collaboration. This is why we have presidents, ceos, team captains and ... software product managers. In the context of Github/Boost, this function is fulfilled by the library author or maintainer. His job is to act as a "gate keeper" to accept only the best improvements. Along with this responsibility he as the authority to decide what to accept, delegate tasks to other people, reverse those decisions when necessary. When something goes wrong he, and only he, takes the blame. Responsibility cannot be delegated. This is unlike authority (to fix or test or whatever) which can be delegated. If I accept a less than optimal fix, it's my problem. I couldn't just point the finger at contributor X and assign blame to him. It also means that inevitable trade offs are made by a single person, in this case me. Of course every time such a tradeoff is made, someone will disagree and that's unfortunate. But that is the nature of the word "tradeoff". But by keeping this authority in the hands of one person, it's much more likely that the work (in this case the serialization library) will maintain a logical coherence and singularity of purpose which has been essential for keeping it relevant and useful for these many years. Note that in the course of discussion of this issue and in the past, I've suggested that anyone with interest could peel off the singleton and make it a new boost library and submit it to boost. This makes a lot of sense as the issues related to a singleton are a lot more subtle than meets the eye and the chances of a casual user getting it right are very small. This is especially true in presence of issues related to order of static initialization, DLLS, visibility No one has shown any interest in undertaking such a task and accepting responsibility for it. about 15 years ago such a library was submitted to boost by a very highly regarded author of other boost libraries. The submission was reject for some reason that I can't remember. Faced with this, I made my own - and here we are. The PR system is very helpful. But the way it's setup it just permits me to merge the PR into the branch before I can test it. And many times the PR has some useful parts, but has some other parts besides. I choose to handle this in the way I have. This is based solely on expediency - nothing more. Note that I make these changes in the develop branch so that I can watch the results on the Boost Test matrix - which is much more comprehensive than the CI. Also not that all changes to the develop branch are public and considered provisional. So the door is still open to those who might feel that I got something wrong. I try to give credit to everyone who makes a useful contribution. I don't think I've been remiss in that in this instance. Some PRs are misguided, incomplete, go beyond appropriate scope, not well tested, cannot be verified, break something else, narrow the applicability of the library, or suffer from some other problem so they can't be merged without change. Only the most trivial can be merged without change. In this particular case, my motivation for the changes I did rather than just accepting the PR should be pretty obvious by just comparing the actual changes the proposed ones. I think your comment greatly underestimates the time, effort and commitment I or any author of a boost library invests to keep improving this library. You don't see the work involved in reviewing PRs which suffer from the problems above. The approach you suggest would greatly increase that requirements of that commitment while diminishing the quality of the library (such as it is). |
Perhaps my wording was not precise, but i never implied that. I also never implied that the responsibility is shared among an amorphous group. All what you say is perfectly fine and I don't disagree with that. My main concern was the workflow of PRs, i.e. discussion on specific issues/requests @tobias-loew and @Flamefire have above. This does not imply that you and @Flamefire are equal in decision taking/responsibility/etc.
I apologies if this sounded so, that was never my intention. I was commenting on the workflow only. But let's not dive into this topic. I will refrain from making further comments on this PR, as those are not related to the suggested changes. I hope that you, @Flamefire and @tobias-loew will continue discussing technical aspects of this PR. p.s. disclaimer: I never met @Flamefire on-line or off-line, so my comments here don't have any personal bias towards either sides (author or maintainer). The only reason I watch this PR is because this issue affects other projects. |
Build succeeds on 11, but fails on 10 and 12 due to the boost bug.[1][2] --References-- [1] dealii/dealii#7344 [2] boostorg/serialization#105 Reported by: portscout git-svn-id: svn+ssh://svn.freebsd.org/ports/head@481957 35697150-7ecd-e111-bb59-0022644237b5
291702a
to
9247e30
Compare
The test uses singletons in shared libraries that are linked against static boost. On termination the (internal) singletons will be destructed in an unexpected order causing a use-after-free and segfault/assert.
9247e30
to
af19d19
Compare
Part: made changes in singleton.hpp to address running issue regarding test_dll_exported This reverts commit e1893dd.
af19d19
to
5339279
Compare
|
||
#include <iostream> | ||
#include <boost/serialization/singleton.hpp> | ||
// <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is from #110 where I reverted changes made to a test I also changed. As explained in #110 the changed test currently in develop does rather test the compiler than the library which is why I reverted/overwrote it.
If this is objectionable I'll add my own version of test_singleton.cpp
as test_singleton2.cpp
or below into this file but that would mean redundant code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I don't understand is what's Gennadiy Rozental's e-mail doing here, on a line of its own, it makes no sense to me.
Re test_singleton2.cpp, yes, I was going to suggest the same thing. I'd also drop the Travis changes, or just keep the essential changes to a minimum.
I also wonder whether the test will be cleaner if it's split into two, one testing a plain singleton, the other - inherited. There will be duplication, sure, but the array of states and the base class will vanish, so it could be more readable that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned I reverted the original commit: 8ca532a#diff-b2f7572da5c9081e523924931012ab08
Re test_singleton2.cpp, yes, I was going to suggest the same thing. I'd also drop the Travis changes, or just keep the essential changes to a minimum.
Some are required (-fPIC
), I can drop the rest, as they are in #105
I also wonder whether the test will be cleaner if it's split into two, one testing a plain singleton, the other - inherited. There will be duplication, sure, but the array of states and the base class will vanish, so it could be more readable that way.
This would mean 2 separate compilation units. I wanted to have the 2 use cases (plain, inherited) next to each other, to show the similarity and difference. But I can split this in 2 files, you think, this is more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I didn't like about this patch is that it's not clear which changes are necessary to make the test pass, and what aren't. The singleton_module changes, for instance, and a few other things.
But I now see that you're simply undoing e1893dd#diff-3f1e861be15fd4926f5e5b00388f2309.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes got messy due to the failed attempt to fix this which is why I added an explicit revert commit. My Patch is a single commit here: e0691b0 (see the name)
The changes there are much less and mostly comments. Also compare the pre-1.65 version which that patch is based on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As requested I split the is_destroyed
test into 2: #129
// the sequence of object initializaition. | ||
use(m_instance); | ||
// the sequence of object initialization. | ||
use(& m_instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use(m_instance)
- as before this patch - is undefined behavior, but I think that this one is undefined too. Non-undefined would have been to use a pointer as m_instance. Of course getting this past Robert may prove problematic. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why it is UB?
I do think this line is not required because of https://github.com/boostorg/serialization/pull/105/files/53392794c49ba129c4f3eb70debb7426746b325b#diff-3f1e861be15fd4926f5e5b00388f2309R198 but I'm to afraid to remove it ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In either case, I don't see how this is going to work - use
will just get inlined here and disappear. But who knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the point the initializer for m_instance is called, m_instance is not yet initialized, and using a reference that does not refer to a valid object is undefined behavior.
A pointer would not have this issue because globals are zero-initialized first, before the dynamic initializers are run, so the value of m_instance when accessed in its own initializer would be nullptr.
Even better would probably be to use a pointer for m_instance and use(&m_instance)
, which avoids all these subtleties, as the address of m_instance is always valid no matter whether it's initialized or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait. My patch uses &m_instance
, hence it takes the address of the object. This is never dereferenced. With your statement
as the address of m_instance is always valid no matter whether it's initialized or not.
This is ok, isn't it?
So at no point a reference is used, we only get the address of a static member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_instance
is an uninitialized reference. &m_instance
is undefined in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't understand WHY it is required. Just judging from the comment this is to "construct the instance at pre-execution time". But the m_instance
with its initialization is there for exactly the same reason (I added this as a comment with a link to https://groups.google.com/forum/#!topic/microsoft.public.vc.language/kDVNLnIsfZk)
But @robertramey wrote in an email:
LOL - it's only in there because the tests fail without it.
I verified this locally (Linux, GCC), but don't understand it...
I'm currently testing static T * m_instance;
and it seems to work. But this would be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess would be that when there are no references to m_instance, and it's in a static library, it's not being linked at all.
Closed in favor of the simpler #131 |
As described in #104 the singleton implementation was broken in b0a794d which led to crashes especially in the context of (multiple) shared libraries.
The "fix" allocating the instance on the heap basically just never freed the instances and hence leaked memory. This was due to not understanding the root cause of the problem.
To summarize:
There are 2 use cases for the singleton: On where the type
T
is derived fromsingleton<T>
and one wheresingleton<T>
is used without derivingT
from it. The above commit put theis_destroyed
handling into the ctor/dtor ofsingleton<T>
. However ifT
is not derived from it, thensingleton<T>
is never instantiated and henceis_destroyed
is never set. See also some discussion in #104 and #79.The crash/memory corruption now happens (or happened before the leaky implementation) if the
tkmap
used fromextended_type_info_typeid
is destroyed first. As it used from the ctor ofextended_type_info_typeid
this should normally not happen. However it does happen when used from multiple shared libraries under (at least) GCC 5.4 and 7.1. The reason is currently unknown (see https://stackoverflow.com/questions/50064617/order-of-destruction-for-static-function-members-in-shared-libraries) but we can just assume, the order of destruction is undefined. Hence the need for theis_destroyed
flag.On this PR:
is_destroyed
flag is correctly setdetail:singleton_wrapper
with the correctis_destroyed
setting in the dtor. It uses a method-local variable to enforce initialization on first use (preferred way to avoid static init disaster https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2)!is_destroyed
on enter ofget_instance
. Note that the comment was wrong. It did not refer to the instance and is even removed in non-debug buildssingleton
protected and default to avoid accidental misuse (it might never be called ifT
is not derived fromsingleton<T>
)Other notes:
singleton_wrapper
is the most derived type of the instance it will always be destructed beforeT
andsingleton<T>
(ifT
is derived from it) There is nothing one can do about it!singleton<T>
but using it directly is potentially dangerous as you can have multipleT
instances and you have only 1singleton<T>
where you might not expect it. Example:extended_type_info
uses asingleton<detail::tkmap>
to work around this issue becauseextended_type_info_typeid
already uses asingleton<tkmap>
. It might be wise to enforce deriving fromsingleton<T>
and usingT::get_instance()
or at least do that in the library.Finally: Please try to break this! I'm pretty sure I covered everything by using TDD but I might have missed something.