-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix tabulated bond memory leaks #3961
Conversation
If the current element of the array is a tabulated bond, free the memory allocated for the force and energy vectors.
src/core/communication.cpp
Outdated
boost::mpi::broadcast(comm_cart, *tab_pot, 0); | ||
|
||
bonded_ia_params[i].p.tab.pot = tab_pot; | ||
bonded_ia_params[i].p.tab.pot = new TabulatedPotential(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't looked into the code, but can't we use smart pointers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so: we overwrite the memory block of the bond object on worker nodes with a bytestring of the same object from the head node, so the counter from a shared_ptr
would be overwritten. As for unique_ptr
, in the line you highlighted, we do an assignment, so the smart pointer goes out of scope and gets garbage-collected... but it contains a random memory address (because of the bytestring deserialization a few lines above), which is a recipe for segmentation faults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure i'm following... but it sounds like this is a place for larger scale refactoring which is out of scope of this pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the correct way would be to write boost serialization methods, which can handle pointer members out of the box, e.g.
struct Tabulated_bond_parameters {
int type;
TabulatedPotential *pot;
private:
friend boost::serialization::access;
template <typename Archive>
void serialize(Archive &ar, long int /* version */) {
ar &type;
ar &pot;
}
};
but there is no guarantee that we are overwriting a bond type with the same bond type, so we would have to add destructors too... Not sure if this is worth the effort, if our long-term goal is to replace this interaction communication infrastructure with a python setter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a much more promising refactoring goal is breaking the dependency on MpiCallbacks.hpp
from consumers of MPI callbacks (fix-3906...fix-3907-p1), which as a side effect is removing global variables and removing prototypes of MPI worker functions from header files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a much more promising refactoring goal is breaking the dependency on
MpiCallbacks.hpp
from consumers of MPI callbacks (fix-3906...fix-3907-p1), which as a side effect is removing global variables and removing prototypes of MPI worker functions from header files.
Isn't that totally orthogonal to the memory leak issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the correct way would be to write boost serialization methods, which can handle pointer members out of the box, e.g.
struct Tabulated_bond_parameters { int type; TabulatedPotential *pot; private: friend boost::serialization::access; template <typename Archive> void serialize(Archive &ar, long int /* version */) { ar &type; ar &pot; } };but there is no guarantee that we are overwriting a bond type with the same bond type, so we would have to add destructors too... Not sure if this is worth the effort, if our long-term goal is to replace this interaction communication infrastructure with a python setter.
sorry, if that has been discussed and i don't remember it anymore... Do we have some design plans for the interaction stuff already?
sorry, if that has been discussed and i don't remember it anymore... Do we have some design plans for the interaction stuff already?
My understanding is that
* Actors, bonded and non-bonded interactions should be used via the script-interface, which handles the broadcasting of parameters, so the comm infrastructure for it in the core goes away
* The BondedInteractionParams type should be a variant rather than a union
* Since the communication is handled by the script interface, bonds are set in the core on the node level by assignment. As long as the strucs holding the parameters for the individual bodns either don’t contain manual memory management or provide a destructor, it should not be possible to loose memory anymore.
|
so think we should now merge this but also stop refactoring the existing infrastructure if the plan is to move most of the parts to the sciptinterface. |
if (bonded_ia_params[bond_type].type == BONDED_IA_TABULATED_DISTANCE or | ||
bonded_ia_params[bond_type].type == BONDED_IA_TABULATED_ANGLE or | ||
bonded_ia_params[bond_type].type == BONDED_IA_TABULATED_DIHEDRAL) { | ||
delete bonded_ia_params[bond_type].p.tab.pot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cant't you just check for nullptr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (bonded_ia_params[bond_type].p.tab.pot) delete bonded_ia_params[bond_type].p.tab.pot;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this would raise a Clang-Tidy warning because you can simplify it to
delete bonded_ia_params[bond_type].p.tab.pot;
which is safe on a nullptr
.
However, this would also trigger a segfault, because you can create a tabulated bond on top of an existing FENE bond, in which case .p.tab.pot
contains a random address. You can overwrite an bond existing bond in ESPResSo, this is used for example in testsuite/python/interactions_bonded_interface.py
.
IMHO, #3019 should be fixed before this. This correct way to solve this is to have the tabulated bond manage it's own memory (via unique ptr or similar), but this is not possible as long as the bond parameters is a |
Let's merge this, also due to the comment fixes contained in the PR. |
Fixes #3906
Description of changes: