Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tabulated bond memory leaks #3961

Merged
merged 7 commits into from
Oct 24, 2020
Merged

Fix tabulated bond memory leaks #3961

merged 7 commits into from
Oct 24, 2020

Conversation

jngrad
Copy link
Member

@jngrad jngrad commented Oct 20, 2020

Fixes #3906

Description of changes:

  • fix dangling pointers in tabulated bonds
  • reduce code duplication in callbacks

@jngrad jngrad added this to the Espresso 4.2 milestone Oct 20, 2020
@jngrad jngrad requested a review from KaiSzuttor October 20, 2020 17:33
boost::mpi::broadcast(comm_cart, *tab_pot, 0);

bonded_ia_params[i].p.tab.pot = tab_pot;
bonded_ia_params[i].p.tab.pot = new TabulatedPotential();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't looked into the code, but can't we use smart pointers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so: we overwrite the memory block of the bond object on worker nodes with a bytestring of the same object from the head node, so the counter from a shared_ptr would be overwritten. As for unique_ptr, in the line you highlighted, we do an assignment, so the smart pointer goes out of scope and gets garbage-collected... but it contains a random memory address (because of the bytestring deserialization a few lines above), which is a recipe for segmentation faults.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure i'm following... but it sounds like this is a place for larger scale refactoring which is out of scope of this pull request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the correct way would be to write boost serialization methods, which can handle pointer members out of the box, e.g.

struct Tabulated_bond_parameters {
  int type;
  TabulatedPotential *pot;

private:
  friend boost::serialization::access;
  template <typename Archive>
  void serialize(Archive &ar, long int /* version */) {
    ar &type;
    ar &pot;
  }
};

but there is no guarantee that we are overwriting a bond type with the same bond type, so we would have to add destructors too... Not sure if this is worth the effort, if our long-term goal is to replace this interaction communication infrastructure with a python setter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a much more promising refactoring goal is breaking the dependency on MpiCallbacks.hpp from consumers of MPI callbacks (fix-3906...fix-3907-p1), which as a side effect is removing global variables and removing prototypes of MPI worker functions from header files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a much more promising refactoring goal is breaking the dependency on MpiCallbacks.hpp from consumers of MPI callbacks (fix-3906...fix-3907-p1), which as a side effect is removing global variables and removing prototypes of MPI worker functions from header files.

Isn't that totally orthogonal to the memory leak issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the correct way would be to write boost serialization methods, which can handle pointer members out of the box, e.g.

struct Tabulated_bond_parameters {
  int type;
  TabulatedPotential *pot;

private:
  friend boost::serialization::access;
  template <typename Archive>
  void serialize(Archive &ar, long int /* version */) {
    ar &type;
    ar &pot;
  }
};

but there is no guarantee that we are overwriting a bond type with the same bond type, so we would have to add destructors too... Not sure if this is worth the effort, if our long-term goal is to replace this interaction communication infrastructure with a python setter.

sorry, if that has been discussed and i don't remember it anymore... Do we have some design plans for the interaction stuff already?

@RudolfWeeber
Copy link
Contributor

RudolfWeeber commented Oct 21, 2020 via email

@KaiSzuttor
Copy link
Member

so think we should now merge this but also stop refactoring the existing infrastructure if the plan is to move most of the parts to the sciptinterface.

if (bonded_ia_params[bond_type].type == BONDED_IA_TABULATED_DISTANCE or
bonded_ia_params[bond_type].type == BONDED_IA_TABULATED_ANGLE or
bonded_ia_params[bond_type].type == BONDED_IA_TABULATED_DIHEDRAL) {
delete bonded_ia_params[bond_type].p.tab.pot;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant't you just check for nullptr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (bonded_ia_params[bond_type].p.tab.pot) delete bonded_ia_params[bond_type].p.tab.pot;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this would raise a Clang-Tidy warning because you can simplify it to

delete bonded_ia_params[bond_type].p.tab.pot;

which is safe on a nullptr.
However, this would also trigger a segfault, because you can create a tabulated bond on top of an existing FENE bond, in which case .p.tab.pot contains a random address. You can overwrite an bond existing bond in ESPResSo, this is used for example in testsuite/python/interactions_bonded_interface.py.

@fweik
Copy link
Contributor

fweik commented Oct 22, 2020

IMHO, #3019 should be fixed before this. This correct way to solve this is to have the tabulated bond manage it's own memory (via unique ptr or similar), but this is not possible as long as the bond parameters is a union, because you can only have trivial types in a union. This is why I opened 3019 in the first place. There is no way to fix the memory leak which is not a hack in the sense that it mixes unrelated concerns.

@RudolfWeeber
Copy link
Contributor

Let's merge this, also due to the comment fixes contained in the PR.
The "fix" for the mem leak is, of course, a hack.
It will go away, once we convert the bond parameters to a Variant and let the script interface do the parameter broadcasting.

@RudolfWeeber RudolfWeeber added the automerge Merge with kodiak label Oct 24, 2020
@kodiakhq kodiakhq bot merged commit daae3aa into espressomd:python Oct 24, 2020
@jngrad jngrad deleted the fix-3906 branch January 18, 2022 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leaks in tabulated bonds
4 participants