Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Electroneutrality breaks when combining Widom + cpH methods #4595

Closed
pm-blanco opened this issue Oct 16, 2022 · 6 comments · Fixed by #4609
Closed

Electroneutrality breaks when combining Widom + cpH methods #4595

pm-blanco opened this issue Oct 16, 2022 · 6 comments · Fixed by #4609
Labels

Comments

@pm-blanco
Copy link
Contributor

pm-blanco commented Oct 16, 2022

@davidbbeyer and I have observed that when combining the Widom insertion method with the constant pH method (and possibly also with the Reaction ensemble) leads to the destruction of an additional ion, ultimately breaking the electroneutrality of the system.

Below I provide a minimal example script that produces the bug. This only happens in the 4.2.0 version of espresso but it does not happen in the previous version 4.1.4, so it could be related to one of the newer changes we introduced.

import espressomd
from espressomd import version
import numpy as np

pH=7
pKa=7
exclusion_range=1
seed=12345
kT=1
BOX_LENGTH=10
N_SALT=2
N_acid = 2
bug=False

types = {"H": 0, "Na": 1, "Cl": 2, "Ha": 3, "a": 4}
charges = {"H": 1.0, "Na": 1.0, "Cl": -1.0, "Ha": 0.0, "a": -1.0}

system = espressomd.System(box_l=[BOX_LENGTH,]*3)
system.part.add(type=[types["Na"]]*N_SALT, pos=np.random.rand(N_SALT, 3) * BOX_LENGTH,
                q=[charges["Na"]]*N_SALT)
system.part.add(type=[types["Cl"]]*N_SALT, pos=np.random.rand(N_SALT, 3) * BOX_LENGTH,
                q=[charges["Cl"]]*N_SALT)
if bug:

    system.part.add(type=[types["Ha"]]*N_acid, pos=np.random.rand(N_acid, 3) * BOX_LENGTH,
                q=[charges["Ha"]]*N_acid)
else:
    system.part.add(type=[types["a"]]*N_acid, pos=np.random.rand(N_acid, 3) * BOX_LENGTH,
                q=[charges["a"]]*N_acid)
    system.part.add(type=[types["H"]]*N_acid, pos=np.random.rand(N_acid, 3) * BOX_LENGTH,
                q=[charges["H"]]*N_acid)

if version.friendly() == "4.2" or version.friendly() == "4.3":
    from espressomd import reaction_methods
    RE = espressomd.reaction_methods.ConstantpHEnsemble(
        constant_pH=pH,
        kT=kT,
        exclusion_range=exclusion_range,
        seed=seed)
    widom = espressomd.reaction_methods.WidomInsertion(
        kT=kT,
        seed=seed)

elif version.friendly() == "4.1":

    from espressomd import reaction_ensemble
    RE = espressomd.reaction_ensemble.ConstantpHEnsemble(
        temperature=kT,
        exclusion_radius=exclusion_range,
        seed=seed)
    RE.constant_pH=pH
    widom = espressomd.reaction_ensemble.WidomInsertion(
        temperature=kT,
        seed=seed)

RE.add_reaction(
        gamma=10**(-pKa),
        reactant_types=[types["Ha"]],
        reactant_coefficients = [1],
        product_types=[types["a"], types["H"]],
        product_coefficients = [1, 1],
        default_charges={types["Ha"]: charges["Ha"],
                         types["a"]: charges["a"],
                         types["H"]: charges["H"]}
        )

widom.add_reaction(
        reactant_types=[],
        reactant_coefficients=[],
        product_types=[types["Na"], types["Cl"]],
        product_coefficients=[1, 1],
        default_charges={types["Na"]: charges["Na"],
                         types["Cl"]: charges["Cl"]}
        )


system.setup_type_map(type_list=list(types.values()))

if version.friendly() == "4.2" or version.friendly() == "4.3":
    widom.calculate_particle_insertion_potential_energy(reaction_id=0)

elif version.friendly() == "4.1":
    widom.measure_excess_chemical_potential(0)

for step in range(2000):
    print("step ", step)
    RE.reaction(reaction_steps=100)
    charge = 0.0
    for p in system.part:
        charge += p.q
    a = system.number_of_particles(type = types["a"])
    H = system.number_of_particles(type = types["H"])
    Ha = system.number_of_particles(type = types["Ha"])
    Na = system.number_of_particles(type = types["Na"])
    Cl = system.number_of_particles(type = types["Cl"])
    print("Before widom: a:", a, "H:", H, "Ha:", Ha, "Na:", Na, "Cl:", Cl)
    print("total charge ", charge)


    # Measure the chemical potential
    for j in range(50):
        charge = 0.0

        if version.friendly() == "4.2" or version.friendly() == "4.3":
            widom.calculate_particle_insertion_potential_energy(reaction_id=0)

        elif version.friendly() == "4.1":
            widom.measure_excess_chemical_potential(0)
        charge = 0.0
        for p in system.part:
            charge += p.q
        if charge != 0:
            a = system.number_of_particles(type = types["a"])
            H = system.number_of_particles(type = types["H"])
            Na = system.number_of_particles(type = types["Na"])
            Ha = system.number_of_particles(type = types["Ha"])
            Cl = system.number_of_particles(type = types["Cl"])
            print("After widom: a:", a, "H:", H, "Ha:", Ha, "Na:", Na, "Cl:", Cl)
            print("total charge ", charge)
            exit()
@jngrad jngrad added the Bug label Oct 17, 2022
@jngrad jngrad changed the title Bug: Electroneutrality breaking when combining Widom + cpH methods Electroneutrality breaks when combining Widom + cpH methods Oct 17, 2022
@jngrad
Copy link
Member

jngrad commented Oct 17, 2022

I can reproduce it on 4.2.0 with 2 MPI ranks, but not with 1 MPI rank. The bug also doesn't always happen. Could not reproduce it on 4.3-dev.

@jngrad
Copy link
Member

jngrad commented Oct 17, 2022

I cannot bisect the bug due to how randomly it happens. UBSAN doesn't catch any undefined behavior on 1 MPI rank. On 2 MPI ranks it catches a null pointer dereference in the boost::mpi code when runtime error messages are reduced on the head node after setting the box size (no idea why).

@pm-blanco
Copy link
Contributor Author

pm-blanco commented Oct 18, 2022

I can reproduce it on 4.2.0 with 2 MPI ranks, but not with 1 MPI rank. The bug also doesn't always happen. Could not reproduce it on 4.3-dev.

@jngrad I did some tests on my work station too. I can reproduce it with 4.2.0 with 1, 2 and 4 MPI ranks (although not always happens, sometimes it actually runs). It never seem to happen in the devel version with any number of ranks. I am very confused about this bug, I will try to dig further into it and see if I can pin point its origin

@pm-blanco
Copy link
Contributor Author

pm-blanco commented Oct 24, 2022

@davidbbeyer @jngrad I have make some progress in improving the reproducibility of this bug. I have updated the example script, it now produces the bug with both espresso 4.2.0 and espresso devel. The bug can be switched on and off with bug=True or bug=False. Some observations about the bug:

  1. The bug always happens if one starts with a system with only "HA" particles if one performs first widom insertion and then cpH and widom again.
  2. The bug never happens if one starts with a system with "A" and "H" particles and does the same procedure than in 1).
  3. The bug never happens in 4.1.4

@pm-blanco
Copy link
Contributor Author

pm-blanco commented Oct 24, 2022

Ok, I finally have pinpointed the origin of the bug. The problem is on the bookkeeping of "empty ids" done with the attribute ReactionMethods::ReactionAlgorithm::m_empty_p_ids_smaller_than_max_seen_particle. Since this list is not shared between different instances of ReactionMethods, it can contain ids that have been actually created by other reactions (or even manually by the user...). This problem actually also is present in espressov4.1.4 but the bug does not show up for reasons that I do not understand yet.
@jngrad @davidbbeyer I think that this bookkeeping as actually implemented is quite dangerous since it assumes that particles are not being created/destroyed outside the given instance of the reaction_methods. In my opinion, it would be more robust to check for the holes in the particle id list every time that the user calls the reaction function. I guess that such implementation would be slightly slower, but I do not think that it would significantly damage the performance. What do you think? If you agree with this solution, I can propose a bugfix myself.

@jngrad
Copy link
Member

jngrad commented Oct 24, 2022

@pm-blanco Good find! I completely forgot about that bookkeeping feature.

Ideally this container of "free" ids should have been a shared pointer global variable in particle_node.cpp, and all reaction methods should have stored a copy of that pointer, such that all reactions would agree on which particle ids are available. But our plan is to remove all globals. Also, such a global variable would only fix the bug from concurrent reactions, and not the other bug you just described where the user creates a new particle using one of the freed ids.

Just for future reference, and for readers of the bugfix release changelog, here is an illustration of the second variant of the bug. Consider a system of 10 particles with id 0 to 9. A reaction deletes particle with id 5 and marks it as "available". Then the user creates a particle with id 5. This particle id will be later used to create a new particle by the reaction algorithm. But in ESPResSo, the function that creates a particle doesn't throw an error if the particle already exists, and instead silently moves it (terrible design decision!). If the user-defined particle has an electric charge that differs from the one that will be created, the system will throw a runtime error about the system not being charge-neutral. However if the reaction is uncharged, there is now way to notify the user that something went wrong.

Here is a MWE derived from a sample:

diff --git a/samples/reaction_ensemble_complex_reaction.py b/samples/reaction_ensemble_complex_reaction.py
index cd727a587..308dc0fca 100644
--- a/samples/reaction_ensemble_complex_reaction.py
+++ b/samples/reaction_ensemble_complex_reaction.py
@@ -100,7 +100,13 @@ numbers = {type_A: [], type_B: [], type_C: [], type_D: [], type_E: []}
 RE.set_non_interacting_type(type=max(types) + 1)
 
 # warmup
-RE.reaction(reaction_steps=200)
+RE.reaction(reaction_steps=2)
+p = system.part.add(pos=np.random.random(3) * system.box_l, type=55, id=52)
+print(p.type, p.pos)
+RE.reaction(reaction_steps=2)
+p = system.part.by_id(52)
+print(p.type, p.pos)
+exit(0)
 
 for i in range(200):
     RE.reaction(reaction_steps=10)

Here is the output, with printfs to show when an id is added or taken from m_empty_p_ids_smaller_than_max_seen_particle. We see that the particle with type 55 is accidentally reacted into a particle of type 2.

$ ./pypresso ../samples/reaction_ensemble_complex_reaction.py
Mark pid 242 as available
Mark pid 52 as available
55 [19.67757918 18.69700346 31.817899  ]
Take pid 52
Take pid 242
Mark pid 52 as available
Mark pid 242 as available
Take pid 52
Take pid 242
Mark pid 193 as available
Mark pid 47 as available
2 [32.15424818 23.28252943 33.68358226]

Computing the free id when needed as proposed by @pm-blanco is probably the best solution, even though it comes with a performance penalty. @RudolfWeeber do you have any objection?

@jngrad jngrad added this to the Espresso 4.3.0 milestone Oct 24, 2022
kodiakhq bot added a commit that referenced this issue Nov 11, 2022
Description of changes:
- particle state tracking is now fully encapsulated
   - helps with separation of concerns and reducing code duplication
- particle creation and particle moves are now carried out by 2 independent functions
    - this makes the reaction methods bugs throw an error instead of silently reacting particles of the wrong type (#4595)
@kodiakhq kodiakhq bot closed this as completed in #4609 Dec 12, 2022
kodiakhq bot added a commit that referenced this issue Dec 12, 2022
Fixes #4595

Description of changes:
- Reaction methods now rebuild the list of free particle ids every time `ReactionAlgorithm::do_reaction()` and `WidomInsertion::calculate_particle_insertion_potential_energy()` are called
- Added a new test that checks that the reaction methods do the bookkeeping of empty ids by setting up two different instances of the reaction methods (constant pH and Widom insertion) with competing reactions. 

I benchmarked my implementation against the current python branch using the `mc_acid_base_reservoir.py` script and I get the following timings: 1.564e-04 (my PR) vs  1.473e-04 (current python branch). That means that my implementation comes with a 6% performance loss in its current state.
jngrad pushed a commit to jngrad/espresso that referenced this issue Dec 23, 2022
Fixes espressomd#4595

Description of changes:
- Reaction methods now rebuild the list of free particle ids every time `ReactionAlgorithm::do_reaction()` and `WidomInsertion::calculate_particle_insertion_potential_energy()` are called
- Added a new test that checks that the reaction methods do the bookkeeping of empty ids by setting up two different instances of the reaction methods (constant pH and Widom insertion) with competing reactions.

I benchmarked my implementation against the current python branch using the `mc_acid_base_reservoir.py` script and I get the following timings: 1.564e-04 (my PR) vs  1.473e-04 (current python branch). That means that my implementation comes with a 6% performance loss in its current state.
jngrad pushed a commit to jngrad/espresso that referenced this issue Dec 23, 2022
Fixes espressomd#4595

Description of changes:
- Reaction methods now rebuild the list of free particle ids every time `ReactionAlgorithm::do_reaction()` and `WidomInsertion::calculate_particle_insertion_potential_energy()` are called
- Added a new test that checks that the reaction methods do the bookkeeping of empty ids by setting up two different instances of the reaction methods (constant pH and Widom insertion) with competing reactions.

I benchmarked my implementation against the current python branch using the `mc_acid_base_reservoir.py` script and I get the following timings: 1.564e-04 (my PR) vs  1.473e-04 (current python branch). That means that my implementation comes with a 6% performance loss in its current state.
jngrad pushed a commit to jngrad/espresso that referenced this issue Dec 23, 2022
Fixes espressomd#4595

Description of changes:
- Reaction methods now rebuild the list of free particle ids every time `ReactionAlgorithm::do_reaction()` and `WidomInsertion::calculate_particle_insertion_potential_energy()` are called
- Added a new test that checks that the reaction methods do the bookkeeping of empty ids by setting up two different instances of the reaction methods (constant pH and Widom insertion) with competing reactions.

I benchmarked my implementation against the current python branch using the `mc_acid_base_reservoir.py` script and I get the following timings: 1.564e-04 (my PR) vs  1.473e-04 (current python branch). That means that my implementation comes with a 6% performance loss in its current state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants