
drude gpu sample fails CI #3104

Closed
jngrad opened this issue Aug 25, 2019 · 4 comments

jngrad (Member) commented Aug 25, 2019

Two samples have been randomly failing tests recently:

  • Grand Canonical has large deviations from target concentration (for the last 2 weeks)
  • Drude in BMIM PF6 has broken bonds (for the last 2 days)
jngrad added the DevOps label Aug 25, 2019
bors bot added a commit that referenced this issue Aug 30, 2019
3107: Fix failing grand_canonical sample test and add documentation to samples r=jngrad a=jonaslandsgesell

This PR adds documentation to the sample files:

* widom_insertion.py
* wang_landau_reaction_ensemble.py
* grand_canonical.py
* reaction_ensemble.py

The PR also partly fixes #3104: the problem with the failing test was that the supplied excess chemical potential did not match the target concentration (illustrated below).
I now provide a matching pair of concentration and excess chemical potential.

Co-authored-by: Jonas Landsgesell <[email protected]>
Co-authored-by: Jean-Noël Grad <[email protected]>
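
For context, a minimal sketch of the consistency requirement behind this fix (plain Python, not taken from grand_canonical.py; all numerical values are placeholders): the excess chemical potential fed to the grand canonical scheme has to be the one measured at the target concentration, otherwise the run equilibrates at whatever concentration actually corresponds to the supplied value.

# Minimal sketch, not the sample code: why concentration and excess chemical
# potential must be supplied as a matching pair. All values are placeholders.
import numpy as np

kT = 1.0      # reduced temperature (assumption)
c_ref = 1.0   # reference concentration of the standard state (assumption)

def total_mu(c, mu_ex):
    """Ideal-gas part plus excess part: mu = kT*ln(c/c_ref) + mu_ex."""
    return kT * np.log(c / c_ref) + mu_ex

# mu_ex measured (e.g. by Widom insertion) at concentration c_meas:
c_meas, mu_ex = 0.05, -0.3   # placeholder pair
c_target = 0.10              # concentration the sample test asks for

# If c_target != c_meas, the pair is inconsistent and the grand canonical run
# drifts away from c_target -- the symptom seen in the failing sample test.
print(total_mu(c_meas, mu_ex), total_mu(c_target, mu_ex))
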
jngrad (Member, Author) commented Sep 5, 2019

jngrad changed the title from "drude and grand canonical samples fail CI" to "drude sample fails CI" on Oct 23, 2019
KaiSzuttor changed the title from "drude sample fails CI" to "drude gpu sample fails CI" on Dec 9, 2019
jngrad (Member, Author) commented Dec 10, 2019

Finally found the source of the error: bad P3M parameters from the tuning function. There seems to be a pattern in the P3M parameters of simulations that crash: their r_cut value is roughly half of the value obtained in simulations that run fine. To reproduce it locally or on a coyote with ubuntu-python3:cuda-10.1:

make local_samples
cd testsuite/scripts/samples
sed -i  "/system.actors.add(p3m)/i p3m._params = {'cao': 7, 'inter': 32768, 'r_cut': 2.5491526892051883, 'alpha': 1.286486160729783, 'accuracy': 0.0009884728050820963, 'mesh': [120, 120, 120], 'epsilon': 0.0, 'mesh_off': [0.5, 0.5, 0.5], 'tune': True, 'check_neutrality': True, 'prefactor': 1389.3612645, 'alpha_L': 47.67025070635568, 'r_cut_iL': 0.06879447050636864, 'cao_cut': [0.0, 0.0, 0.0], 'a': [0.0, 0.0, 0.0], 'ai': [0.0, 0.0, 0.0], 'inter2': 0, 'cao3': 0, 'additional_mesh': [0.0, 0.0, 0.0]}" local_samples/drude_bmimpf6.py
rm -f local_samples/drude_bmimpf6_gpu_processed.py; ../../../pypresso test_drude_bmimpf6_with_gpu.py

The LJ sigmas are in the range 3.4-5.0. In simulations that don't crash, the tuned P3M r_cut is in the range 3.3-5.1, while in crashed simulations it is in the range 2.5-2.8. If this is the real cause, we could add a lower bound on r_cut in the parameters of the tuning function.
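
A possible shape of that workaround, as a rough sketch (it assumes the P3M actor keeps explicitly supplied parameters fixed during tuning; the toy system below is not the BMIM PF6 sample):

# Rough sketch of pinning r_cut when creating the P3M actor, so the tuner
# cannot pick a real-space cutoff below the largest LJ sigma of the mixture.
# Toy system for illustration only; the real setup lives in drude_bmimpf6.py.
import espressomd
from espressomd import electrostatics

system = espressomd.System(box_l=[40.0, 40.0, 40.0])
system.time_step = 0.01
system.cell_system.skin = 0.4
system.part.add(pos=[1.0, 1.0, 1.0], q=1.0)
system.part.add(pos=[20.0, 20.0, 20.0], q=-1.0)

sigma_max = 5.0  # largest LJ sigma in the sample (sigmas range from 3.4 to 5.0)

# Parameters passed explicitly are, to my understanding, not touched by the
# tuner, so only mesh, cao and alpha get tuned while r_cut stays at sigma_max.
p3m = electrostatics.P3M(prefactor=1389.3612645,
                         accuracy=1e-3,
                         r_cut=sigma_max)
system.actors.add(p3m)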

Note: checkpointing the particle positions/forces/velocities obtained from a badly tuned run (i.e., one that eventually crashed) into a simulation with good P3M parameters does not lead to a crash.
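
In case someone wants to repeat that transplant test, here is one way to do it with plain numpy instead of the checkpointing module (the .pos/.v/.f slice attributes follow the 4.x particle interface and should be double-checked):

# Sketch of the transplant test: dump the particle state from the run with the
# bad P3M tuning, then load it into a fresh setup that uses good parameters.
import numpy as np

def dump_particle_state(system, path="state_bad_tuning.npz"):
    # Called right after the (bad) tuning in the first run.
    np.savez(path,
             pos=np.copy(system.part[:].pos),
             v=np.copy(system.part[:].v),
             f=np.copy(system.part[:].f))

def load_particle_state(system, path="state_bad_tuning.npz"):
    # Called in the second run, after setting up good P3M parameters.
    state = np.load(path)
    system.part[:].pos = state["pos"]
    system.part[:].v = state["v"]
    system.part[:].f = state["f"]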

fweik (Contributor) commented Apr 10, 2020

Do we have any theory why a low P3M r_cut leads to crashes? That does not make sense to me.

KaiSzuttor (Member) commented

Closing in favor of #3842.
