Checkpointing doesn't support recursion with more than 1 MPI rank #4280
Comments
This doesn't affect release 4.1.4. On 4.2-dev, this only affects objects contained in an object list (e.g. espresso/src/script_interface/ObjectList.hpp Lines 146 to 150 in 7eaf45f
The fundamental issue with the current object management framework is the lack of support for recursive unpickling. At the moment, contained objects are managed by the Calling
Thank you for investigating this so thoroughly. In view of the ongoing discussion to move to shared memory parallelization, we might consider working around this by reverting (most of) the PR which introduced the regression. If we choose to go to shared memory parallelism, the issue will go away anyway.
Temporary fix for #4280

Description of changes:
- prevent serialization of `ObjectList` objects when the MPI world size is greater than 1
- test that `ObjectList` objects are correctly reloaded when the MPI world size is 1
- document LB checkpointing and cleanup script interface documentation
For the time being, we can restore checkpointing of most |
…4724)

Fixes espressomd#4280

Description of changes:
- checkpoint restrictions on the number of MPI ranks have been lifted
In ESPResSo 4.2-dev, a regression from #4167 broke the checkpointing mechanism. When reloading from a checkpoint file, script interface objects with global creation policy are not broadcast, i.e. they are only restored on rank 0. This affects ~63 Python classes.
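To illustrate the failure mode described above, here is a minimal toy model in plain Python. It uses neither ESPResSo nor MPI, and it is not the MWE from this report: `ToyRank`, `ToyWorld`, `create_global`, `load_rank0_only` and `load_broadcast` are hypothetical names made up for this sketch. It assumes that "global creation policy" means an object created on the head node is also created on every other rank, and it contrasts a reload that skips this step with one that performs it.

```python
import pickle


class ToyRank:
    """Stand-in for one MPI rank: holds the objects that exist on it."""

    def __init__(self, rank):
        self.rank = rank
        self.objects = {}


class ToyWorld:
    """Hypothetical model of a set of ranks, not the ESPResSo API."""

    def __init__(self, n_ranks):
        self.ranks = [ToyRank(r) for r in range(n_ranks)]

    def create_global(self, name, params):
        # Assumed meaning of "global creation policy":
        # the object is created on every rank.
        for rank in self.ranks:
            rank.objects[name] = dict(params)

    def save(self):
        # Only the head node (rank 0) writes the checkpoint.
        return pickle.dumps(self.ranks[0].objects)

    def load_rank0_only(self, blob):
        # Models the regression: objects are rebuilt on rank 0 only.
        self.ranks[0].objects = pickle.loads(blob)

    def load_broadcast(self, blob):
        # Models the expected behavior: every restored object
        # goes through the global creation path again.
        for name, params in pickle.loads(blob).items():
            self.create_global(name, params)


def ranks_holding(world, name):
    """List the ranks on which the named object exists."""
    return [r.rank for r in world.ranks if name in r.objects]


world = ToyWorld(4)
world.create_global("wall", {"dist": 1.0})
blob = world.save()

broken = ToyWorld(4)
broken.load_rank0_only(blob)

fixed = ToyWorld(4)
fixed.load_broadcast(blob)

print(ranks_holding(broken, "wall"))  # [0]
print(ranks_holding(fixed, "wall"))   # [0, 1, 2, 3]
```

In this toy model the rank-0-only reload looks correct when inspected from the head node, which is consistent with the problem going unnoticed unless a worker rank is queried.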
The checkpointing tests don't check that these objects are restored on all MPI ranks. This could be achieved by e.g. introducing a new particle of type 10 that only interacts with a shape-based constraint, placed in a cell belonging to MPI rank 1 or rank 2. Then we load the system from the checkpoint, recalculate the force on that particle and check that it is non-zero.
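The logic of the proposed check can be sketched in plain Python, without ESPResSo or MPI. This is only a toy model under stated assumptions, not the actual test: a 1D box is split into equal slabs (one per rank), only the rank owning the particle evaluates the force, and it can only use the constraints that exist locally. The helpers `owner_rank`, `wall_force` and `force_on_particle`, as well as the soft-wall force law and all numerical values, are hypothetical.

```python
def owner_rank(x, box_l, n_ranks):
    """Rank whose slab of a 1D domain decomposition contains position x."""
    return min(int(x / box_l * n_ranks), n_ranks - 1)


def wall_force(x, wall_pos, cutoff=1.0, k=10.0):
    """Toy soft repulsion from a wall; zero beyond the cutoff."""
    d = x - wall_pos
    return k * (cutoff - d) if 0.0 <= d < cutoff else 0.0


def force_on_particle(x, walls_per_rank, box_l):
    """Force evaluated by the owning rank from its local constraints only."""
    rank = owner_rank(x, box_l, len(walls_per_rank))
    return sum(wall_force(x, w) for w in walls_per_rank[rank])


box_l = 8.0
n_ranks = 4
wall = 2.0  # wall located in the slab [2, 4) owned by rank 1
x = 2.5     # probe particle, also owned by rank 1

# Constraint restored on all ranks vs. on rank 0 only.
restored_everywhere = [[wall] for _ in range(n_ranks)]
restored_rank0_only = [[wall]] + [[] for _ in range(n_ranks - 1)]

f_ok = force_on_particle(x, restored_everywhere, box_l)
f_bad = force_on_particle(x, restored_rank0_only, box_l)

assert owner_rank(x, box_l, n_ranks) == 1
assert f_ok != 0.0   # the check passes when the constraint exists on rank 1
assert f_bad == 0.0  # a zero force reveals the missing constraint
```

Placing the probe particle outside the domain of rank 0 is what makes the check sensitive: a particle owned by rank 0 would feel the constraint in both cases.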
MWE:
Output: