
Checkpointing doesn't support recursion with more than 1 MPI rank #4280

Closed

jngrad opened this issue Jun 11, 2021 · 4 comments · Fixed by #4724

Comments

jngrad (Member) commented Jun 11, 2021

In ESPResSo 4.2-dev, a regression from #4167 broke the checkpointing mechanism. When reloading from a checkpoint file, script interface objects with global creation policy are not broadcast, i.e. they are only restored on MPI rank 0. This affects ~63 Python classes.

The checkpointing tests don't check that these objects are restored on all MPI ranks. This could be achieved by, e.g., introducing a new particle of type 10 in a cell belonging to MPI rank 1 or rank 2 that only interacts with a shape-based constraint. Then we load the system from the checkpoint, recalculate the force on that particle, and check that it is non-zero; see the sketch below.
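
A sketch of such a test (illustrative values only; whether the particle actually lands on rank 1 depends on the domain decomposition, here assumed to split the 10×10×10 box along x between 2 ranks):

import numpy as np
import espressomd
import espressomd.shapes

system = espressomd.System(box_l=3 * [10.0], time_step=0.01)
center = np.array([7.5, 5.0, 5.0])  # assumed to lie in a cell owned by rank 1
p = system.part.add(pos=center + 1., type=10)
system.constraints.add(shape=espressomd.shapes.Sphere(
    center=center, radius=1.0), particle_type=1)
system.non_bonded_inter[10, 1].lennard_jones.set_params(
    epsilon=1., sigma=1., cutoff=3.0, shift=0.1)
# ... register and save the checkpoint, reload it in a second script, then:
system.integrator.run(0)
assert np.linalg.norm(np.copy(p.f)) > 0.  # force must be restored on all ranks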

MWE:

# mwe_save.py
import os
import espressomd.checkpointing
import espressomd.shapes

checkpoint = espressomd.checkpointing.Checkpoint(checkpoint_id='mwe', checkpoint_path=".")
if checkpoint.has_checkpoints(): # cleanup old checkpoint files
    for filepath in os.listdir(checkpoint.checkpoint_dir):
        os.remove(os.path.join(checkpoint.checkpoint_dir, filepath))

system = espressomd.System(box_l=3 * [10.0], time_step=0.01)
system.part.add(pos=[system.box_l / 2. - 1., system.box_l / 2. + 1.], type=[0, 0])
system.constraints.add(shape=espressomd.shapes.Sphere(
    center=system.box_l / 2., radius=1.0), particle_type=1)
system.non_bonded_inter[1, 0].lennard_jones.set_params(
    epsilon=1., sigma=1., cutoff=3.0, shift=0.1)
checkpoint.register("system")
checkpoint.save(0)

system.integrator.run(0)
print(system.part[:].f)

# mwe_load.py
import espressomd.checkpointing

checkpoint = espressomd.checkpointing.Checkpoint(checkpoint_id='mwe', checkpoint_path=".")
checkpoint.load(0)  # restores the registered "system" into the global namespace

system.integrator.run(0)
print(system.part[:].f)

Output:

$> mpiexec -n 2 ./pypresso mwe_save.py
[[-1475.26213767 -1475.26213767 -1475.26213767]
 [ 1475.26213767  1475.26213767  1475.26213767]]
$> mpiexec -n 2 ./pypresso mwe_load.py
[[-1475.26213767 -1475.26213767 -1475.26213767]
 [    0.             0.             0.        ]]
jngrad (Member, Author) commented Jun 14, 2021

This doesn't affect release 4.1.4.

On 4.2-dev, this only affects objects contained in an object list (e.g. Constraints or LBBoundaries). The contained objects are deserialized on the head node, then added to the core via std::move() on the head node only:

// Runs on the head node only; worker nodes never receive the containees.
for (auto const &packed_object : object_states) {
  auto o = std::dynamic_pointer_cast<ManagedType>(
      BaseType::deserialize(packed_object, *BaseType::context()));
  add(std::move(o));
}

jngrad self-assigned this on Jun 14, 2021
jngrad changed the title from “Checkpointing doesn't work with more than 1 MPI rank” to “Checkpointing doesn't support recursion with more than 1 MPI rank” on Jun 21, 2021
jngrad (Member, Author) commented Jun 21, 2021

The fundamental issue with the current object management framework is the lack of support for recursive unpickling. At the moment, contained objects are managed by the GlobalContext in which the ObjectList container is stored. The container and its containees need to be stored in the same context's std::unordered_map, so that the ObjectList can access its contained objects.
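
As a rough schematic (plain Python with hypothetical names, not the actual C++ internals), the current design amounts to one flat id-to-object map per context, shared by the container and its containees:

# Schematic only: one flat registry per context, keyed by ObjectId.
flat_registry = {
    1: {"name": "Constraints", "containees": [2, 3]},
    2: {"name": "ShapeBasedConstraint"},
    3: {"name": "ShapeBasedConstraint"},
}

# The container resolves its children with plain lookups in the same map:
children = [flat_registry[oid] for oid in flat_registry[1]["containees"]]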

Calling ObjectList::set_internal_state() on all nodes

This doesn't work out, because the contained objects end up being deserialized twice on worker nodes:

  • first, they are deserialized with the wrong ObjectId but are properly added in the core (via the call to ObjectList::set_internal_state() on all nodes)
  • then, they are deserialized with the correct ObjectId but are not added in the core (via the call to GlobalContext::make_shared() which calls GlobalContext::remote_make_handle()).

This is a consequence of the separation of concerns: ObjectHandle (from which both ObjectList and the contained objects inherit) only takes care of sending shared pointers to core containers, while Context takes care of instantiation and broadcast on MPI nodes.

Creating a third policy GlobalContaineePolicy for contained objects

This doesn't work, because the contained objects would live in a different std::unordered_map from the one the global ObjectList object lives in, while the container needs to have access to its containees.

Adding a new flag bool is_containee in the ObjectHandle class

This allows us to inject a hook in GlobalContext::make_shared() that prevents the call to GlobalContext::remote_make_handle() and therefore prevents the constructed containee from being overwritten later on worker nodes. But the ObjectId is still incorrect: UnpackVisitor::operator() ends up looking up an ObjectId that doesn't exist.

Rewrite the ContextManager framework to handle recursion

This would make the MPI callback logic a lot more complicated, since it would have to keep track of the recursion level, and we could no longer use a single std::unordered_map to store (ObjectId, shared_ptr<ManagedType>) key/value pairs, because ObjectList would now be managing the MPI communication of its contained objects. To find a specific object, we would have to traverse the GlobalContext map and, whenever it contains ObjectList objects, traverse their maps as well. This would break the separation of concerns between the Context classes (which manage MPI communication and object storage) and the ObjectHandle classes. We would also need a mechanism to prevent key collisions across the different maps.
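
To picture the lookup problem alone (again a schematic in plain Python with hypothetical names, not a proposed implementation): with per-container registries, one flat lookup turns into a recursive traversal, and ObjectIds would have to stay unique across all nested maps:

# Schematic only: every ObjectList carries its own registry of containees.
nested_registry = {
    1: {"name": "Constraints",
        "containees": {2: {"name": "ShapeBasedConstraint"},
                       3: {"name": "Union",
                           "containees": {4: {"name": "Sphere"}}}}},
}

def find_object(registry, oid):
    """Depth-first search for an ObjectId across nested registries."""
    if oid in registry:
        return registry[oid]
    for obj in registry.values():
        found = find_object(obj.get("containees", {}), oid)
        if found is not None:
            return found
    return None

assert find_object(nested_registry, 4)["name"] == "Sphere"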

RudolfWeeber (Contributor) commented
Thank you for investigating this so thoroughly. In view of the ongoing discussion about moving to shared-memory parallelization, we might consider working around this by reverting (most of) the PR which introduced the regression. Then we would be back with the old 'solution' of handling the serialization of containers from Cython (via the ScriptObjectRegistry Cython class, IIRC). Everything except union shapes would work again; we would then have to write serialization for the Union shape in Cython, analogously to ScriptObjectRegistry.

If we choose to go to shared-memory parallelism, the issue will go away anyway.

kodiakhq bot added a commit that referenced this issue Jul 5, 2021
Temporary fix for #4280

Description of changes:
- prevent serialization of `ObjectList` objects when the MPI world size is greater than 1
- test that `ObjectList` objects are correctly reloaded when the MPI world size is 1
- document LB checkpointing and cleanup script interface documentation
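
The guard can be pictured roughly like this (a sketch with hypothetical names, not the actual patch): the container refuses to be pickled when more than one MPI rank is running.

import pickle

class GuardedObjectList:
    """Stand-in for a script interface ObjectList."""
    def __init__(self, n_ranks=1):
        self.n_ranks = n_ranks  # hypothetical stand-in for the MPI world size
        self.elements = []

    def __reduce__(self):
        if self.n_ranks > 1:
            raise RuntimeError(
                "ObjectList cannot be checkpointed with more than 1 MPI rank")
        return (GuardedObjectList, (self.n_ranks,), {"elements": self.elements})

pickle.dumps(GuardedObjectList(n_ranks=1))    # works
# pickle.dumps(GuardedObjectList(n_ranks=2))  # raises RuntimeError
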

jngrad (Member, Author) commented May 12, 2022

For the time being, we can restore checkpointing of most ObjectList classes by re-implementing the Python pickling functions (see a4e28bc in #4510). This doesn't work for all ObjectList classes: for example, the Union shape is an ObjectList that can be stored as a shape-based constraint in a Constraints container, which is itself an ObjectList. There is no easy way to handle this kind of recursion in the Python pickling logic.
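
Roughly, the re-implemented pickling can be sketched as follows (simplified, hypothetical names, not the actual commit): serialize an ObjectList as the plain list of its elements and re-add them on unpickling, so each element passes again through the regular add() path.

import pickle

def _restore_object_list(cls, elements):
    """Rebuild the container by re-adding each element via add()."""
    container = cls()
    for element in elements:
        container.add(element)
    return container

class PicklableObjectList:
    """Stand-in for a script interface ObjectList."""
    def __init__(self):
        self._elements = []

    def add(self, element):
        # The real add() also inserts the object into the core on all ranks.
        self._elements.append(element)

    def __reduce__(self):
        # Serialize as (callable, args): unpickling calls
        # _restore_object_list(type(self), [...]).
        return (_restore_object_list, (type(self), list(self._elements)))

box = PicklableObjectList()
box.add("constraint-1")
clone = pickle.loads(pickle.dumps(box))
assert clone._elements == ["constraint-1"]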

kodiakhq bot closed this as completed in #4724 (commit b3efb66) on May 12, 2023
jngrad pushed a commit to jngrad/espresso that referenced this issue Jul 13, 2023
…4724)

Fixes espressomd#4280

Description of changes:
- checkpoint restrictions on the number of MPI ranks have been lifted