-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better separation of concerns in the communication framework #3907
Comments
That's an anachronism from the autotools build system (which would call mpic++ and nvcc -- nowadays we use the MPI library search path directly with c++ or nvcc) and from when the OpenMPI headers contained syntax that could not be parsed by nvcc (that was fixed around CUDA 5.5). |
I think many of the affected callbacks should be replaced by script interface code, rather than be refactored as callbacks.
The interaction stuff is a prime candidate, IMO.
The LB stuff in gather_stats will go away with Walberla.
I think the callback registration should happen, where they are implemented as done in lb_walberla_interface.cpp.
|
Addendum: there are a lot of places in the code where free functions that should have internal linkage end up having external linkage, just so that they could be used in espresso/src/core/particle_data.hpp Lines 370 to 375 in 572f854
But most of them don't. Furthermore, many of them have side effects in the form of events, so accidentally calling one of them can have a noticeable impact on performance. This design is not sustainable. It's also not difficult to fix it, because it's only a matter of moving the callbacks from communication.cpp to the .cpp files where they are used: jngrad/espresso:fix-3906...fix-3907. The free functions can then be made static, or merged into the callbacks directly, so that they are not accessible in other translation units.
|
Also, there doesn't seem to be a good reason for making the following header files indirectly included in 90 .cpp files: espresso/src/core/communication.hpp Lines 55 to 57 in 572f854
|
Fixes #3907 Move callbacks from `communication.cpp` to the relevant source files. This hides implementation details of many submodules, which no longer need to expose local functions in the public interface with doxygen blocks stating for example "Used by \ref mpi_place_particle, should not be used elsewhere". This also reduces the visibility of `MpiCallbacks.hpp`, `communication.hpp`, `Particle.hpp` and `cuda_init.hpp`. When the head function and worker function have the same code, remove code duplication using `mpi_call_all()`. Remove the `CALLBACK_LIST` macro and split `mpi_gather_stats()` into independent callbacks.
Background
ESPResSo uses a custom MPI communication framework where everything is managed from the head node. Functions visible in the script interface must be implemented twice: once as a worker function, and once as a head function that contains the same body as the worker function, plus a preamble that triggers the worker function on worker nodes with
mpi_call_*()
. The body usually containsboost::mpi
andMPI_*
functions to synchronize data. Often, the body can be implemented only once in a separate free function. These head and worker functions (aka "callbacks") are managed by ourMpiCallbacks
framework which executes them via a visitor pattern.Problem statement
The source file
communication.cpp
is the central place to register callback functions, which should be declared and defined in the relevant cpp resp. hpp files. However for historical reasons, many callbacks have been written directly incommunication.cpp
, which means this file has to know about the implementation details of a variety of classes/structs. This prevents us from hiding implementation details of these classes/structs (e.g. making members private, replacing manual memory management by standard containers, etc.) and can lead to code duplication when the body of the worker function is copy-pasted into the body of the head function. Duplicated code will diverge over time.Prior work
Between 4.1 and 4.3 (waLBerla branch), many callbacks were moved to dedicated source files. For example the NpT callback was moved to
npt.cpp
(d21c7a4), becausecommunication.cpp
doesn't need to know the internal structure of the NpT parameters class, it only needs to know that a callback function exists, which is visible via a function prototype innpt.hpp
. The steepest descent parameters object went one step further and has static storage, making it inaccessible outside the source file:espresso/src/core/integrators/steepest_descent.cpp
Lines 40 to 41 in f760050
Course of action
Here are a few candidates that could be simplified further or moved to dedicated files:
mpi_bcast_ia_params()
which is written twice, and has now diverged: the worker implementation allocates a pointer withnew
and overwrites the old pointer without adelete
, leading to a memory leak. Addingdelete
before thenew
doesn't work, because the MPI communication step before that overwrote the address of the allocated memory (see Memory leaks in tabulated bonds #3906 for details).mpi_bcast_coulomb_params_slave()
needs to know about the Coulomb and dipolar parameter structs. It should rely onboost::serialization
instead, and have the callbacks written in the relevant electrostatics .cpp files.mpi_gather_stats()
mixes pressure/energy observables with LB communication. The LB communication logic was removed in 4.3. This can be simplified further in 4.3 by replacing theenum class GatherStats
and the switch statement by a 0/1 integer and a conditional, and removing code duplication between the head and worker functions.Challenges
Moving the remaining callbacks from
communication.cpp
to dedicated .cpp/.hpp files means we have to rely on thecomm_cart
global variable. Passing this global by reference means the dedicated .hpp files that declare callbacks must include MPI headers. However, some parts of the CUDA code are not allowed to include MPI header files, therefore features that are used by CUDA code and rely on MPI communication must either use thecomm_cart
global variable viaextern
, or rely on a second header file, e.g.npt_mpi.hpp
, to write the prototype of the callback.Outlook
This refactoring will facilitate hiding implementation details of parameter structs and help finding new ways to simplify the callback infrastructure. For example, #3891 removed one level of MPI indirection by having
mpi_steepest_descent()
callintegrate()
viampi_steepest_descent_slave()
instead of calling a chain of callbacks. In fact,mpi_steepest_descent()
could probably callmpi_integrate_slave()
directly instead ofmpi_steepest_descent_slave()
, removing yet another MPI indirection. This refactoring will also help on the long term if we choose to move in the direction of MPI+OpenMP or pure OpenMP.The text was updated successfully, but these errors were encountered: