Implement load-balancing for MD #3662
base: python
This looks very good. I'll have a look at how to deal with the limitations and review the rest, but I will only have time to give it a proper look next week, so bear with me.
@fweik Sure, take your time.
Note to self: The wait_any fix needs some work. Newer boost::mpi versions handle nonblocking communication differently, so waitany.hpp does not compile with newer Boost versions.
@hirschsn I'm still looking into this. But there are other changes to the cell systems that improve encapsulation and will need to be merged before this. Will keep you posted...
@hirschsn I had a first look, and there is one point in the design that we should consider. I think it might be better to trigger the repartitioning via the resort. This has the advantage that the resort is called regularly during the simulation (e.g. when the particles have moved a certain distance). Your DD could then decide internally what to do, e.g. decide based on the metric every 100 invocations, or do nothing (only manual repartitioning) and force a repartitioning on a global resort (those typically occur only if there are new particles or other major changes). The resort can be triggered directly from the interface,
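A minimal Python sketch of the resort-triggered policy described above, to make the decision logic concrete. The class name RepartitionTrigger, the on_resort hook, and its imbalance argument are hypothetical and are not part of ESPResSo or librepa; the imbalance is assumed to be some load metric such as max/mean load per rank.

```python
class RepartitionTrigger:
    """Hypothetical policy consulted by the DD on every resort.

    - A global resort (new particles, major changes) always repartitions.
    - Otherwise the load metric is only checked every `interval` resorts;
      interval=None means "never automatically" (manual repart only).
    """

    def __init__(self, interval=100, threshold=1.1):
        self.interval = interval
        self.threshold = threshold  # repartition if imbalance exceeds this
        self.n_resorts = 0

    def on_resort(self, global_resort, imbalance):
        """Return True if the cell system should repartition now."""
        self.n_resorts += 1
        if global_resort:
            return True
        if self.interval is None or self.n_resorts % self.interval != 0:
            return False
        return imbalance > self.threshold


# Usage: check the metric every 3rd resort, plus on every global resort.
trigger = RepartitionTrigger(interval=3)
decisions = [trigger.on_resort(False, 2.0) for _ in range(3)]
```

Keeping the policy in a separate object means a manual repartition call can still coexist with it; the DD only asks the policy whether an automatic repartitioning is due.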
The idea behind triggering it manually is that I (read: anyone :D) can test different strategies with this interface and, in fact, implement them in Python in the simulation script. I agree that this might not be what mere users of load-balancing want. At some point in the near future I also wanted to offer automatic capabilities, which is exactly what you are describing. Different automatic strategies could be implemented locally in generic_dd or elsewhere. The hook into resort, however, is worth considering right now. Do you see any problems with offering manual repartitioning capabilities in addition to something like this (conceptually): system.cell_system.set_generic_dd(..., auto_loadbalancing="npart");
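As a sketch of what a metric selected by a string such as "npart" could compute, assuming it measures the number of particles per rank: the imbalance is the maximum load divided by the mean load, so 1.0 means perfectly balanced. The names npart_imbalance, METRICS, and needs_repartition are hypothetical illustrations, not the PR's implementation.

```python
def npart_imbalance(particles_per_rank):
    """Max load over mean load; 1.0 means perfectly balanced."""
    if not particles_per_rank:
        raise ValueError("need at least one rank")
    mean = sum(particles_per_rank) / len(particles_per_rank)
    if mean == 0:
        return 1.0  # no particles anywhere: nothing to balance
    return max(particles_per_rank) / mean


# Hypothetical registry: the string passed as auto_loadbalancing picks the metric.
METRICS = {"npart": npart_imbalance}


def needs_repartition(metric_name, loads, threshold=1.1):
    """Return True if the selected metric reports too much imbalance."""
    return METRICS[metric_name](loads) > threshold
```

A registry like this keeps automatic strategies pluggable, so a script-level (manual) strategy and a built-in one could share the same metrics.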
Could you elaborate? I don't understand what you mean. :)
Codecov Report
@@ Coverage Diff @@
## python #3662 +/- ##
=======================================
- Coverage 88% 87% -1%
=======================================
Files 524 532 +8
Lines 23471 23782 +311
=======================================
+ Hits 20658 20742 +84
- Misses 2813 3040 +227
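The coverage figures above can be recomputed as hits divided by tracked lines (here hits + misses equals lines, so there are no partial hits). This short check shows that the reported -1% is the difference of the rounded values; the unrounded drop is smaller.

```python
def coverage_percent(hits, lines):
    """Line coverage in percent: covered lines over tracked lines."""
    return 100.0 * hits / lines


base = coverage_percent(20658, 23471)  # python branch
head = coverage_percent(20742, 23782)  # this PR (#3662)
delta = head - base

print(f"base {base:.2f}%, head {head:.2f}%, delta {delta:+.2f} points")
print(f"net new lines hit: {84 / 311:.0%}")
```

Base coverage is about 88.0% and head coverage about 87.2%, a drop of roughly 0.8 percentage points; of the net 311 added lines, only 84 (about 27%) are hit.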
I just wanted to say that you would still be able to call it manually, but I guess you can also do that directly via the Python binding of
No, I think that's fine. system.cell_system.set_generic_dd(..., auto_loadbalancing="npart"); is about what I had in mind. As you are saying, this can probably also be addressed later. The test failures are due to the
Test failures: Yes, I will take care of wait_any today. Also, my [[noreturn]] failed on older compilers because errexit was not [[noreturn]]. However, the osx tests do not seem to like making errexit [[noreturn]]. I am currently looking into the failing test cases and will ping you once I'm done.
Just a quick note: the Clang 6 jobs have been removed recently in favor of Clang 9, and the osx-cuda job was removed. For AppleClang 9 on osx, I'm not sure why there's an error; it should support attributes.
That is somewhere between Clang 6 and Clang 7, if I remember correctly. AppleClang's version numbers match the Xcode major version number, not the Clang major version number. However, even Clang 6 should have supported
This reverts commit 6ad6d48.
This PR implements a generic version of the "domain decomposition" cell system/topology that allows for load-balanced grids and repartitioning.
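To illustrate what a load-balanced grid with repartitioning means, here is one common technique, assuming the cells are already arranged in a fixed linear order (for example along a space-filling curve) and each cell carries a weight such as its particle count. It is a generic prefix-sum partitioning, not necessarily what librepa implements, and partition_cells and rank_loads are hypothetical names.

```python
from itertools import accumulate


def partition_cells(weights, n_ranks):
    """Assign cells (in a fixed linear order) to ranks so that each rank owns
    a contiguous run of cells with roughly equal total weight.

    Returns a list `owner` with owner[i] = rank of cell i.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be positive")
    total = sum(weights)
    prefix = list(accumulate(weights))  # inclusive prefix sums
    owner = []
    for i, p in enumerate(prefix):
        if total > 0:
            # Cell i goes to the rank whose share contains its weight midpoint.
            mid = p - weights[i] / 2
            rank = int(mid * n_ranks / total)
        else:
            # All weights zero: fall back to splitting by cell count.
            rank = i * n_ranks // len(weights)
        owner.append(min(rank, n_ranks - 1))
    return owner


def rank_loads(weights, owner, n_ranks):
    """Total weight owned by each rank."""
    loads = [0] * n_ranks
    for w, r in zip(weights, owner):
        loads[r] += w
    return loads


weights = [6, 1, 1, 1, 1]  # one heavily populated cell
owner = partition_cells(weights, 2)
```

Because the weight midpoints grow monotonically along the cell order, every rank receives a contiguous block of cells; repartitioning amounts to recomputing owner from fresh weights and migrating the cells whose owner changed.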
The load-balancing is implemented in an external library (librepa), which this PR adds as an optional dependency of ESPResSo. Additionally, a shared library module called "GenericDD" is compiled, on which Espresso-Core depends. This shared library implements the new cell system. If librepa is not present, these parts are compiled to stubs that raise an error.

The Python interface for cell_system is extended with a "set_generic_dd" method, analogous to the other cell systems. The interface functionality for generic_dd is implemented in a separate Python file, generic_dd.

The test suite is extended to also test generic_dd in several smaller tests (collision_detection, pairs, random_pairs), plus an additional test that checks whether the new cell system, with its different grid types and repartitionings, yields the same energy in a simple NVE setting as ESPResSo's default "domain decomposition" cell system.
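The stub approach for the optional dependency can be sketched at the Python level: the interface always exists, but calling it raises an error when librepa was not found at build time. Everything here (CellSystem, FeatureNotCompiledError, the have_librepa flag, the grid_type argument, and the cell-system name strings) is a hypothetical simplification, not ESPResSo's actual interface.

```python
class FeatureNotCompiledError(RuntimeError):
    """Raised when an optional feature was not compiled in."""


class CellSystem:
    """Hypothetical, heavily simplified stand-in for system.cell_system."""

    def __init__(self, have_librepa):
        # In a real build this would be decided at configure time.
        self.have_librepa = have_librepa
        self.active = "domain_decomposition"
        self.grid_type = None

    def set_generic_dd(self, grid_type="cart"):
        if not self.have_librepa:
            # Stub path: fail loudly at call time, not at import time.
            raise FeatureNotCompiledError(
                "generic_dd is unavailable: built without librepa")
        self.active = "generic_dd"
        self.grid_type = grid_type
```

Raising at call time rather than at import time means that simulation scripts which never use generic_dd keep working on builds without librepa.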
Example:
With these changes, it is possible to do:
Limitations:
Description of changes:
Missing:
Suggestions and feedback welcome.