
Remarks/tips/lessons learned while using/developing MPI-parallel programs in Julia

  1. To my knowledge (@amartinhuertas), the only way of "debugging" MPI.jl parallel programs at present is "print statement debugging". We have observed that messages printed to stdout using println by the different Julia processes running at the different MPI tasks are not atomic, but get broken/intermixed stochastically. However, if you do print("something\n") you are more likely to get the output on a single line than with println("something") (thanks to @symonbyrne for this trick, it is very useful); a minimal sketch is given below. More serious/definitive solutions are being discussed in this issue of MPI.jl.
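
     As an illustration, a minimal sketch of this kind of single-call printing, with the MPI rank prepended so the origin of each line is clear (the inspected variable is just a placeholder), could look as follows:

    using MPI

    MPI.Init()
    comm = MPI.COMM_WORLD
    rank = MPI.Comm_rank(comm)

    # Placeholder for whatever local quantity one wants to inspect.
    local_value = 2 * rank

    # Build the complete message, including the trailing newline, and emit it
    # with a single print call; a single write to stdout is more likely to end
    # up on one line than println, whose output may interleave across tasks.
    print("[rank $rank] local_value = $local_value\n")

    MPI.Barrier(comm)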

  2. Some people have used tmpi (https://github.com/Azrael3000/tmpi) for running multiple interactive sessions, and we could try using the @mpi_do macro in MPIClusterManagers (I have not explored either of them; a sketch of the latter is given below). If I am not wrong, the first alternative may involve multiple gdb debuggers running in different terminal windows, and a deep knowledge of the low-level C code generated by Julia (see https://docs.julialang.org/en/v1/devdocs/debuggingtips/ for more details). I wonder whether, e.g., https://github.com/JuliaDebug/Debugger.jl could be combined with tmpi.
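
     For the record, a minimal sketch of the @mpi_do route, based on the MPIClusterManagers documentation (untested in this context, and with the number of workers chosen arbitrarily), would be along these lines:

    using MPIClusterManagers, Distributed

    # Launch 4 MPI worker processes managed from the current Julia session.
    manager = MPIManager(np=4)
    addprocs(manager)

    # Run the given block on every MPI rank; printing (or any other probing)
    # can then be driven interactively from the managing session.
    @mpi_do manager begin
        using MPI
        comm = MPI.COMM_WORLD
        print("rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))\n")
    end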

  3. To reduce JIT lag, it is absolutely mandatory to build a custom system image of (some of) the GridapDistributed.jl dependencies, e.g., Gridap.jl. See the following link for more details: https://github.com/gridap/Gridap.jl/tree/julia_script_creation_system_custom_images/compile. TO BE UPDATED WHEN BRANCH julia_script_creation_system_custom_images is merged into master. Assuming that the Gridap.jl system image is called Gridapv0.10.4.so, one may then launch the parallel MPI.jl program as follows (a sketch of how such an image could be built is given after the command):

    mpirun -np 4 julia -J ./Gridapv0.10.4.so --project=. test/MPIPETScDistributedPoissonTests.jl
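
     For completeness, a sketch of how such an image might be generated with PackageCompiler.jl is given below; the warmup.jl precompile script is a hypothetical name, and the authoritative recipe is the one in the Gridap.jl link above.

    using PackageCompiler

    # Bake Gridap.jl (and its precompiled methods) into a custom system image.
    # "warmup.jl" is a hypothetical script that exercises the code paths one
    # wants precompiled; replace it with the actual driver of interest.
    create_sysimage([:Gridap];
                    sysimage_path="Gridapv0.10.4.so",
                    precompile_execution_file="warmup.jl")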