
Prefix Julia error output with rank #360

Open
sloede opened this issue Mar 10, 2020 · 16 comments
@sloede
Member

sloede commented Mar 10, 2020

Currently, if you are running a Julia/MPI program in parallel and something bad happens, you get a lot of ERROR: LoadError: LoadError: UndefVarError: ... messages, which are all horribly interleaved. This in itself is a known "user issue" with MPI and (probably) cannot be fixed efficiently. However, it would already help a lot if, when running Julia/MPI programs, the error messages included the global rank, so that a user has at least a fighting chance of finding out which rank died first. E.g., something like ERROR (rank 2): LoadError: LoadError: UndefVarError: ...

I don't know whether this is even possible (injecting information into the Julia runtime output) without changes to upstream Julia, but IMHO it would be a great help to many scientists.

@simonbyrne
Member

I don't think there is a way to do this across different MPI implementations.

Alternatively, you could write to a file with MPI I/O using a shared file pointer (MPI_FILE_WRITE_SHARED), though we don't expose this function yet (PRs welcome!)

@simonbyrne
Member

For what it's worth, I was unable to get MPIEXEC_PREFIX_DEFAULT to do anything with MPICH, but the -prepend-rank option (which is documented in the help screen but not the man page) does work.

@simonbyrne
Member

simonbyrne commented Jun 10, 2020

One option would be an interface such as Cprintf in Chapter 8 of the "Parallel Programming with MPI" book
https://github.com/cyliustack/benchmark/blob/b91924d5dc842906ebf94d4b154d548d944a030f/mpi/ppmpi/chap08/cio.c

We could define an interface like

MPI.Cprint(comm, root) do io
   print(io, ...)
end

which would be collective over comm: it would copy all the data to root and print it from there.
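
A hedged sketch of what such an interface could look like on top of MPI.jl's serializing MPI.gather (the name cprint and this implementation are hypothetical, not part of MPI.jl):

using MPI

# Hypothetical sketch: every rank renders its output into a local buffer;
# the buffers are gathered on `root` (which is what makes the call
# collective over `comm`) and printed there in rank order.
function cprint(f, comm::MPI.Comm, root::Integer = 0)
    buf = IOBuffer()
    f(buf)                                            # user callback writes locally
    chunks = MPI.gather(take!(buf), comm; root = root)
    if MPI.Comm_rank(comm) == root
        foreach(chunk -> write(stdout, chunk), chunks)  # ranks 0, 1, ... in order
    end
    return nothing
end

MPI.Init()
cprint(MPI.COMM_WORLD) do io
    println(io, "hello from rank ", MPI.Comm_rank(MPI.COMM_WORLD))
end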

@simonbyrne
Member

I also suggested what I think is a better solution to the MPI forum:
mpi-forum/mpi-issues#296

@sloede
Member Author

sloede commented Jun 10, 2020

I also suggested what I think is a better solution to the MPI forum:
mpi-forum/mpi-issues#296

This sounds like a good suggestion. However, would we benefit from this for the error output of Julia itself? In that case, the Julia executable would have to be somehow "MPI-aware", wouldn't it?

@simonbyrne
Member

I'm not quite sure yet how it would work. One option would be to modify Base.stdout, but I don't think that is a good idea as it won't help with the interleaving issue.

Interestingly, I did try out using the shared file pointers with /dev/stdout: it works with Open MPI, but not MPICH.
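
For reference, a hedged sketch of that experiment, written against the MPI.File shared-file-pointer wrappers that MPI.jl gained later (they were not yet exposed at the time of this comment):

using MPI
MPI.Init()
comm = MPI.COMM_WORLD
# Open /dev/stdout collectively and write through the shared file pointer;
# pointer updates are serialized across ranks, but whether writing to
# /dev/stdout works at all is implementation-dependent, as noted above.
fh = MPI.File.open(comm, "/dev/stdout"; write = true)
MPI.File.write_shared(fh, Vector{UInt8}("rank $(MPI.Comm_rank(comm)) says hi\n"))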

@simonbyrne
Member

cf. pmodels/mpich#4632

@sloede
Member Author

sloede commented Sep 10, 2020

I am back at this issue, since we have started parallelizing Trixi.jl with MPI. It's really annoying that if there is a runtime issue that occurs only on a subset of all ranks (or even just one), there is no way to discern this from the error message - instead, you have to re-run and this time add copious amounts of println calls that include the MPI rank.

Do you think it would be feasible to convince upstream Julia to add an option to specify a prefix that is added to all output lines? And would it even be possible to implement something like this in a sane way? I'm thinking about something like

julia -e 'using MPI; MPI.Init(); Base.error_prefix(string(MPI.Comm_rank(MPI.COMM_WORLD)) * ": ")' script.jl

that would turn

ERROR: MethodError: no method matching String(::Int64)
Closest candidates are:
  String(::String) at boot.jl:321
  String(::Array{UInt8,1}) at strings/string.jl:39
  String(::Base.CodeUnits{UInt8,String}) at strings/string.jl:77
  ...
Stacktrace:
 [1] top-level scope at REPL[3]:1

into

17: ERROR: MethodError: no method matching String(::Int64)
17: Closest candidates are:
17:   String(::String) at boot.jl:321
17:   String(::Array{UInt8,1}) at strings/string.jl:39
17:   String(::Base.CodeUnits{UInt8,String}) at strings/string.jl:77
17:   ...
17: Stacktrace:
17:  [1] top-level scope at REPL[3]:1

I don't know - as I'm writing this, I can already feel that this is not a very elegant solution, but neither can I come up with something better. It's just that not being able to re-use the compile cache while developing a Julia package with MPI is already painful enough (compared to compiled languages); the fact that there's no obvious way to connect "compiler errors" to the ranks on which they occur just makes it worse :-(
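
A rough, untested user-space approximation of the same idea (no upstream changes needed): reroute stderr through an in-process pipe and re-print it line by line with the rank. Caveat: output written immediately before exit may be lost if the async task is never scheduled again.

using MPI
MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
orig_stderr = stderr             # keep a handle to the real stream
rd, = redirect_stderr()          # reroute stderr through a pipe
@async for line in eachline(rd)  # re-print every line with the rank prefixed
    println(orig_stderr, rank, ": ", line)
end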

@vchuravy
Member

If you are running under Slurm or another resource manager, you can also redirect stderr to a file per rank, or create a wrapper script that adds the Slurm task ID as a prefix to the output.

The most reliable way is to use Open MPI with --tag-output.

Adding an option to Julia would be interesting, but very invasive... especially if you are interested in errors and not just logged messages.

@sloede
Member Author

sloede commented Sep 11, 2020

@vchuravy Thanks a lot for these suggestions! As far as I can tell from the manual, with Slurm I can use, e.g.,

#SBATCH --error=errors-%j-%t.out

which redirects all errors to a file identified by the job id and the task id (= rank).

or create a wrapper script that adds the Slurm task ID as a prefix to the output.

How would I be able to achieve this?

The most reliable way is to use Open MPI with --tag-output.

This is very interesting indeed. However, I have found this option only for Open MPI - do you know whether there is a similar option for MPICH (which seems to be the default for MPI.jl under Linux)?

EDIT: I just found it... for MPICH, the -l flag prefixes the rank to each output. Note that I literally mean each output (and not each line of output), as it seems to prefix the rank to each print statement. Here's what print_timer() output looks like from the MPI root:

[screenshot: print_timer() output with rank prefixes inserted mid-line]

@vchuravy
Member

Yeah, Simon mentioned that he had trouble with MPICH: #360 (comment)

How would I be able to achieve this?

I don't have a ready-made solution, but as an example:

➜  ~ julia -e "error()" 2>&1 | ts "[1]"
[1] ERROR: 
[1] Stacktrace:
[1]  [1] error() at ./error.jl:42
[1]  [2] top-level scope at none:1

which uses ts from moreutils. For other ideas, look here: https://unix.stackexchange.com/questions/440426/how-to-prefix-any-output-in-a-bash-script

and then you can use something like:

cat > launch.sh << EoF_s
#! /bin/sh
"\$@" 2>&1 | ts "[\$PMI_RANK] "
EoF_s
chmod +x launch.sh

srun --mpi=pmi2 ./launch.sh julia

where PMI_RANK would be the environment variable for the global rank. (Caveat: I haven't tested the above.)

@antoine-levitt
Contributor

The main pain point for me is interleaving within a line, and these workarounds don't fix that issue. Can something be done about that? E.g., line buffering?

@simonbyrne
Member

Not that I know of: unfortunately there are no APIs for controlling buffers (each MPI implementation handles the output combination differently).

@sloede
Member Author

sloede commented Nov 1, 2020

Can something be done about that?

Short of writing an I/O handler that controls all output to the terminal, no, I don't think so. Non-interleaved line output to the terminal means that there would have to be a central instance controlling the output, which implies global serialization. Since this runs contrary to the core goals of MPI, I don't think this feature will ever be provided by the MPI libraries themselves.

You can do something like this on your own for output to files, using MPI I/O (I've done this for logging purposes before), but it quickly becomes very slow (IIRC, with >100 cores the overhead is already significant). Otherwise I think you'll have to implement it yourself, I'm afraid :-/

@antoine-levitt
Contributor

That is pretty annoying. The simplest solution (and probably the only one that makes sense for larger process counts) is to do all the printing on process 0. The problem with that is that external libraries (e.g. Optim) don't know about MPI. A pretty brutal solution is to use redirect_stdout() on processes >0. It's a hack, but it works.
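
A minimal sketch of that hack (note: redirect_stdout(devnull) requires a recent Julia; on older versions, redirect to an open("/dev/null", "w") stream instead):

using MPI
MPI.Init()
if MPI.Comm_rank(MPI.COMM_WORLD) != 0
    redirect_stdout(devnull)  # silence everything printed on non-root ranks
end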

@sloede
Member Author

sloede commented Nov 1, 2020

Yes, all sufficiently large (i.e., beyond toy size) MPI-parallel programs that I know of print only from the MPI root. That's no help, though, if you're debugging and/or experiencing runtime errors, where you typically don't control the I/O. The problem with external libraries is exactly the reason I created this issue (here, Julia itself being the "external" library).
