
Very slow MPI_Finalize possibly due to GNI datagram resource bottleneck #932

Open
tenbrugg opened this issue Aug 17, 2016 · 3 comments

@tenbrugg

At times I have observed very slow, or possibly hung, job termination on avalon with 8000 or more ranks. This is with MPICH, since OMPI frequently segfaults at this scale (#883) and I typically don't use it. To give a sense of the timing: sometime last week I noticed that the SNAP computation took ~5 min at this scale, and successful termination in that case took ~20 min.

Yesterday, Aug 16, SNAP failed during setup due to user error, and then did not terminate within the 30 min job time limit. A file will be attached which shows relevant script output, analysis, and partial traces. When I switched to OpenMPI, the app behaved as expected, failing and terminating quickly. This could be an MPICH issue or a lower-level issue triggered by MPICH's use of libfabric-GNI. Howard's analysis in the next comment points towards the latter.

@tenbrugg
Author

Summary of Howard's initial analysis:

I looked at the MPICH ofi netmod code and do see what could be causing this issue, namely that all VCs (netmod VCs, not GNI provider VCs) are marked active. That has the effect of requiring an all-to-all short-message pattern within the MPI_Finalize code. I can see how this could be particularly problematic if the app hadn't actually exchanged any other messages with a lot of the other ranks in the job before MPI_Finalize is called. That would result in the GNI provider having to do a lot of connection setups just to exchange this one "mpich shutdown" message.
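(Not from Howard's analysis; this is just a hypothetical sketch, with invented identifiers that do not match the actual MPICH ofi netmod source, of the finalize-time pattern he describes.)

```c
#include <stdio.h>

/* Hypothetical sketch only. The point: if every netmod VC is marked active,
 * MPI_Finalize ends up sending one short "shutdown" message to every other
 * rank, including ranks this process never talked to, and each such send can
 * force the GNI provider to set up a fresh connection via its
 * datagram/session mechanism. */
enum vc_state { VC_STATE_INACTIVE, VC_STATE_ACTIVE };

struct netmod_vc {
    enum vc_state state;
    int peer_rank;
};

/* stub standing in for the netmod's short-message send */
static void send_shutdown_message(struct netmod_vc *vc)
{
    printf("shutdown -> rank %d (may trigger GNI connection setup)\n",
           vc->peer_rank);
}

static void finalize_shutdown_sketch(struct netmod_vc *vcs, int nranks,
                                     int my_rank)
{
    for (int r = 0; r < nranks; r++) {
        if (r == my_rank || vcs[r].state != VC_STATE_ACTIVE)
            continue;
        /* one short message per peer => O(nranks) connection setups */
        send_shutdown_message(&vcs[r]);
    }
}
```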

I think this problem should be reproducible using a simple MPI hello-world program.
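For what it's worth, this is the kind of reproducer I have in mind (my own sketch, untested at scale):

```c
/* Minimal MPI hello-world reproducer: no rank-to-rank traffic before
 * MPI_Finalize, so any finalize-time all-to-all shutdown exchange has to set
 * up all of its GNI connections from scratch. Build with mpicc and run at
 * ~8000 ranks, then look at how long MPI_Finalize takes. */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        printf("hello from %d ranks\n", size);

    time_t t0 = time(NULL);
    MPI_Finalize();
    if (rank == 0)
        printf("MPI_Finalize took about %ld s\n", (long)(time(NULL) - t0));
    return 0;
}
```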

Debugging and fixing this will require a large system and the ability to look at and interpret things like dmesg output from kgni.

What may be happening is that the GNI datagram mechanism is being overloaded (not enough TX entries posted to kgni's session ring), resulting in lots of retries of datagram sends within kgni.
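To make the shape of that bottleneck concrete, here is a toy model (slot and completion counts are assumptions, and this is not kgni code) of why a fixed-size datagram ring plus thousands of simultaneous connection setups at finalize would produce a long stream of retries:

```c
#include <stdio.h>

/* Toy model only: TX_SLOTS and COMPLETIONS_PER_ROUND are made-up numbers,
 * not kgni internals. It just shows how a fixed-size posted-datagram ring
 * turns ~8000 simultaneous connection setups into many retried allocations. */
#define TX_SLOTS              128   /* assumed datagram ring capacity        */
#define COMPLETIONS_PER_ROUND 16    /* assumed completions per progress pass */
#define NEW_CONNS             8000  /* one connection setup per peer rank    */

int main(void)
{
    int pending = NEW_CONNS, in_flight = 0;
    long retries = 0, rounds = 0;

    while (pending > 0 || in_flight > 0) {
        int free_slots = TX_SLOTS - in_flight;
        int posted = pending < free_slots ? pending : free_slots;

        retries += pending - posted;   /* allocations that saw "try again"   */
        in_flight += posted;
        pending -= posted;

        int done = in_flight < COMPLETIONS_PER_ROUND ? in_flight
                                                     : COMPLETIONS_PER_ROUND;
        in_flight -= done;             /* completions free ring slots        */
        rounds++;
    }
    printf("%d setups, %ld rounds, %ld retried allocations\n",
           NEW_CONNS, rounds, retries);
    return 0;
}
```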

@tenbrugg
Author

A couple more data points:

  • Even for apps which run successfully at this scale, MPICH termination takes much longer than OpenMPI termination. For instance, XSBench with OpenMPI runs in ~2 min, whereas XSBench with MPICH takes ~21 min. SNAP + OpenMPI finishes in 10 min; SNAP + MPICH was killed at the 30 min limit, having spent a large portion of that time in termination after printing that SNAP was "Done".
  • Yesterday I ran out of dedicated time before I could run with warnings enabled. However, when I first started looking at SNAP's slow termination recently, I did run with warnings enabled and the console was flooded with messages like the ones below (there is a short note on what this error means after the excerpt). To get the app to finish then (Aug 10 or 11), I had to turn the warnings off again.
1536: libfabric:gni:ep_ctrl:_gnix_cm_nic_send():364<warn> _gnix_dgram_alloc returned Resource temporarily unavailable
1536: libfabric:gni:ep_ctrl:_gnix_cm_nic_send():364<warn> _gnix_dgram_alloc returned R
1280: ily unavailable
1280: libfabric:gni:ep_ctrl:_gnix_cm_nic_send():364<warn> _gnix_dgram_alloc returned Resource temporarily unavailable
1280: libfabric:gni:ep_ctrl:_gnix_cm_nic_send():364<warn> _gnix_dgram_alloc returned Resource temporarily unavailable
1280: libfabric:gni:ep_ctrl:_gnix_cm_nic_send():364<warn> _gnix_dgram_alloc returned Resource temporarily unavailable
1280: libfabric:gni:ep_ctrl:_gnix_cm_nic_send():364<warn> _gnix_dgram_alloc returned Resource temporarily unavailable
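For reference (my addition, not output from the run above): "Resource temporarily unavailable" is libfabric's -FI_EAGAIN. At the API level a caller normally handles it by driving progress and retrying, roughly as sketched below. The warnings here come from inside the GNI provider's connection-management path, so the application can't intervene directly; the sketch just shows what the error code means.

```c
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>
#include <rdma/fi_eq.h>
#include <rdma/fi_errno.h>

/* Illustrative helper, not provider code: retry an fi_send that returns
 * -FI_EAGAIN ("Resource temporarily unavailable") after reading the CQ to
 * drive progress and free up resources. */
static ssize_t send_with_retry(struct fid_ep *ep, struct fid_cq *cq,
                               const void *buf, size_t len,
                               fi_addr_t dest, void *context)
{
    ssize_t rc;

    do {
        rc = fi_send(ep, buf, len, NULL, dest, context);
        if (rc == -FI_EAGAIN) {
            struct fi_cq_entry comp;
            (void) fi_cq_read(cq, &comp, 1);   /* progress, then retry */
        }
    } while (rc == -FI_EAGAIN);

    return rc;
}
```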


tenbrugg changed the title from "Very slow MPI_Finalize due to GNI datagram resource bottleneck" to "Very slow MPI_Finalize possibly due to GNI datagram resource bottleneck" on Aug 18, 2016
hppritcha added this to the future milestone on Sep 15, 2016