Memory leak with UCX and OpenMPI 3.1.x #2921

Status: Closed · Fixed by open-mpi/ompi#5878
yosefe opened this issue Oct 7, 2018 · 4 comments

yosefe (Contributor) commented Oct 7, 2018

We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for that matter) built with UCX support. The leak shows up whether the “ucx” PML is specified for the run or not. The applications in question are arepo and gizmo, but I have no reason to believe that others are not affected as well.

Basically, the MPI processes grow without bound until SLURM kills the job or the host memory is exhausted. If I configure and build with “--without-ucx”, the problem goes away.
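
For reference, here is a minimal sketch of the build-level comparison described above: one Open MPI build configured with UCX and one without, plus a crude way to watch a rank's memory grow. The prefixes and the application name are placeholders, and the configure lines are abbreviated; the full option set is listed under "Configuration Options" below.

# Leaking case: UCX support enabled (remaining options as in the spec below)
./configure --prefix=$HOME/sw/openmpi-3.1.2-ucx --with-ucx=/usr && make -j 8 && make install

# Workaround: same options, but with UCX support disabled
./configure --prefix=$HOME/sw/openmpi-3.1.2-noucx --without-ucx && make -j 8 && make install

# Check whether an installation was built with UCX: the UCX MCA components
# only appear in ompi_info output when Open MPI was configured with UCX.
ompi_info | grep -i ucx

# Crude leak check while a job runs: watch the resident set of one rank.
watch -n 10 "grep VmRSS /proc/$(pgrep -n my_app)/status"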

Details:
—————————————
RHEL7.5
OpenMPI 3.1.2 (and any other version I’ve tried).
ucx 1.2.2-1.el7 (RH native)
RH native IB stack
Mellanox FDR/EDR IB fabric
Intel Parallel Studio 2018.1.163

Configuration Options:
—————————————————

CFG_OPTS=""
CFG_OPTS="$CFG_OPTS C=icc CXX=icpc FC=ifort FFLAGS=\"-O2 -g -warn -m64\" LDFLAGS=\"\" "
CFG_OPTS="$CFG_OPTS --enable-static"
CFG_OPTS="$CFG_OPTS --enable-orterun-prefix-by-default"
CFG_OPTS="$CFG_OPTS --with-slurm=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-pmix=/opt/pmix/2.1.1"
CFG_OPTS="$CFG_OPTS --with-pmi=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-libevent=external"
CFG_OPTS="$CFG_OPTS --with-hwloc=external"
CFG_OPTS="$CFG_OPTS --with-verbs=/usr"
CFG_OPTS="$CFG_OPTS --with-libfabric=/usr"
CFG_OPTS="$CFG_OPTS --with-ucx=/usr"
CFG_OPTS="$CFG_OPTS --with-verbs-libdir=/usr/lib64"
CFG_OPTS="$CFG_OPTS --with-mxm=no"
CFG_OPTS="$CFG_OPTS --with-cuda=${HPC_CUDA_DIR}"
CFG_OPTS="$CFG_OPTS --enable-openib-udcm"
CFG_OPTS="$CFG_OPTS --enable-openib-rdmacm"
CFG_OPTS="$CFG_OPTS --disable-pmix-dstore"

rpmbuild --ba \
         --define '_name openmpi' \
         --define "_version $OMPI_VER" \
         --define "_release ${RELEASE}" \
         --define "_prefix $PREFIX" \
         --define '_mandir %{_prefix}/share/man' \
         --define '_defaultdocdir %{_prefix}' \
         --define 'mflags -j 8' \
         --define 'use_default_rpm_opt_flags 1' \
         --define 'use_check_files 0' \
         --define 'install_shell_scripts 1' \
         --define 'shell_scripts_basename mpivars' \
         --define "configure_options $CFG_OPTS " \
         openmpi-${OMPI_VER}.spec 2>&1 | tee rpmbuild.log
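
For completeness, a sketch of how the resulting package might be installed and activated on a compute node. The RPM path and filename and the mpivars.sh location are assumptions based on the spec options above, not details confirmed in this report.

# Install the freshly built package (path and filename are assumptions)
sudo rpm -Uvh ~/rpmbuild/RPMS/x86_64/openmpi-${OMPI_VER}-${RELEASE}.x86_64.rpm

# The install_shell_scripts option is expected to drop an mpivars.sh next to
# the binaries; source it and sanity-check the toolchain.
source $PREFIX/bin/mpivars.sh
mpicc --version
mpirun --version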
yosefe added the Bug label and self-assigned the issue on Oct 7, 2018.

ca-taylor commented

Running strace on one of the MPI processes shows a steady stream of brk() calls that is not present when running without UCX (PML=ob1).

[pid 11573] 0.011080 brk(NULL) = 0x67dc000
[pid 11573] 0.000047 brk(0x67dd000) = 0x67dd000
[pid 11573] 0.000034 brk(NULL) = 0x67dd000
[pid 11573] 0.000031 brk(NULL) = 0x67dd000
[pid 11573] 0.000032 brk(NULL) = 0x67dd000
[pid 11573] 0.000027 brk(0x67de000) = 0x67de000
[pid 11573] 0.000029 brk(NULL) = 0x67de000
[pid 11573] 0.000032 brk(NULL) = 0x67de000
[pid 11573] 0.000033 brk(NULL) = 0x67de000
[pid 11573] 0.000027 brk(0x67e4000) = 0x67e4000
[pid 11573] 0.000028 brk(NULL) = 0x67e4000
[pid 11573] 0.000028 brk(NULL) = 0x67e4000
[pid 11573] 0.000046 brk(NULL) = 0x67e4000
[pid 11573] 0.000027 brk(0x67e8000) = 0x67e8000
[pid 11573] 0.000028 brk(NULL) = 0x67e8000
[pid 11573] 0.000029 brk(NULL) = 0x67e8000
[pid 11573] 0.016593 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b7b4c705000
[pid 11573] 0.081271 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b7b4c705000
[pid 11573] 0.007754 brk(NULL) = 0x67e8000
[pid 11573] 0.000056 brk(0x67e9000) = 0x67e9000
[pid 11573] 0.000057 brk(NULL) = 0x67e9000
[pid 11573] 0.000050 brk(NULL) = 0x67e9000
[pid 11573] 0.076326 brk(NULL) = 0x67e9000
[pid 11573] 0.000035 brk(0x67ea000) = 0x67ea000
[pid 11573] 0.000034 brk(NULL) = 0x67ea000
[pid 11573] 0.000027 brk(NULL) = 0x67ea000
[pid 11573] 0.000315 brk(NULL) = 0x67ea000
[pid 11573] 0.000028 brk(0x67eb000) = 0x67eb000
[pid 11573] 0.000020 brk(NULL) = 0x67eb000
[pid 11573] 0.000019 brk(NULL) = 0x67eb000
[pid 11573] 0.000310 brk(NULL) = 0x67eb000

I'll see if I can tell where they are being called...
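
For anyone trying to reproduce this, the trace above can be captured from a running rank roughly as follows; the application name is a placeholder, -e trace=memory restricts the output to brk/mmap/munmap-style calls, and -r prints the relative timestamps shown above.

# Pick one MPI rank of the (placeholder) application on this node
PID=$(pgrep -n my_app)

# Follow children, print relative timestamps, trace only memory-related syscalls
strace -f -r -e trace=memory -p "$PID" -o strace-memory.log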

ca-taylor commented

Attaching gdb to the same MPI process as in the previous comment and setting breakpoints on mmap, brk, and sbrk shows...

(gdb) break mmap
Breakpoint 1 at 0x2b7a5c9befc0 (2 locations)
(gdb) break brk
Breakpoint 2 at 0x2b7a5c9bb440
(gdb) break sbrk
Breakpoint 3 at 0x2b7a5c9bb4b0 (2 locations)
(gdb) continue
Continuing.

Breakpoint 3, 0x00002b7a62311db0 in ucm_override_sbrk () from /lib64/libucm.so.0
(gdb) backtrace
#0 0x00002b7a62311db0 in ucm_override_sbrk () from /lib64/libucm.so.0
#1 0x00002b7a62316022 in ucm_dlmalloc () from /lib64/libucm.so.0
#2 0x00002b7a623113f7 in ucm_malloc_impl.isra.8 () from /lib64/libucm.so.0
#3 0x00002b7a5fce491a in opal_datatype_optimize_short (pData=, count=, pTypeDesc=)
at opal_datatype_optimize.c:64
#4 opal_datatype_commit (pData=0x1000) at opal_datatype_optimize.c:295
#5 0x00002b7a5c0921ef in ompi_datatype_commit (type=) at ../../../ompi/datatype/ompi_datatype.h:158
#6 ompi_coll_base_allgatherv_intra_neighborexchange (sbuf=0x1000, scount=1649523360, sdtype=0x9660f50, rbuf=0xe, rcounts=0x3640, rdispls=0x0,
rdtype=0x76f500 <ompi_mpi_byte>, comm=0x76f700 <ompi_mpi_comm_world>, module=0x12aca70) at base/coll_base_allgatherv.c:472
#7 0x00002b7a5c043e8b in PMPI_Allgatherv (sendbuf=0x1000, sendcount=1649523360, sendtype=0x9660f50, recvbuf=0xe, recvcounts=0x3640,
displs=0x0, recvtype=0x0, comm=0x7ffeb5f0e3d0) at pallgatherv.c:143
#8 0x0000000000459e55 in force_update_hmax () at gravity/forcetree_update.c:488
#9 0x0000000000405a5e in compute_hydro_densities_and_forces () at accel.c:81
#10 0x000000000041d4ff in run () at run.c:115
#11 0x00000000004058ed in main (argc=2, argv=0x7ffeb5f0e3d8) at main.c:110
(gdb) continue
Continuing.

Breakpoint 3, 0x00002b7a5c9bb4b0 in sbrk () from /usr/lib64/libc.so.6
(gdb) backtrace
#0 0x00002b7a5c9bb4b0 in sbrk () from /usr/lib64/libc.so.6
#1 0x00002b7a62310613 in ucm_event_call_orig () from /lib64/libucm.so.0
#2 0x00002b7a62310472 in ucm_event_dispatch () from /lib64/libucm.so.0
#3 0x00002b7a62310aec in ucm_sbrk () from /lib64/libucm.so.0
#4 0x00002b7a62316022 in ucm_dlmalloc () from /lib64/libucm.so.0
#5 0x00002b7a623113f7 in ucm_malloc_impl.isra.8 () from /lib64/libucm.so.0
#6 0x00002b7a5fce491a in opal_datatype_optimize_short (pData=, count=, pTypeDesc=)
at opal_datatype_optimize.c:64
#7 opal_datatype_commit (pData=0x1000) at opal_datatype_optimize.c:295
#8 0x00002b7a5c0921ef in ompi_datatype_commit (type=) at ../../../ompi/datatype/ompi_datatype.h:158
#9 ompi_coll_base_allgatherv_intra_neighborexchange (sbuf=0x1000, scount=-1242507872, sdtype=0xffffffffffff8510, rbuf=0xe, rcounts=0x3640,
rdispls=0x0, rdtype=0x76f500 <ompi_mpi_byte>, comm=0x76f700 <ompi_mpi_comm_world>, module=0x12aca70) at base/coll_base_allgatherv.c:472
#10 0x00002b7a5c043e8b in PMPI_Allgatherv (sendbuf=0x1000, sendcount=-1242507872, sendtype=0xffffffffffff8510, recvbuf=0xe, recvcounts=0x3640,
displs=0x0, recvtype=0x0, comm=0x7ffeb5f0e3d0) at pallgatherv.c:143
#11 0x0000000000459e55 in force_update_hmax () at gravity/forcetree_update.c:488
#12 0x0000000000405a5e in compute_hydro_densities_and_forces () at accel.c:81
#13 0x000000000041d4ff in run () at run.c:115
#14 0x00000000004058ed in main (argc=2, argv=0x7ffeb5f0e3d8) at main.c:110
(gdb) continue
Continuing.

Breakpoint 2, 0x00002b7a5c9bb440 in brk () from /usr/lib64/libc.so.6
(gdb) backtrace
#0 0x00002b7a5c9bb440 in brk () from /usr/lib64/libc.so.6
#1 0x00002b7a5c9bb4ff in sbrk () from /usr/lib64/libc.so.6
#2 0x00002b7a62310613 in ucm_event_call_orig () from /lib64/libucm.so.0
#3 0x00002b7a62310472 in ucm_event_dispatch () from /lib64/libucm.so.0
#4 0x00002b7a62310aec in ucm_sbrk () from /lib64/libucm.so.0
#5 0x00002b7a62316022 in ucm_dlmalloc () from /lib64/libucm.so.0
#6 0x00002b7a623113f7 in ucm_malloc_impl.isra.8 () from /lib64/libucm.so.0
#7 0x00002b7a5fce491a in opal_datatype_optimize_short (pData=, count=, pTypeDesc=)
at opal_datatype_optimize.c:64
#8 opal_datatype_commit (pData=0x0) at opal_datatype_optimize.c:295
#9 0x00002b7a5c0921ef in ompi_datatype_commit (type=) at ../../../ompi/datatype/ompi_datatype.h:158
#10 ompi_coll_base_allgatherv_intra_neighborexchange (sbuf=0x0, scount=-1242507872, sdtype=0xffffffffffff8510, rbuf=0xe, rcounts=0x3640,
rdispls=0x0, rdtype=0x76f500 <ompi_mpi_byte>, comm=0x76f700 <ompi_mpi_comm_world>, module=0x12aca70) at base/coll_base_allgatherv.c:472
#11 0x00002b7a5c043e8b in PMPI_Allgatherv (sendbuf=0x0, sendcount=-1242507872, sendtype=0xffffffffffff8510, recvbuf=0xe, recvcounts=0x3640,
displs=0x0, recvtype=0x0, comm=0x7ffeb5f0e3d0) at pallgatherv.c:143
#12 0x0000000000459e55 in force_update_hmax () at gravity/forcetree_update.c:488
#13 0x0000000000405a5e in compute_hydro_densities_and_forces () at accel.c:81
#14 0x000000000041d4ff in run () at run.c:115
#15 0x00000000004058ed in main (argc=2, argv=0x7ffeb5f0e3d8) at main.c:110

yosefe (Contributor, Author) commented Oct 8, 2018

@ca-taylor thanks a lot! We need to release the datatype cache in OMPI; we will update this issue once we have a fix.

ca-taylor commented

> @ca-taylor thanks a lot! We need to release the datatype cache in OMPI; we will update this issue once we have a fix.

OK. I have a git clone of the repo, so I can pull and test whenever you are ready.
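
Once the fix is posted (the header above says this was ultimately fixed by open-mpi/ompi#5878), pulling it into an existing clone for testing could look roughly like this; the prefix and configure options are placeholders, with the real option set shown earlier in the issue.

cd ompi                                     # existing clone of open-mpi/ompi
git fetch origin pull/5878/head:pr-5878     # fetch the proposed fix as a local branch
git checkout pr-5878

./autogen.pl
./configure --prefix=$HOME/sw/ompi-pr5878 --with-ucx=/usr
make -j 8 && make install

# Re-run the failing job against this build and watch VmRSS as before.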
