cannot use ibv_exp_reg_mr why? #4309

Closed
ivyJchen opened this issue Oct 18, 2019 · 2 comments


@ivyJchen

Hi,
We built OpenMPI 3.1.1 with OpenUCX 1.6. I want to use ibv_exp_reg_mr instead of ibv_reg_mr,
so I added "-DHAVE_DECL_IBV_EXP_REG_MR=1 -DHAVE_DECL_IBV_EXP_ODP_SUPPORT_IMPLICIT=1" to CCASFLAGS, CFLAGS, and CPPFLAGS in the Makefile.

Then I ran the following commands:
export UCX_IB_REG_METHODS=odp
mpirun --allow-run-as-root -np 2 --mca pml ucx -x UCX_TLS=rc_x osu_bw

The first backtrace is what we expected:
#0 ibv_exp_reg_mr (in=0xfffffeebfed8) at /usr/include/infiniband/verbs_exp.h:3590
#1 0x0000ffff80d106f4 in uct_ib_md_parse_reg_methods (md=0x1e72c620, md_attr=0xfffffeebff88, md_config=0x1e653720) at base/ib_md.c:1517
#2 0x0000ffff80d11a5c in uct_ib_md_open (md_name=0x1e6536b0 "ib/mlx5_0", uct_md_config=0x1e653720, md_p=0xfffffeec00f8) at base/ib_md.c:1853
#3 0x0000ffff80efa67c in uct_md_open (md_name=0x1e6536b0 "ib/mlx5_0", config=0x1e653720, md_p=0x1e72acd0) at base/uct_md.c:129
#4 0x0000ffff80f4dcb8 in ucp_fill_tl_md (md_rsc=0x1e6536b0, tl_md=0x1e72acd0) at core/ucp_context.c:754
#5 0x0000ffff80f4e478 in ucp_fill_resources (context=0x1e654f30, config=0x1e5abc60) at core/ucp_context.c:936
#6 0x0000ffff80f4f1ec in ucp_init_version (api_major_version=1, api_minor_version=6, params=0xfffffeec0748, config=0x1e5abc60, context_p=0xffff808a03c8 <mca_osc_ucx_component+296>)
at core/ucp_context.c:1223
#7 0x0000ffff80877fa8 in ucp_init (params=0xfffffeec0748, config=0x1e5abc60, context_p=0xffff808a03c8 <mca_osc_ucx_component+296>)
at /ssd/centos7.6/openmpi_gcc7.3.0/ucx-1.6.x.debug.with.knem.xpmem/include/ucp/api/ucp.h:1146
#8 0x0000ffff80878378 in component_init (enable_progress_threads=false, enable_mpi_threads=false) at osc_ucx_component.c:172
#9 0x0000ffff8499e034 in ompi_osc_base_find_available (enable_progress_threads=false, enable_mpi_threads=false) at base/osc_base_frame.c:50
#10 0x0000ffff848c28d8 in ompi_mpi_init (argc=1, argv=0xfffffeec0b58, requested=0, provided=0xfffffeec094c, reinit_ok=false) at runtime/ompi_mpi_init.c:711
#11 0x0000ffff8491d3d4 in PMPI_Init (argc=0xfffffeec09ec, argv=0xfffffeec09e0) at pinit.c:66
#12 0x000000000040173c in main (argc=, argv=) at osu_bw.c:39

The second backtrace is not what we expected:
#0 0x0000ffff820ce108 in ibv_reg_mr () from /lib64/libibverbs.so.1
#1 0x0000ffff80d0dd94 in uct_ib_md_reg_mr (md=0x1e65aba0, address=0xffff79430000, length=37486592, exp_access=0, silent=0, mr_p=0x1e76d8a0) at base/ib_md.c:473
#2 0x0000ffff80d0f62c in uct_ib_mem_reg_internal (uct_md=0x1e65aba0, address=0xffff79430000, length=37486592, flags=228, silent=0, memh=0x1e76d890) at base/ib_md.c:1094
#3 0x0000ffff80d0f7bc in uct_ib_mem_reg (uct_md=0x1e65aba0, address=0xffff79430000, length=37486592, flags=228, memh_p=0xfffffeec0208) at base/ib_md.c:1126
#4 0x0000ffff80d0ffd8 in uct_ib_mem_global_odp_reg (uct_md=0x1e65aba0, address=0xffff79430000, length=37486592, flags=228, memh_p=0xfffffeec0208) at base/ib_md.c:1357
#5 0x0000ffff80efb874 in uct_md_mem_reg (md=0x1e65aba0, address=0xffff79430000, length=37486592, flags=228, memh_p=0xfffffeec0208) at base/uct_md.c:536
#6 0x0000ffff80efc5cc in uct_iface_mem_alloc (tl_iface=0x1e6d2c70, length=37481712, flags=228, name=0x1e6a4c90 "rc_recv_desc", mem=0xfffffeec01e8) at base/uct_mem.c:296
#7 0x0000ffff80efc758 in uct_iface_mp_chunk_alloc (mp=0x1e6d3218, size_p=0xfffffeec0278, chunk_p=0xfffffeec0270) at base/uct_mem.c:343
#8 0x0000ffff80eacf78 in ucs_mpool_grow (mp=0x1e6d3218, num_elems=4505) at datastruct/mpool.c:184
#9 0x0000ffff80ead298 in ucs_mpool_get_grow (mp=0x1e6d3218) at datastruct/mpool.c:232
#10 0x0000ffff80d7360c in ucs_mpool_get_inline (mp=0x1e6d3218) at /ssd/centos7.6/openmpi_gcc7.3.0/source/ucx-1.6.x.debug.with.knem.xpmem/src/ucs/datastruct/mpool.inl:23
#11 0x0000ffff80d73a74 in uct_rc_mlx5_iface_srq_post_recv (iface=0x1e6d2c70, srq=0x1e6db2c0) at rc/accel/rc_mlx5_common.c:77
#12 0x0000ffff80d73c88 in uct_rc_mlx5_iface_common_prepost_recvs (iface=0x1e6d2c70) at rc/accel/rc_mlx5_common.c:109
#13 0x0000ffff80d6b9f0 in uct_rc_mlx5_iface_progress_enable (tl_iface=0x1e6d2c70, flags=131) at rc/accel/rc_mlx5_iface.c:230
#14 0x0000ffff80f5fcbc in uct_iface_progress_enable (flags=131, iface=0x1e6d2c70) at /ssd/centos7.6/openmpi_gcc7.3.0/source/ucx-1.6.x.debug.with.knem.xpmem/src/uct/api/uct.h:2546
#15 ucp_worker_iface_activate (wiface=0x1e6d1fb0, uct_flags=128) at core/ucp_worker.c:588
#16 0x0000ffff80f5ffc4 in ucp_worker_iface_progress_ep (wiface=0x1e6d1fb0) at core/ucp_worker.c:629
#17 0x0000ffff80fb5534 in ucp_wireup_connect_lane (ep=0xffff80470000, params=0xfffffeec0758, lane=0 '\000', address_count=2, address_list=0x1e728de0, addr_index=0) at wireup/wireup.c:657
#18 0x0000ffff80fb5d10 in ucp_wireup_init_lanes (ep=0xffff80470000, params=0xfffffeec0758, ep_init_flags=0, address_count=2, address_list=0x1e728de0, addr_indices=0xfffffeec0638 "")
at wireup/wireup.c:800
#19 0x0000ffff80f50b24 in ucp_ep_create_to_worker_addr (worker=0x1e6b7240, params=0xfffffeec0758, remote_address=0xfffffeec0680, ep_init_flags=0, message=0xffff80fc26a0 "from api call",
ep_p=0xfffffeec0678) at core/ucp_ep.c:322
#20 0x0000ffff80f512ac in ucp_ep_create_api_to_worker_addr (worker=0x1e6b7240, params=0xfffffeec0758, ep_p=0xfffffeec0700) at core/ucp_ep.c:543
#21 0x0000ffff80f515b8 in ucp_ep_create (worker=0x1e6b7240, params=0xfffffeec0758, ep_p=0xfffffeec0740) at core/ucp_ep.c:602
#22 0x0000ffff810054f8 in mca_pml_ucx_add_proc_common (proc=0x1e565b00) at pml_ucx.c:291
#23 0x0000ffff8100566c in mca_pml_ucx_add_procs (procs=0x1e728d60, nprocs=1) at pml_ucx.c:335
#24 0x0000ffff848c2b58 in ompi_mpi_init (argc=1, argv=0xfffffeec0b58, requested=0, provided=0xfffffeec094c, reinit_ok=false) at runtime/ompi_mpi_init.c:825
#25 0x0000ffff8491d3d4 in PMPI_Init (argc=0xfffffeec09ec, argv=0xfffffeec09e0) at pinit.c:66
#26 0x000000000040173c in main (argc=, argv=) at osu_bw.c:39

The cause appears to be in the function uct_iface_mp_chunk_alloc in uct_mem.c.
It calls uct_iface_mem_alloc with flags = UCT_MD_MEM_ACCESS_ALL | UCT_MD_MEM_FLAG_LOCK,
so uct_ib_md_access_flags in ib_md.c returns exp_access=0.
As a result, uct_ib_md_reg_mr never calls ibv_exp_reg_mr.
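
To make this chain concrete, here is a minimal sketch, not the actual UCX source. All `sketch_`/`SKETCH_` names are hypothetical. The bit positions are assumptions chosen so that ACCESS_ALL | LOCK equals the flags=228 seen in the backtrace, and the rule that only a non-blocking request yields an on-demand-paging access bit is likewise an assumption about how the translation works.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UCT memory flags (bit positions assumed). */
enum {
    SKETCH_MEM_FLAG_NONBLOCK        = 1u << 0,
    SKETCH_MEM_FLAG_LOCK            = 1u << 2,
    SKETCH_MEM_ACCESS_REMOTE_PUT    = 1u << 5,
    SKETCH_MEM_ACCESS_REMOTE_GET    = 1u << 6,
    SKETCH_MEM_ACCESS_REMOTE_ATOMIC = 1u << 7,
    SKETCH_MEM_ACCESS_ALL           = (1u << 5) | (1u << 6) | (1u << 7)
};

/* Hypothetical experimental access bit that requests on-demand paging. */
#define SKETCH_EXP_ACCESS_ON_DEMAND (UINT64_C(1) << 46)

/* Translate memory flags into experimental access bits (assumed rule:
 * only a non-blocking request sets an experimental bit). */
uint64_t sketch_exp_access(unsigned mem_flags)
{
    uint64_t exp_access = 0;

    if (mem_flags & SKETCH_MEM_FLAG_NONBLOCK) {
        exp_access |= SKETCH_EXP_ACCESS_ON_DEMAND;
    }
    return exp_access;
}

/* Flags the memory-pool path passes, according to the report above. */
unsigned sketch_mpool_flags(void)
{
    unsigned flags = SKETCH_MEM_ACCESS_ALL | SKETCH_MEM_FLAG_LOCK;

    assert(flags == 228); /* matches flags=228 in the backtrace */
    return flags;
}
```

Under these assumptions, `sketch_exp_access(sketch_mpool_flags())` returns 0, which matches the exp_access=0 argument visible in frame #1 of the second backtrace.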

Could you help us with this issue?

@yosefe
Contributor

yosefe commented Oct 22, 2019

@ivyJchen UCX v1.6 calls ibv_reg_mr rather than ibv_exp_reg_mr when no "experimental" access flags (e.g. ODP) are set;
see https://github.com/openucx/ucx/blob/v1.6.x/src/uct/ib/base/ib_md.c#L453
Is there a reason you need ibv_exp_reg_mr instead of ibv_reg_mr?
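
The selection rule described here can be sketched in a few lines of C. This is an illustration only, not the code at the linked line: `sketch_reg_path_t`, `sketch_choose_reg_path`, and `sketch_reg_path_name` are hypothetical names, and no verbs call is actually made.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two registration paths. */
typedef enum {
    SKETCH_REG_STANDARD,     /* would end up in ibv_reg_mr */
    SKETCH_REG_EXPERIMENTAL  /* would end up in ibv_exp_reg_mr */
} sketch_reg_path_t;

/* Take the experimental path only when at least one experimental
 * access bit is requested; otherwise use the standard verb. */
sketch_reg_path_t sketch_choose_reg_path(uint64_t exp_access)
{
    return (exp_access != 0) ? SKETCH_REG_EXPERIMENTAL : SKETCH_REG_STANDARD;
}

/* Name of the verbs function each path corresponds to. */
const char *sketch_reg_path_name(sketch_reg_path_t path)
{
    assert(path == SKETCH_REG_STANDARD || path == SKETCH_REG_EXPERIMENTAL);
    return (path == SKETCH_REG_EXPERIMENTAL) ? "ibv_exp_reg_mr" : "ibv_reg_mr";
}
```

With exp_access=0, as in the second backtrace, `sketch_choose_reg_path` selects the standard path, so defining HAVE_DECL_IBV_EXP_REG_MR alone would not change which function handles that registration.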

@ivyJchen
Author

ivyJchen commented Nov 8, 2019

@yosefe thanks

@ivyJchen ivyJchen closed this as completed Nov 8, 2019