UCX does not include InfiniBand when building with NVHPC compilers #8397
Comments
hi @mcuma, could you provide the complete output from the configure script, along with the config.h and config.log files, from the UCX built in the NVHPC environment? Is it also possible to build a debug version of UCX (with logging enabled) in the NVHPC environment and provide the output? Thank you.
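For reference, a minimal sketch of what such a debug build could look like (assuming a source checkout built by hand rather than through Spack; the prefix path and the nvc/nvc++ compiler names are placeholders for the NVHPC environment):
$ ./autogen.sh
$ ./configure CC=nvc CXX=nvc++ --prefix=$HOME/ucx-nvhpc-debug --with-verbs=/usr --with-mlx5-dv --enable-debug --enable-logging
$ make -j && make install
$ UCX_LOG_LEVEL=debug $HOME/ucx-nvhpc-debug/bin/ucx_info -d 2>&1 | tee ucx_info_debug.log
With logging compiled in, UCX_LOG_LEVEL=debug should make ucx_info print details about why a transport or memory domain fails to open.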
FWIW, I've seen the nvidia/pgi compilers "intrude" on the environment. When the module was loaded, even ...
Hi @hoopoepg, thanks for your reply and the debug suggestion. I am not sure how best to attach files to this ticket, so I have put them at a public link: https://home.chpc.utah.edu/~mcuma/debug/ucx/. The spack_build directory contains the requested files from the Spack build, debug_build contains the configure and ucx_info output from a build with the --enable-debug option, and nodebug_build contains the same output from the build without debug.

Interestingly, the build with --enable-debug includes the verbs provider, while the build without debug does not. The difference seems to be the optimization flag: --enable-debug forces -O0, while the non-debug build uses -O3. Adding the --enable-compiler-opt=1 (or =0) configure option to force -O1 or -O0 also results in a correct IB build. Would you like me to open a ticket with the NVHPC group to address this?

Even with --enable-compiler-opt=1, though, the OpenMPI built with this UCX complains when running; it seems that the device is still not set up correctly. Any thoughts on this are appreciated. Thanks.
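For reference, a sketch of the kind of hand-built configure invocation this corresponds to (prefix and compiler names are placeholders; the flag set mirrors the Spack-generated arguments shown further below):
$ ./configure CC=nvc CXX=nvc++ --prefix=$HOME/ucx-nvhpc --with-verbs=/usr --with-mlx5-dv --with-rc --with-ud --with-dc --enable-optimizations --enable-compiler-opt=1
$ make -j && make install
$ $HOME/ucx-nvhpc/bin/ucx_info -d | grep -i -E 'verbs|mlx'
Per the observation above, --enable-compiler-opt=1 caps the optimization level at -O1, which avoids whatever goes wrong in the NVHPC -O3 build.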
As I can see from the logs, all builds successfully detected the verbs library and enabled basic IB support. Could you run the command ...
Yes, it could be the reason, but as I can see from your bug report (ibv_devinfo), the devices are configured properly.
Hi Sergey, ldd seems to pick up the IB libraries correctly. The IB drivers should be OK; we have no issue with UCX built with the GNU and Intel compilers, and IB works fine with MVAPICH2 and Intel MPI as well. Here's the ulimit -a output; anything suspicious there? We did raise a few limits in the past to accommodate MPI buffers, etc. I am wondering whether you have had reports like this in the past, or whether you have a platform where you could try to reproduce what we're doing. It seems not too many sites use NVHPC with MPI/IB, so I don't have any contacts to cross-check this with. Also, the OpenMPI supplied with the NVHPC suite does not have IB support built in, and neither does the UCX shipped with the Rocky Linux 8 that we run - at least "ucx_info -d" does not show it. Thanks.
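For reference, a sketch of the kind of linkage and limits check being referred to (the install prefix is a placeholder, and it is assumed the IB transport was built as the libuct_ib.so module under lib/ucx/ of the UCX install):
$ ldd <ucx-prefix>/lib/ucx/libuct_ib.so | grep -i -E 'ibverbs|mlx'
$ ulimit -a
If libibverbs.so (and libmlx5.so for the mlx5 path) resolve to the system rdma-core libraries here, the link step itself is not the problem.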
One more tidbit that I found: on our older cluster that still runs CentOS 7, UCX builds correctly with NVHPC (i.e., it includes the verbs providers), but OpenMPI still crashes with the "No resources available" error. I ended up with a workaround: build UCX with the stock OS gcc and then build OpenMPI against that UCX, which results in correct OpenMPI behavior. So I am good for now, but I am still wondering whether our issue is local to our OS setup or due to the lack of other sites building UCX with NVHPC and verbs.
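For anyone hitting the same problem, a sketch of the Spack-level workaround described above (versions and variants are illustrative, not the exact specs used):
$ spack install ucx%gcc +mlx5-dv+verbs+cm+ud+dc+rc+cma
$ spack install openmpi%nvhpc fabrics=ucx ^ucx%gcc
Spack allows a different compiler per dependency, so OpenMPI itself is still compiled with NVHPC while the UCX underneath it comes from the gcc build.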
Hi, thank you for your help.
Describe the bug
We are building OpenMPI with UCX as the fabric, using the Spack package manager, on our clusters that have various generations of Mellanox InfiniBand. We specify the verbs and mlx5 providers. When building with the GNU or Intel compilers, ucx_info correctly shows these, but with NVHPC it does not. I have verified in the Spack build logs that libibverbs and libmlx5 are found by configure and linked in both the GNU/Intel and NVHPC builds, which makes me perplexed as to why ucx_info does not report them as active in the NVHPC build.
In the NVHPC case, the OpenMPI built atop this UCX runs TCP over IB, resulting in ~20 us latencies, while the gcc/intel builds run natively over IB with ~1.7 us latencies.
Any thoughts on this would be appreciated. I have also done the same build on our older CentOS 7 system and the issue is present there too, so I don't think it is related to the OS/driver stack.
I'll be happy to provide more info, but first I'd be curious to hear whether you have had similar reports in the past, or whether someone has successfully built UCX with NVHPC for IB.
Steps to Reproduce
Spack command:
spack install [email protected]%[email protected]~pmi target=nehalem fabrics=ucx +internal-hwloc+thread_multiple schedulers=slurm +legacylaunchers ^ucx +mlx5-dv+verbs+cm+ud+dc+rc+cma ^[email protected] ^[email protected]
For the GNU build (on Rocky Linux 8), replace [email protected] with [email protected].
This results in the following configure arguments:
--with-verbs=/usr --disable-mt --enable-cma --disable-params-check --without-avx --enable-optimizations --disable-assertions --disable-logging --with-pic --with-rc --with-ud --with-dc --with-mlx5-dv --without-ib-hw-tm --without-dm --with-cm --without-rocm --without-java --without-cuda --without-gdrcopy --without-knem --without-xpmem
NVHPC build:
$ ucx_info -v
UCT version=1.11.2 revision ef2bbcf
configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=/uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/nvhpc-21.5/ucx-1.11.2-asrhvd26hyucdhokcp6l5ufukmgxync7 --with-verbs=/usr --enable-mt --enable-cma --disable-params-check --without-avx --enable-optimizations --disable-assertions --disable-logging --with-pic --with-rc --with-ud --with-dc --with-mlx5-dv --without-ib-hw-tm --without-dm --with-cm --without-rocm --without-java --without-cuda --without-gdrcopy --without-knem --without-xpmem
GCC build:
$ ucx_info -v
UCT version=1.11.2 revision ef2bbcf
configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=/uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/gcc-8.5.0/ucx-1.11.2-etdhrh4gzoj2nroomy7bipr2p2e3ly4l --with-verbs=/usr --enable-mt --enable-cma --disable-params-check --without-avx --enable-optimizations --disable-assertions --disable-logging --with-pic --with-rc --with-ud --with-dc --with-mlx5-dv --without-ib-hw-tm --without-dm --with-cm --without-rocm --without-java --without-cuda --without-gdrcopy --without-knem --without-xpmem
$ /uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/nvhpc-21.7/ucx-1.11.2-jvwucdhwoqpn2xsttr55wgb5kzzbo32v/bin/ucx_info -d | grep verbs
(nothing)
$ /uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/gcc-8.5.0/ucx-1.11.2-ujc57b4cyrldztpxujdb7v3kaaww54tt/bin/ucx_info -d | grep verbs
Transport: rc_verbs
Transport: ud_verbs
$ /uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/nvhpc-21.7/ucx-1.11.2-jvwucdhwoqpn2xsttr55wgb5kzzbo32v/bin/ucx_info -d | grep mlx
< failed to open memory domain mlx4_0 >
$ /uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/gcc-8.5.0/ucx-1.11.2-ujc57b4cyrldztpxujdb7v3kaaww54tt/bin/ucx_info -d | grep mlx
Memory domain: mlx4_0
Device: mlx4_0:1
Device: mlx4_0:1
Setup and versions
$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.5 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.5 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
$ uname -a
Linux notchpeak2 4.18.0-348.20.1.el8_5.x86_64 #1 SMP Thu Mar 10 20:59:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ rpm -q rdma-core
rdma-core-35.0-1.el8.x86_64
$ rpm -q libibverbs
libibverbs-35.0-1.el8.x86_64
$ ibv_devinfo -vv
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.42.5000
node_guid: 0002:c903:00a4:faa0
sys_image_guid: 0002:c903:00a4:faa3
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x1
board_id: MT_1090120019
phys_port_cnt: 2
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffe00
max_qp: 131000
max_qp_wr: 16351
device_cap_flags: 0x057e9c76
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
UD_AV_PORT_ENFORCE
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
MEM_WINDOW
UD_IP_CSUM
XRC
MEM_MGT_EXTENSIONS
MEM_WINDOW_TYPE_2B
RAW_IP_CSUM
Unknown flags: 0x488000
max_sge: 32
max_sge_rd: 30
max_cq: 65408
max_cqe: 4194303
max_mr: 524032
max_pd: 32764
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 2096000
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 248
max_total_mcast_qp_attach: 2031616
max_ah: 2147483647
max_fmr: 0
max_srq: 65472
max_srq_wr: 16383
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 15
general_odp_caps:
rc_odp_caps:
NO SUPPORT
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
NO SUPPORT
xrc_odp_caps:
NO SUPPORT
completion timestamp_mask: 0x0000ffffffffffff
hca_core_clock: 427000kHZ
device_cap_flags_ex: 0x57E9C76
tso_caps:
max_tso: 0
rss_caps:
max_rwq_indirection_tables: 0
max_rwq_indirection_table_size: 0
rx_hash_function: 0x0
rx_hash_fields_mask: 0x0
max_wq_type_rq: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
tag matching not supported
Additional information (depending on the issue)
$ /uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/nvhpc-21.7/ucx-1.11.2-jvwucdhwoqpn2xsttr55wgb5kzzbo32v/bin/ucx_info -d
Memory domain: posix
Component: posix
allocate: unlimited
remote key: 24 bytes
rkey_ptr is supported
Transport: posix
Device: memory
System device:
capabilities:
bandwidth: 0.00/ppn + 12179.00 MB/sec
latency: 80 nsec
overhead: 10 nsec
put_short: <= 4294967295
put_bcopy: unlimited
get_bcopy: unlimited
am_short: <= 100
am_bcopy: <= 8256
domain: cpu
atomic_add: 32, 64 bit
atomic_and: 32, 64 bit
atomic_or: 32, 64 bit
atomic_xor: 32, 64 bit
atomic_fadd: 32, 64 bit
atomic_fand: 32, 64 bit
atomic_for: 32, 64 bit
atomic_fxor: 32, 64 bit
atomic_swap: 32, 64 bit
atomic_cswap: 32, 64 bit
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 8 bytes
iface address: 8 bytes
error handling: ep_check
Memory domain: sysv
Component: sysv
allocate: unlimited
remote key: 12 bytes
rkey_ptr is supported
Transport: sysv
Device: memory
System device:
capabilities:
bandwidth: 0.00/ppn + 12179.00 MB/sec
latency: 80 nsec
overhead: 10 nsec
put_short: <= 4294967295
put_bcopy: unlimited
get_bcopy: unlimited
am_short: <= 100
am_bcopy: <= 8256
domain: cpu
atomic_add: 32, 64 bit
atomic_and: 32, 64 bit
atomic_or: 32, 64 bit
atomic_xor: 32, 64 bit
atomic_fadd: 32, 64 bit
atomic_fand: 32, 64 bit
atomic_for: 32, 64 bit
atomic_fxor: 32, 64 bit
atomic_swap: 32, 64 bit
atomic_cswap: 32, 64 bit
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 8 bytes
iface address: 8 bytes
error handling: ep_check
Memory domain: self
Component: self
register: unlimited, cost: 0 nsec
remote key: 0 bytes
Transport: self
Device: memory0
System device:
capabilities:
bandwidth: 0.00/ppn + 6911.00 MB/sec
latency: 0 nsec
overhead: 10 nsec
put_short: <= 4294967295
put_bcopy: unlimited
get_bcopy: unlimited
am_short: <= 8K
am_bcopy: <= 8K
domain: cpu
atomic_add: 32, 64 bit
atomic_and: 32, 64 bit
atomic_or: 32, 64 bit
atomic_xor: 32, 64 bit
atomic_fadd: 32, 64 bit
atomic_fand: 32, 64 bit
atomic_for: 32, 64 bit
atomic_fxor: 32, 64 bit
atomic_swap: 32, 64 bit
atomic_cswap: 32, 64 bit
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 0 bytes
iface address: 8 bytes
error handling: ep_check
Memory domain: tcp
Component: tcp
register: unlimited, cost: 0 nsec
remote key: 0 bytes
Transport: tcp
Device: eth0
System device:
capabilities:
bandwidth: 113.16/ppn + 0.00 MB/sec
latency: 5776 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 1
device num paths: 1
max eps: 256
device address: 6 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Transport: tcp
Device: lo
System device:
capabilities:
bandwidth: 11.91/ppn + 0.00 MB/sec
latency: 10960 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 1
device num paths: 1
max eps: 256
device address: 18 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Transport: tcp
Device: eth0.26
System device:
capabilities:
bandwidth: 113.16/ppn + 0.00 MB/sec
latency: 5776 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 0
device num paths: 1
max eps: 256
device address: 6 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Transport: tcp
Device: ib0
System device:
capabilities:
bandwidth: 6239.81/ppn + 0.00 MB/sec
latency: 5210 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 1
device num paths: 1
max eps: 256
device address: 6 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Connection manager: tcp
max_conn_priv: 2064 bytes
< failed to open memory domain mlx4_0 >
Connection manager: rdmacm
max_conn_priv: 54 bytes
Memory domain: cma
Component: cma
register: unlimited, cost: 9 nsec
Transport: cma
Device: memory
System device:
capabilities:
bandwidth: 0.00/ppn + 11145.00 MB/sec
latency: 80 nsec
overhead: 400 nsec
put_zcopy: unlimited, up to 16 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 1
get_zcopy: unlimited, up to 16 iov
get_opt_zcopy_align: <= 1
get_align_mtu: <= 1
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 8 bytes
iface address: 4 bytes
error handling: peer failure, ep_check
$ /uufs/chpc.utah.edu/sys/spack/linux-rocky8-nehalem/gcc-8.5.0/ucx-1.11.2-ujc57b4cyrldztpxujdb7v3kaaww54tt/bin/ucx_info -d
Memory domain: posix
Component: posix
allocate: unlimited
remote key: 24 bytes
rkey_ptr is supported
Transport: posix
Device: memory
System device:
capabilities:
bandwidth: 0.00/ppn + 12179.00 MB/sec
latency: 80 nsec
overhead: 10 nsec
put_short: <= 4294967295
put_bcopy: unlimited
get_bcopy: unlimited
am_short: <= 100
am_bcopy: <= 8256
domain: cpu
atomic_add: 32, 64 bit
atomic_and: 32, 64 bit
atomic_or: 32, 64 bit
atomic_xor: 32, 64 bit
atomic_fadd: 32, 64 bit
atomic_fand: 32, 64 bit
atomic_for: 32, 64 bit
atomic_fxor: 32, 64 bit
atomic_swap: 32, 64 bit
atomic_cswap: 32, 64 bit
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 8 bytes
iface address: 8 bytes
error handling: ep_check
Memory domain: sysv
Component: sysv
allocate: unlimited
remote key: 12 bytes
rkey_ptr is supported
Transport: sysv
Device: memory
System device:
capabilities:
bandwidth: 0.00/ppn + 12179.00 MB/sec
latency: 80 nsec
overhead: 10 nsec
put_short: <= 4294967295
put_bcopy: unlimited
get_bcopy: unlimited
am_short: <= 100
am_bcopy: <= 8256
domain: cpu
atomic_add: 32, 64 bit
atomic_and: 32, 64 bit
atomic_or: 32, 64 bit
atomic_xor: 32, 64 bit
atomic_fadd: 32, 64 bit
atomic_fand: 32, 64 bit
atomic_for: 32, 64 bit
atomic_fxor: 32, 64 bit
atomic_swap: 32, 64 bit
atomic_cswap: 32, 64 bit
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 8 bytes
iface address: 8 bytes
error handling: ep_check
Memory domain: self
Component: self
register: unlimited, cost: 0 nsec
remote key: 0 bytes
Transport: self
Device: memory0
System device:
capabilities:
bandwidth: 0.00/ppn + 6911.00 MB/sec
latency: 0 nsec
overhead: 10 nsec
put_short: <= 4294967295
put_bcopy: unlimited
get_bcopy: unlimited
am_short: <= 8K
am_bcopy: <= 8K
domain: cpu
atomic_add: 32, 64 bit
atomic_and: 32, 64 bit
atomic_or: 32, 64 bit
atomic_xor: 32, 64 bit
atomic_fadd: 32, 64 bit
atomic_fand: 32, 64 bit
atomic_for: 32, 64 bit
atomic_fxor: 32, 64 bit
atomic_swap: 32, 64 bit
atomic_cswap: 32, 64 bit
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 0 bytes
iface address: 8 bytes
error handling: ep_check
Memory domain: tcp
Component: tcp
register: unlimited, cost: 0 nsec
remote key: 0 bytes
Transport: tcp
Device: eth0
System device:
capabilities:
bandwidth: 113.16/ppn + 0.00 MB/sec
latency: 5776 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 1
device num paths: 1
max eps: 256
device address: 6 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Transport: tcp
Device: lo
System device:
capabilities:
bandwidth: 11.91/ppn + 0.00 MB/sec
latency: 10960 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 1
device num paths: 1
max eps: 256
device address: 18 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Transport: tcp
Device: eth0.26
System device:
capabilities:
bandwidth: 113.16/ppn + 0.00 MB/sec
latency: 5776 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 0
device num paths: 1
max eps: 256
device address: 6 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Transport: tcp
Device: ib0
System device:
capabilities:
bandwidth: 6239.81/ppn + 0.00 MB/sec
latency: 5210 nsec
overhead: 50000 nsec
put_zcopy: <= 18446744073709551590, up to 6 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 0
am_short: <= 8K
am_bcopy: <= 8K
am_zcopy: <= 64K, up to 6 iov
am_opt_zcopy_align: <= 1
am_align_mtu: <= 0
am header: <= 8037
connection: to ep, to iface
device priority: 1
device num paths: 1
max eps: 256
device address: 6 bytes
iface address: 2 bytes
ep address: 10 bytes
error handling: peer failure, ep_check, keepalive
Connection manager: tcp
max_conn_priv: 2064 bytes
Memory domain: mlx4_0
Component: ib
register: unlimited, cost: 180 nsec
remote key: 8 bytes
local memory handle is required for zcopy
Transport: rc_verbs
Device: mlx4_0:1
System device: 0000:42:00.0 (0)
capabilities:
bandwidth: 6433.22/ppn + 0.00 MB/sec
latency: 700 + 1.000 * N nsec
overhead: 75 nsec
put_short: <= 88
put_bcopy: <= 8256
put_zcopy: <= 1G, up to 6 iov
put_opt_zcopy_align: <= 512
put_align_mtu: <= 2K
get_bcopy: <= 8256
get_zcopy: 65..1G, up to 6 iov
get_opt_zcopy_align: <= 512
get_align_mtu: <= 2K
am_short: <= 87
am_bcopy: <= 8255
am_zcopy: <= 8255, up to 5 iov
am_opt_zcopy_align: <= 512
am_align_mtu: <= 2K
am header: <= 127
domain: device
atomic_add: 64 bit
atomic_fadd: 64 bit
atomic_cswap: 64 bit
connection: to ep
device priority: 10
device num paths: 1
max eps: 256
device address: 4 bytes
ep address: 4 bytes
error handling: peer failure, ep_check
Transport: ud_verbs
Device: mlx4_0:1
System device: 0000:42:00.0 (0)
capabilities:
bandwidth: 6433.22/ppn + 0.00 MB/sec
latency: 730 nsec
overhead: 105 nsec
am_short: <= 172
am_bcopy: <= 4088
am_zcopy: <= 4088, up to 8 iov
am_opt_zcopy_align: <= 512
am_align_mtu: <= 4K
am header: <= 3952
connection: to ep, to iface
device priority: 10
device num paths: 1
max eps: inf
device address: 4 bytes
iface address: 3 bytes
ep address: 6 bytes
error handling: peer failure, ep_check
Connection manager: rdmacm
max_conn_priv: 54 bytes
Memory domain: cma
Component: cma
register: unlimited, cost: 9 nsec
Transport: cma
Device: memory
System device:
capabilities:
bandwidth: 0.00/ppn + 11145.00 MB/sec
latency: 80 nsec
overhead: 400 nsec
put_zcopy: unlimited, up to 16 iov
put_opt_zcopy_align: <= 1
put_align_mtu: <= 1
get_zcopy: unlimited, up to 16 iov
get_opt_zcopy_align: <= 1
get_align_mtu: <= 1
connection: to iface
device priority: 0
device num paths: 1
max eps: inf
device address: 8 bytes
iface address: 4 bytes
error handling: peer failure, ep_check
Configure output excerpts for mlx and verbs:
NVHPC:
$ grep mlx spack-build-02-configure-out.txt
checking infiniband/mlx5_hw.h usability... no
checking infiniband/mlx5_hw.h presence... no
checking for infiniband/mlx5_hw.h... no
checking for mlx5dv_query_device in -lmlx5-rdmav2... no
checking for mlx5dv_query_device in -lmlx5... yes
checking for infiniband/mlx5dv.h... yes
checking whether mlx5dv_init_obj is declared... yes
checking whether mlx5dv_create_qp is declared... yes
checking whether mlx5dv_is_supported is declared... yes
checking whether mlx5dv_devx_subscribe_devx_event is declared... yes
checking for struct mlx5dv_cq.cq_uar... yes
configure: Compiling with mlx5 bare-metal support
checking for struct mlx5_wqe_av.base... no
checking for struct mlx5_grh_av.rmac... no
checking for struct mlx5_cqe64.ib_stride_index... no
$ grep verbs spack-build-02-configure-out.txt
configure: Compiling with verbs support from /usr
checking infiniband/verbs.h usability... yes
checking infiniband/verbs.h presence... yes
checking for infiniband/verbs.h... yes
checking for ibv_get_device_list in -libverbs... yes
checking infiniband/verbs_exp.h usability... no
checking infiniband/verbs_exp.h presence... no
checking for infiniband/verbs_exp.h... no
GCC:
$ grep mlx spack-build-02-configure-out.txt
checking infiniband/mlx5_hw.h usability... no
checking infiniband/mlx5_hw.h presence... no
checking for infiniband/mlx5_hw.h... no
checking for mlx5dv_query_device in -lmlx5-rdmav2... no
checking for mlx5dv_query_device in -lmlx5... yes
checking for infiniband/mlx5dv.h... yes
checking whether mlx5dv_init_obj is declared... yes
checking whether mlx5dv_create_qp is declared... yes
checking whether mlx5dv_is_supported is declared... yes
checking whether mlx5dv_devx_subscribe_devx_event is declared... yes
checking for struct mlx5dv_cq.cq_uar... yes
configure: Compiling with mlx5 bare-metal support
checking for struct mlx5_wqe_av.base... no
checking for struct mlx5_grh_av.rmac... no
checking for struct mlx5_cqe64.ib_stride_index... no
$ grep verbs spack-build-02-configure-out.txt
configure: Compiling with verbs support from /usr
checking infiniband/verbs.h usability... yes
checking infiniband/verbs.h presence... yes
checking for infiniband/verbs.h... yes
checking for ibv_get_device_list in -libverbs... yes
checking infiniband/verbs_exp.h usability... no
checking infiniband/verbs_exp.h presence... no
checking for infiniband/verbs_exp.h... no