UCP_REQUEST_FLAG_RNDV_FRAG assertion failure with endpoint error handling #8639
Describe the bug
An assertion failure occurs when testing for UCP_REQUEST_FLAG_RNDV_FRAG, provided that the endpoints have error handling and cuda_ipc enabled, but no cuda_ipc interconnect exists between the devices. This is problematic on systems like the DGX-1, where a heterogeneous topology exists and disabling cuda_ipc is not an option for performance reasons.
Steps to Reproduce
CUDA_VISIBLE_DEVICES=0,5 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_TLS=tcp,cuda_copy,cuda_ipc ucx_perftest -t tag_bw -m cuda -s 1000000 -e
CUDA_VISIBLE_DEVICES=0,5 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_TLS=tcp,cuda_copy,cuda_ipc ucx_perftest -t tag_bw -m cuda -s 1000000 -e localhost
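The first command (no hostname) acts as the ucx_perftest server and the second (with localhost) as the client. For convenience, both can be driven from one shell, roughly as in the sketch below. The function name run_repro and the 2-second startup delay are assumptions made for illustration and are not part of the original report; the environment variables and arguments are copied from the commands above.

```shell
#!/bin/sh
# Environment and arguments taken verbatim from the two commands above.
repro_env="CUDA_VISIBLE_DEVICES=0,5 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_TLS=tcp,cuda_copy,cuda_ipc"
repro_args="-t tag_bw -m cuda -s 1000000 -e"

# Hypothetical helper: start the server in the background, then run the client.
run_repro() {
    # Server side: no hostname argument.
    env $repro_env ucx_perftest $repro_args &
    server_pid=$!

    # Assumed delay to let the server start listening; adjust as needed.
    sleep 2

    # Client side: connects to the server on localhost.
    env $repro_env ucx_perftest $repro_args localhost
    status=$?

    # Clean up the server if it is still running.
    kill "$server_pid" 2>/dev/null
    wait "$server_pid" 2>/dev/null
    return $status
}
```

Calling run_repro after sourcing this file should run the same server and client pair as the two commands above; its return value is the client's exit status.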
Setup and versions
gdrcopy support
nv_peer_mem module loaded