[DOC] Point users to UCX 1.11.2 #3651
Labels: documentation (improvements or additions to documentation), shuffle (things that impact the shuffle plugin)
UCX 1.11.2 fixes two bugs we have seen:
When running on a machine that has a non-RDMA NIC (i.e. an Ethernet-only device used as the primary NIC for the executor) and a secondary RDMA NIC, we see failures: UCX skips the step that registers memory against the non-RDMA NIC, then fails when it later attempts to use that NIC after realizing it exists. Fix: UCP/AM: Fix request datatype state during CM switch - v1.11.x openucx/ucx#7436.
Seen on the DGX-2: with so many GPUs (16) and NICs (8), plus the various transports that UCX supports, an internal data structure in UCX could not hold all of them. Fix: UCP/WIREUP: Handle address count > 64 - v1.11.x openucx/ucx#7398.
UCX 1.11.2 is in RC state, with the first RC out: https://github.com/openucx/ucx/releases/tag/v1.11.2-rc1. This issue is to change the docs to point to UCX 1.11.2, at least to the RC.
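As a rough illustration of what "point to UCX 1.11.2, at least to the RC" means in practice, here is a minimal sketch of a version check. It is not part of the plugin or of UCX: the helper names `parse_ucx_version` and `has_fixes` are hypothetical, and it assumes the version appears somewhere in the input as an `x.y.z` triple (for example in a package version string, or in text printed by `ucx_info -v`, whose exact format is not assumed here). A release-candidate suffix such as `-rc1` is ignored, so 1.11.2-rc1 counts as meeting the minimum.

```python
import re

# Minimum UCX version named in this issue (assumption: any 1.11.2 build,
# including release candidates, carries both fixes).
MIN_UCX = (1, 11, 2)


def parse_ucx_version(text):
    """Return the first x.y.z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fixes(text, minimum=MIN_UCX):
    """True if the version found in text is at least `minimum`."""
    version = parse_ucx_version(text)
    # Tuples compare element by element, so (1, 12, 0) >= (1, 11, 2).
    return version is not None and version >= minimum


if __name__ == "__main__":
    for candidate in ("1.11.1", "1.11.2-rc1", "1.12.0"):
        print(candidate, has_fixes(candidate))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "1.9.0" as newer than "1.11.2".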