Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/RNDV: Adjust max_frag to be at least of minimal RNDV chunk size #10407

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
6 changes: 6 additions & 0 deletions src/ucp/proto/proto_common.c
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,12 @@ ucp_proto_common_get_lane_perf(const ucp_proto_common_init_params_t *params,

ucp_proto_common_get_frag_size(params, &wiface->attr, lane, &tl_min_frag,
&tl_max_frag);
if (params->min_length > tl_max_frag) {
ucs_debug("protocol %s min_length %zu is larger than lane[%d] max_frag "
"%zu", ucp_proto_id_field(params->super.proto_id, name),
params->min_length, lane, tl_max_frag);
return UCS_ERR_OUT_OF_RANGE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UCS_ERR_INVALID_PARAM - params->min_length is invalid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

perf_node = ucp_proto_perf_node_new_data("lane", "%u ppn %u eps",
context->config.est_num_ppn,
Expand Down
18 changes: 14 additions & 4 deletions src/ucp/proto/proto_multi.c
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ ucs_status_t ucp_proto_multi_init(const ucp_proto_multi_init_params_t *params,
ucp_lane_index_t i, lane, num_lanes;
ucp_proto_multi_lane_priv_t *lpriv;
ucp_proto_perf_node_t *perf_node;
size_t max_frag, min_length, min_end_offset;
size_t max_frag, min_length, min_end_offset, min_rndv_chunk;
ucp_lane_map_t lane_map;
ucp_md_map_t reg_md_map;
uint32_t weight_sum;
Expand Down Expand Up @@ -167,6 +167,18 @@ ucs_status_t ucp_proto_multi_init(const ucp_proto_multi_init_params_t *params,
/* Make sure fragment is not zero */
ucs_assert(max_frag > 0);

min_rndv_chunk = lane_perf->bandwidth *
context->config.ext.min_rndv_chunk_size /
min_bandwidth;
/* min_rndv_chunk must operate within iface/HW limits */
min_rndv_chunk = ucs_max(ucs_min(min_rndv_chunk, lane_perf->max_frag),
lane_perf->min_length);
/* For RNDV only: max scaled fragment must be at least min_rndv_chunk */
if ((params->super.send_op == UCT_EP_OP_PUT_ZCOPY) ||
(params->super.send_op == UCT_EP_OP_GET_ZCOPY)) {
max_frag = ucs_max(max_frag, min_rndv_chunk);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the actual fix, right? we have also test for that in maybe mock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this line is the real fix, other changes are just the boundary checks
Yes, I have a test, but as written in description I cannot commit it until #10369 is merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. seems weird that proto_muti.c has a specific case for rndv.
  2. we should not increase the size to be more than transport max frag, or the transport may not be able to send it. Maybe in the case here it works because UCX_RC_MAX_GET_ZCOPY is not "real" HW limitation but just a SW config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Agree, it looks a bit weird, but it does the right thing: we adjust max_frag only for RNDV, according to the minimal RNDV chunk size.

  2. No, it shouldn't increase max_frag above tl_max_frag, here is the reasoning:
    Initial value of max_frag is capped by tl_max_frag:

        max_frag = ucs_double_to_sizet(lane_perf->bandwidth / max_frag_ratio,
                                       lane_perf->max_frag);

lane_perf->min_length is guaranteed to be within [tl_min_frag, tl_max_frag]
=> min_rndv_chunk is guaranteed to be within [min_length, tl_max_frag]
=> max_frag = ucs_max(max_frag, min_rndv_chunk) can never exceed tl_max_frag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move this code to rndv protocol or make it more generic? maybe using a flag in params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid moving this code outside is gonna be very hard, because other calculations are tightly coupled with max_frag. Maybe in the future we can refactor this function, as it does a lot of things.

Using flag in params seems viable option to me, and btw there are already suitable flags, indicating that RNDV is used: UCP_PROTO_COMMON_INIT_FLAG_SEND_ZCOPY and UCP_PROTO_COMMON_INIT_FLAG_RECV_ZCOPY. And the end it will be the same as checking send_op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored code around min_chunk:

  • A separate function ucp_proto_multi_get_min_chunk returns min_rndv_chunk_size for RNDV protocols, 0 otherwise
  • ucp_proto_multi_init is pure generic wrt min_chunk

}

lpriv->max_frag = max_frag;
perf.max_frag += max_frag;

Expand Down Expand Up @@ -215,9 +227,7 @@ ucs_status_t ucp_proto_multi_init(const ucp_proto_multi_init_params_t *params,
perf.min_length = ucs_max(perf.min_length, min_length);

weight_sum += lpriv->weight;
min_end_offset += lane_perf->bandwidth *
context->config.ext.min_rndv_chunk_size /
min_bandwidth;
min_end_offset += min_rndv_chunk;
mpriv->min_frag = ucs_max(mpriv->min_frag, lane_perf->min_length);
mpriv->max_frag_sum += lpriv->max_frag;
lpriv->weight_sum = weight_sum;
Expand Down
Loading