-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/RNDV: Adjust max_frag to be at least of minimal RNDV chunk size #10407
base: master
Are you sure you want to change the base?
UCP/RNDV: Adjust max_frag to be at least of minimal RNDV chunk size #10407
Conversation
Can we reproduce it in gtest? |
Yes, as I wrote in description I reproduced it with mock test |
src/ucp/proto/proto_common.c
Outdated
@@ -382,7 +382,9 @@ ucp_proto_common_get_lane_perf(const ucp_proto_common_init_params_t *params, | |||
&perf_attr.latency) + | |||
params->latency; | |||
tl_perf->sys_latency = 0; | |||
tl_perf->min_length = ucs_max(params->min_length, tl_min_frag); | |||
/* min_length must be within [tl_min_frag, tl_max_frag] range */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess comment can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucp/proto/proto_common.c
Outdated
@@ -382,7 +382,9 @@ ucp_proto_common_get_lane_perf(const ucp_proto_common_init_params_t *params, | |||
&perf_attr.latency) + | |||
params->latency; | |||
tl_perf->sys_latency = 0; | |||
tl_perf->min_length = ucs_max(params->min_length, tl_min_frag); | |||
/* min_length must be within [tl_min_frag, tl_max_frag] range */ | |||
tl_perf->min_length = ucs_max(ucs_min(params->min_length, tl_max_frag), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this specific range forcing covered by some parameter combination/gtest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not covered.. This is rather a common sense to keep the min_length within the HW limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that if params->min_length > tl_max_frag, the protocol should be disabled: it requires a fragment length that is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done
src/ucp/proto/proto_multi.c
Outdated
min_rndv_chunk = lane_perf->bandwidth * | ||
context->config.ext.min_rndv_chunk_size / | ||
min_bandwidth; | ||
/* Minimal RNDV chunk must be within [min_length, tl_max_frag] range */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something more like: "we still to operate within iface/hw limits" or something, else it repeats the code a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I better remove that comment
src/ucp/proto/proto_multi.c
Outdated
/* For RNDV only: max scaled fragment must be at least min_rndv_chunk */ | ||
if ((params->super.send_op == UCT_EP_OP_PUT_ZCOPY) || | ||
(params->super.send_op == UCT_EP_OP_GET_ZCOPY)) { | ||
max_frag = ucs_max(max_frag, min_rndv_chunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual fix, right? we have also test for that in maybe mock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this line is the real fix, other changes are just the boundary checks
Yes, I have a test, but as written in description I cannot commit it until #10369 is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- seems weird that proto_muti.c has a specific case for rndv.
- we should not increase the size to be more than transport max frag, or the transport may not be able to send it. Maybe in the case here it works because UCX_RC_MAX_GET_ZCOPY is not "real" HW limitation but just a SW config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Agree, it looks a bit weird, but it does the right thing: we adjust max_frag only for RNDV, according to the minimal RNDV chunk size.
-
No, it shouldn't increase max_frag above tl_max_frag, here is the reasoning:
Initial value ofmax_frag
is capped bytl_max_frag
:
max_frag = ucs_double_to_sizet(lane_perf->bandwidth / max_frag_ratio,
lane_perf->max_frag);
lane_perf->min_length
is guaranteed to be within [tl_min_frag, tl_max_frag]
=> min_rndv_chunk
is guaranteed to be within [min_length, tl_max_frag]
=> max_frag = ucs_max(max_frag, min_rndv_chunk)
can never exceed tl_max_frag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move this code to rndv protocol or make it more generic? maybe using a flag in params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid moving this code outside is gonna be very hard, because other calculations are tightly coupled with max_frag. Maybe in the future we can refactor this function, as it does a lot of things.
Using flag in params seems viable option to me, and btw there are already suitable flags, indicating that RNDV is used: UCP_PROTO_COMMON_INIT_FLAG_SEND_ZCOPY
and UCP_PROTO_COMMON_INIT_FLAG_RECV_ZCOPY
. And the end it will be the same as checking send_op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have refactored code around min_chunk:
- A separate function
ucp_proto_multi_get_min_chunk
returnsmin_rndv_chunk_size
for RNDV protocols, 0 otherwise ucp_proto_multi_init
is pure generic wrt min_chunk
src/ucp/proto/proto_common.c
Outdated
ucs_debug("protocol %s min_length %zu is larger than lane[%d] max_frag " | ||
"%zu", ucp_proto_id_field(params->super.proto_id, name), | ||
params->min_length, lane, tl_max_frag); | ||
return UCS_ERR_OUT_OF_RANGE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UCS_ERR_INVALID_PARAM - params->min_length is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucp/proto/proto_multi.c
Outdated
/* For RNDV only: max scaled fragment must be at least min_rndv_chunk */ | ||
if ((params->super.send_op == UCT_EP_OP_PUT_ZCOPY) || | ||
(params->super.send_op == UCT_EP_OP_GET_ZCOPY)) { | ||
max_frag = ucs_max(max_frag, min_rndv_chunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move this code to rndv protocol or make it more generic? maybe using a flag in params?
src/ucp/proto/proto_common.c
Outdated
@@ -355,6 +355,10 @@ ucp_proto_common_get_lane_perf(const ucp_proto_common_init_params_t *params, | |||
|
|||
ucp_proto_common_get_frag_size(params, &wiface->attr, lane, &tl_min_frag, | |||
&tl_max_frag); | |||
if (params->min_length > tl_max_frag) { | |||
ucs_debug("params->min_length=%zu is invalid", params->min_length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls print more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I misinterpreted your last comment #10407 (comment)
I thought you were proposing to reduce error message
src/ucp/proto/proto_multi.c
Outdated
{ | ||
ucp_context_h context = params->super.super.worker->context; | ||
|
||
if (params->super.flags & (UCP_PROTO_COMMON_INIT_FLAG_SEND_ZCOPY | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we take it from params->min_length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this comment well
-
If you are asking whether we can
return params->min_length
instead of default for non-rendezvousreturn 0
, then yes, we can do it, it should not change anything -
If you are asking whether we can reuse
params->min_length
field in order to storemin_rndv_chunk
for RNDV protocols - I'm not sure we can do that.
The meaning ofparams->min_length
is the minimal message length, and there is a bunch or related calculations for that.
The meaning ofmin_rndv_chunk
is really different: minimum allowed chunk size, meaning we don't want to split the message in smaller chunks, but we do allow smaller messages. So if we storemin_rndv_chunk
size inparams->min_length
, then we mess up themin_length
meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add param to ucp_proto_multi_init_params_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I analysed min_length
a bit more, and I'm sure we cannot reuse it for the purpose of min_rndv_chunk.
The reason is that field has already an established meaning (minimal message length supported by a protocol), and corresponding implementation: ucp_proto_init_perf
uses this field to define the range of payload:
range_start = ucs_max(params->min_length, tl_perf->min_length);
I'm sure we don't want to change this logic.
Regarding the idea of having single-lane protocol, implementation wise it might be non-trivial. Currently there is only one probe per protocol (e.g. "rndv/get/zcopy") that supports multiple lanes. So if we want to have single-lane version of that protocol, it would require to introduce just another protocol ("rndv/single/get/zcopy"?). IMO looks like an overkill for this task.
I think the best approach would be to extend ucp_proto_multi_init_params_t
with one extra field e.g. min_chunk_size
.
@@ -26,6 +26,7 @@ static void ucp_rndv_am_probe_common(ucp_proto_multi_init_params_t *params) | |||
params->super.exclude_map = 0; | |||
params->super.min_length = 0; | |||
params->super.max_length = SIZE_MAX; | |||
params->super.min_chunk = context->config.ext.min_rndv_chunk_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can set it to 0
ucp_request_param_t params; | ||
params.op_attr_mask = UCP_OP_ATTR_FIELD_CALLBACK | | ||
UCP_OP_ATTR_FIELD_USER_DATA; | ||
params.user_data = arg; | ||
params.cb.recv_am = ctx->cmpl; | ||
|
||
auto sptr = ucp_am_recv_data_nbx(ctx->worker, data, ctx->buf->ptr(), | ||
len, ¶ms); | ||
EXPECT_FALSE(UCS_PTR_IS_ERR(sptr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to make sure it is RNDV. Otherwise, if it is inline message, calling ucp_am_recv_data_nbx
is invalid
What?
This PR addresses an assertion failure described in https://jirasw.nvidia.com/browse/UCX-1054
Assertion happens when running io-demo with cuda support, currently it's reproducible only on rock05.
Steps to reproduce:
This results in an assertion failure on client side (if read operation is requested) or server side (with write operation):
Why?
The issue boils down to the corner case when
lpriv->max_frag
is smaller thanmin_rndv_chunk
. This happens due to low max_zcopy limit that we set withUCX_RC_MAX_GET_ZCOPY=32k
config.How?
Adjust
max_frag
to be at least ofmin_rndv_chunk
size.Tested with io-demo
Reproduced issue with
test_ucp_proto_mock_rcx
, but cannot append this unit test here because it requires #10369