-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/WIREUP: Support EP reconfiguration for non-reused p2p scenarios #9949
Conversation
986a5c8
to
3ccadc6
Compare
@shasson5 please see |
/azp run |
Azure Pipelines successfully started running 5 pipeline(s). |
src/ucp/wireup/wireup.c
Outdated
ucs_assert(reuse_lane_map[ucp_ep_get_cm_lane(ep)] != UCP_NULL_LANE); | ||
/* wireup lane hasn't been selected by the new configuration: only this | ||
* function should select it */ | ||
ucs_assert(new_key->wireup_msg_lane == UCP_NULL_LANE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_assert(new_key->wireup_msg_lane == UCP_NULL_LANE); | |
ucs_assertv(new_key->wireup_msg_lane == UCP_NULL_LANE, "new_key->wireup_msg_lane=%u", | |
new_key->wireup_msg_lane); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's code from master.
Was marked as "new added" code for some wrong reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is indeed from master. Marked as diff because of indentation (if
block).
IMO the suggestions makes sense.
src/ucp/wireup/wireup.c
Outdated
|
||
/* Select CM/non-reused lane as new wireup lane */ | ||
new_lane = ucp_wireup_find_non_reused_lane(ep, key); | ||
new_uct_eps[new_lane] = &new_ep->super.super; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be later overwritten in ucp_wireup_check_config_intersect
? Is it possible to initialize new_uct_eps
in one place? (would be less error-prone)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may complicate the flow, because you need to return [new_lane,new_ep] from this function and then add extra "if" block in ucp_wireup_check_config_intersect
in order to properly initialize new_uct_eps
.
And anyway the value of new_uct_eps
will not be overwritten, because in ucp_wireup_check_config_intersect
we only handle "reused" lanes.
/azp run |
Azure Pipelines successfully started running 5 pipeline(s). |
@yosefe please look, as discussed |
test/gtest/Makefile.am
Outdated
@@ -151,6 +151,7 @@ gtest_SOURCES = \ | |||
ucp/test_ucp_mem_type.cc \ | |||
ucp/test_ucp_perf.cc \ | |||
ucp/test_ucp_proto.cc \ | |||
ucp/test_ucp_ep_reconfigure.cc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to test_ucp_ep_reconfig.cc?
src/ucp/wireup/wireup.c
Outdated
@@ -1501,7 +1677,8 @@ ucs_status_t ucp_wireup_init_lanes(ucp_ep_h ep, unsigned ep_init_flags, | |||
* without CM to prevent reconfiguration error. | |||
*/ | |||
if ((ep->cfg_index != UCP_WORKER_CFG_INDEX_NULL) && | |||
!ucp_ep_has_cm_lane(ep)) { | |||
!ucp_wireup_can_reconfigure(ep, ep_init_flags, &tl_bitmap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better avoid calling init_lanes another time just for checking if we need to limit tl_bitmap,
can we do this hack (restrict tl_bitmap) only if we are going to print a fatal error that reconfig not supported?
src/ucp/wireup/wireup.c
Outdated
msg_ep = ucp_wireup_ep_extract_msg_ep(old_ep, &pending_queue); | ||
ucp_wireup_ep_set_aux(new_ep, msg_ep, aux_rsc_index, is_p2p); | ||
|
||
/* Add wireup messages to pending queue */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucp_wireup_ep_extract_msg_ep is currently NOT returning wireup message requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/ucp/wireup/wireup.c
Outdated
|
||
/* Get old wireup lane */ | ||
old_lane = ucp_wireup_get_msg_lane(ep, UCP_WIREUP_MSG_REQUEST); | ||
old_ep = ucp_wireup_ep(ucp_ep_get_lane(ep, old_lane)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old_wireup_ep?
src/ucp/wireup/wireup.c
Outdated
ucp_ep_set_lane(ep, old_lane, NULL); | ||
|
||
/* Select CM/non-reused lane as new wireup lane */ | ||
new_lane = ucp_wireup_find_non_reused_lane(ep, key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_wireup_lane?
if (uct_ep != NULL) { | ||
uct_ep_destroy(uct_ep); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove spaceline
src/ucp/wireup/wireup_ep.c
Outdated
uct_ep_h msg_ep = ucp_wireup_ep_get_msg_ep(wireup_ep); | ||
|
||
uct_ep_pending_purge(&wireup_ep->super.super, | ||
ucp_wireup_ep_append_pending_cb, pending_queue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems ucp_wireup_ep_pending_purge (called from here) already calls ucp_wireup_ep_pending_req_release which decrements pending_count by 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced
src/ucp/wireup/wireup.c
Outdated
/* Get all reachable MDs from full remote address list and join with | ||
* current ep configuration | ||
*/ | ||
key->dst_md_cmpts = dst_md_storage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: move it after line 1609
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean move it to be before 1609?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either before or after key->dst_version = remote_address->dst_version;
src/ucp/wireup/wireup_ep.c
Outdated
ucp_wireup_ep_pending_purge(&wireup_ep->super.super, | ||
ucs_empty_function_do_assert_void, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we can just assert that wireup_ep->pending_q length is 0. and call uct_ep_pending_purge. then no need to move the functions ucp_wireup_ep_pending_purge, ucp_wireup_ep_pending_req_release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok but notice it requires duplication of ucs_assert(wireup_ep->pending_count == 0);
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to check pending_count==0? ucp_wireup_ep_pending_purge would release them anyway so it doesn't have to be empty IMO
/* Verify local and remote lanes are identical */ | ||
EXPECT_TRUE(has_matching_lane(ep(), lane, other)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this? isn't it enough to test that communication send/recv is working?
same for checking num_lanes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is it possible to check communication on all lanes without using multiple protocols/message sizes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we can do basic test like active message or tag match on several message sizes and extend it
since communication ability is something we need to test anyway to ensure configuration is working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better as a complement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO in the general case - when not all transports are p2p - we do not need to require that number of lanes is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so removed the num_lanes check
/* Verify local and remote lanes are identical */ | ||
EXPECT_TRUE(has_matching_lane(ep(), lane, other)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO in the general case - when not all transports are p2p - we do not need to require that number of lanes is the same.
@yosefe please approve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls squash
What
Support partial EP reconfiguration: only non-reused, fully wired-up scenarios. (Lanes must not be reused and must all be p2p).
Why ?
PR 2 for EP reconfiguration.
How ?
Add a new function for replacing wireup lane by moving internal AUX EP to the new lane.
The following steps are performed:
a) for CM flow -> use CM EP.
b) otherwise -> create a new wireup_ep.
a) for CM flow -> use CM lane.
b) otherwise -> select a random lane. (Will be modified in next PRs to look for a non-reused lane).