Setting the rx capabilities and tx capabilities to 0 #1080
Signed-off-by: tmh97 <[email protected]>
Thanks @tmh97!
I noticed a problem on a certain provider we need to support and think your alternative suggestion to request the relevant subset of caps on the endpoint might be better, at least for now until we can ensure that providers of interest handle the 0 initialization as expected (the docs say the caps subset is inherited for fi_getinfo... but perhaps not explicitly for fi_endpoint). Anyway, let me know what you think.
BTW, I think issue #148 is related to this PR... I'm a bit nervous to play with it now, but I think we should try to fill the tx_attrs in
Sounds like a good idea :) will keep this in mind for the future.
@davidozog Hey Dave, just finished testing (ran every test bucket twice with 3 different combinations of ppn/rank). Everything looks good to go on Cornelis's end.
Thanks @tmh97 - For completeness, we just need to check another provider or two, but I'm more confident in the (non-zero) flags now. Please feel free to make those changes on this PR and we'll be ready to merge shortly.
Instead of zeroing caps, we will set them to a relevant subset of the p_info->caps. Co-authored-by: David Ozog <[email protected]>
Changes made :)
Are you ready to merge this today @tmh97?
Definitely! Let's do it.
FYI I'm not authorized so will need your help to merge :)
Signed-off-by: Thomas Huber [email protected]
In transport_ofi.c there are two instances of behavior that might be a small violation of the libfabric man pages. Both instances are related to setting the capabilities (caps) for tx_attr->caps and rx_attr->caps.

In shmem_transport_ofi_target_ep_init(void):
- SOS sets info->p_info->caps = FI_RMA | FI_ATOMIC | FI_REMOTE_READ | FI_REMOTE_WRITE, which overwrites whatever was returned in the call to get_info().
- info->p_info->tx_attr->caps is still set to whatever was returned by get_info().
- Calls to fi_endpoint(), fi_cq_open(), fi_ep_bind(), and fi_enable() are made.

From the fi_endpoint() man page:
https://ofiwg.github.io/libfabric/v1.12.0/man/fi_endpoint.3.html under the section Transmit Context Attributes is the following: "The capabilities must be a subset of those requested of the associated endpoint." The capabilities listed there for transmit attributes are FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, FI_SEND, FI_HMEM, FI_TRIGGER, FI_FENCE, FI_MULTICAST, FI_RMA_PMEM, FI_NAMED_RX_CTX, and FI_COLLECTIVE.

Possible fixes:
- Set tx_attr->caps to 0, resulting in tx_attr->caps being a subset of the general caps.
- AND the transmit capabilities (FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, FI_SEND, FI_HMEM, FI_TRIGGER, FI_FENCE, FI_MULTICAST, FI_RMA_PMEM, FI_NAMED_RX_CTX, and FI_COLLECTIVE) with the capabilities that SOS sets in this function (FI_RMA | FI_ATOMIC | FI_REMOTE_READ | FI_REMOTE_WRITE), which would result in tx_attr->caps = FI_RMA | FI_ATOMIC.
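
For illustration, the AND-based option for the target endpoint can be sketched as below. This is only a sketch under stated assumptions: the DEMO_ constants are stand-in bit values (not libfabric's real definitions from <rdma/fabric.h>), only part of the transmit capability list is modeled, and demo_tx_caps is a hypothetical helper that exists in neither SOS nor libfabric.

```c
#include <stdint.h>

/* Stand-in capability bits for illustration only. These are NOT the values
 * libfabric assigns; real code would use FI_RMA etc. from <rdma/fabric.h>. */
#define DEMO_FI_MSG          (1ULL << 0)
#define DEMO_FI_RMA          (1ULL << 1)
#define DEMO_FI_TAGGED       (1ULL << 2)
#define DEMO_FI_ATOMIC       (1ULL << 3)
#define DEMO_FI_READ         (1ULL << 4)
#define DEMO_FI_WRITE        (1ULL << 5)
#define DEMO_FI_SEND         (1ULL << 6)
#define DEMO_FI_REMOTE_READ  (1ULL << 7)
#define DEMO_FI_REMOTE_WRITE (1ULL << 8)

/* Transmit-applicable capabilities. Only some flags from the man page list
 * are modeled here; the remaining ones are omitted for brevity. */
#define DEMO_TX_APPLICABLE \
    (DEMO_FI_MSG | DEMO_FI_RMA | DEMO_FI_TAGGED | DEMO_FI_ATOMIC | \
     DEMO_FI_READ | DEMO_FI_WRITE | DEMO_FI_SEND)

/* Caps that SOS sets in shmem_transport_ofi_target_ep_init(), per the
 * description above. */
#define DEMO_TARGET_EP_CAPS \
    (DEMO_FI_RMA | DEMO_FI_ATOMIC | DEMO_FI_REMOTE_READ | DEMO_FI_REMOTE_WRITE)

/* Hypothetical helper: keep only the endpoint caps that apply to the
 * transmit context, so the result is always a subset of ep_caps. */
static uint64_t demo_tx_caps(uint64_t ep_caps)
{
    return ep_caps & DEMO_TX_APPLICABLE;
}
```

With these stand-ins, demo_tx_caps(DEMO_TARGET_EP_CAPS) leaves only the RMA and ATOMIC bits set, matching the tx_attr->caps = FI_RMA | FI_ATOMIC result described above.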
In shmem_transport_ofi_ctx_init(shmem_transport_ctx_t *ctx, int id) the (almost) exact same situation exists:
- SOS sets info->p_info->caps = FI_RMA | FI_WRITE | FI_READ | FI_ATOMIC; which is slightly different from the behavior of the above function.
- Possible fixes: set tx_attr->caps to 0, resulting in tx_attr->caps being a subset of the general caps, or AND the transmit capabilities (FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, FI_SEND, FI_HMEM, FI_TRIGGER, FI_FENCE, FI_MULTICAST, FI_RMA_PMEM, FI_NAMED_RX_CTX, and FI_COLLECTIVE) with the capabilities that SOS sets in this function (FI_RMA | FI_WRITE | FI_READ | FI_ATOMIC), which would result in tx_attr->caps = FI_RMA | FI_ATOMIC | FI_READ | FI_WRITE, slightly different from the behavior of the above function.
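
The subset requirement itself can be checked with a simple bit test. The sketch below assumes stand-in DEMO_ bit values (not libfabric's real definitions) and a hypothetical helper, demo_caps_is_subset, that is not part of SOS or libfabric. It shows why both options satisfy the rule for the context endpoint, and why caps left over from get_info() might not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in capability bits for illustration only; NOT libfabric's values. */
#define DEMO_FI_MSG    (1ULL << 0)
#define DEMO_FI_RMA    (1ULL << 1)
#define DEMO_FI_ATOMIC (1ULL << 3)
#define DEMO_FI_READ   (1ULL << 4)
#define DEMO_FI_WRITE  (1ULL << 5)

/* Caps that SOS sets in shmem_transport_ofi_ctx_init(), per the
 * description above. */
#define DEMO_CTX_CAPS \
    (DEMO_FI_RMA | DEMO_FI_WRITE | DEMO_FI_READ | DEMO_FI_ATOMIC)

/* Hypothetical helper: true when every bit set in sub is also set in super. */
static bool demo_caps_is_subset(uint64_t sub, uint64_t super)
{
    return (sub & ~super) == 0;
}
```

Under these assumptions, a tx caps value of 0 and a tx caps value equal to DEMO_CTX_CAPS both pass the check, while a stale value carrying an extra bit (for example DEMO_CTX_CAPS | DEMO_FI_MSG, a made-up case) fails it.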