prov/shm: fi_setopt causes a segmentation fault when setting FI_OPT_MIN_MULTI_RECV #10591

Closed
AOA-Mohammed opened this issue Nov 29, 2024 · 2 comments · Fixed by #10618

Comments

@AOA-Mohammed

Describe the bug
fi_setopt causes a segmentation fault when FI_OPT_MIN_MULTI_RECV is set before the endpoint is enabled.
According to the fi_endpoint man page, it is recommended to set FI_OPT_MIN_MULTI_RECV before enabling the endpoint.
Doing so causes a segmentation fault with the following backtrace:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==2613126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x7f128262fc1d bp 0x7f11fab08410 sp 0x7f11fab082d0 T2)
==2613126==The signal is caused by a READ memory access.
==2613126==Hint: address points to the zero page.
    #0 0x7f128262fc1d in smr_ep_setopt (/lib/libfabric.so.1+0xabc1d) (BuildId: 3f0a904075fdd37c2719c171f33bd661bbe68aeb)
    #1 0x7f128280d103 in fi_setopt /include/rdma/fi_endpoint.h:232:9
    #2 0x7f128280d103 in mstro_ep_build_from_ofi maestro-core/maestro/ofi.c:1198:14

However, setting FI_OPT_MIN_MULTI_RECV after enabling the endpoint works for the shm provider (unlike with other providers, and contrary to what the man page recommends).

To Reproduce
Call fi_setopt with FI_OPT_MIN_MULTI_RECV on an shm provider endpoint before enabling the endpoint.

Expected behavior
fi_setopt should complete without crashing when FI_OPT_MIN_MULTI_RECV is set before the endpoint is enabled.

Output
The application fails with a segmentation fault, as shown in the backtrace above.

Environment:
libfabric 1.21.0
shm provider
icx compiler 2023.2.0

@aingerson
Contributor

@AOA-Mohammed Sorry for the late response. I've opened a PR that should fix this issue. If you have a chance, could you test with PR #10618 and verify that fixes the issue? Thanks for reporting!

@AOA-Mohammed
Author

@aingerson I tested the fix and fi_setopt completes safely. Thanks for the fix.
