DAOS-16365 client: intercept MPI_Init() to avoid nested call #14992
Conversation
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Ticket title is 'deadlock in MPI application on Aurora with libpil4dfs'
I will test that today.
Thank you very much!
The overall PR looks good to me; I just have a concurrency-management concern that is not clear to me.
 * libc functions. Avoid possible zeInit reentrancy/nested call.
 */

if (atomic_load_relaxed(&mpi_init_count) > 0) {
I am not sure I perfectly understand the _relaxed semantics, but for this test it would probably be better to use a stricter atomic memory order (from my understanding of the GCC documentation).
https://en.cppreference.com/w/cpp/atomic/memory_order
From my understanding, atomicity of the operation is already guaranteed with "memory_order_relaxed".
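For what it is worth, here is a standalone C11 illustration of that point (a hypothetical example, not DAOS code): operations tagged memory_order_relaxed are still atomic, so no increment is lost and no torn value can be observed; "relaxed" only means they impose no ordering on surrounding memory operations.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Standalone illustration, not DAOS code: relaxed atomics are still atomic
 * (no lost updates, no torn reads); "relaxed" only drops inter-thread
 * ordering guarantees on surrounding memory operations. */
static _Atomic int counter;

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Always prints 200000: no increment is lost even with relaxed order. */
    printf("%d\n", atomic_load_explicit(&counter, memory_order_relaxed));
    return 0;
}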
Indeed, you are right; I had misunderstood the following sentence from the documentation at https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync:
"__ATOMIC_RELAXED: Implies no inter-thread ordering constraints."
In the end, that page is much clearer about the different memory models used for inter-thread synchronization.
However, I still have a concern with this synchronization design pattern:
Thread A:
- executes line 1147 and the condition is false;
- is stopped by the scheduler.
Thread B:
- executes line 1037;
- starts executing line 1038 and is interrupted by the scheduler during the execution of next_mpi_init().
Thread A:
- executes line 1158 and the following lines.
From my understanding, we still have a race issue.
@knard-intel Thank you very much for your comments! I will think more about this.
The issue we encountered is in MPI applications on Aurora. The hang was caused by a deadlock from nested calls of zeInit() in the Intel Level Zero driver on the same thread.
Our goal was to avoid daos_init() being called inside MPI_Init(). All I/O requests are forwarded to dfuse.
We do not know whether we would have issues if thread A is calling daos_init() while thread B starts calling MPI_Init().
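To make the pattern under discussion concrete, here is a minimal sketch assuming the interposed MPI_Init() brackets the real call with an atomic counter. Only mpi_init_count and next_mpi_init() are names that appear in this PR's diff; the stubs, maybe_init_daos(), and everything else are hypothetical and not the actual libpil4dfs source.

#include <stdatomic.h>
#include <stddef.h>

/* Sketch only -- not the libpil4dfs source. Only mpi_init_count and
 * next_mpi_init() are names from the PR; the stubs stand in for the real
 * MPI and DAOS entry points (e.g. resolved via dlsym(RTLD_NEXT, ...)). */
static _Atomic int mpi_init_count;

static int next_mpi_init(int *argc, char ***argv)
{
    (void)argc; (void)argv;
    return 0;
}

static int daos_init_stub(void)
{
    return 0;
}

/* Interposed MPI_Init(): bump the counter so hooks that fire while the real
 * MPI_Init() runs know they are inside MPI initialization. */
int MPI_Init(int *argc, char ***argv)
{
    int rc;

    atomic_fetch_add_explicit(&mpi_init_count, 1, memory_order_relaxed);
    rc = next_mpi_init(argc, argv);
    atomic_fetch_sub_explicit(&mpi_init_count, 1, memory_order_relaxed);
    return rc;
}

/* Lazy DAOS initialization guard: skip daos_init() while inside MPI_Init()
 * and let dfuse serve the I/O instead. This closes the same-thread nested
 * zeInit() case; as discussed above, a second thread could still pass the
 * check just before another thread enters MPI_Init(). */
static void maybe_init_daos(void)
{
    if (atomic_load_explicit(&mpi_init_count, memory_order_relaxed) > 0)
        return; /* inside MPI_Init(): forward I/O to dfuse */
    daos_init_stub();
}

int main(void)
{
    maybe_init_daos();      /* not inside MPI_Init(): would initialize DAOS */
    return MPI_Init(NULL, NULL);
}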
Thanks for the explanation which makes sense to me :-)
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14992/5/execution/node/1417/log
LGTM
The only failure is a known issue: fio with libaio + fork(), due to a bug in mercury. @mchaarawi Could you please review this PR? Thank you very much!
@daos-stack/daos-gatekeeper Can we land this PR? Thank you very much!
@mchaarawi We should port this to 2.6, right?
Yes, please.
Thank you! I requested the backport to release/2.6. I will create a PR once it is approved.
We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(). daos_init() is involved in such nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init(). Signed-off-by: Lei Huang <[email protected]>
…#15047) We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(). daos_init() is involved in such nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init(). Signed-off-by: Lei Huang <[email protected]>
We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(). daos_init() is involved in such nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init().
Features: pil4dfs
Required-githooks: true
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: