
DAOS-16365 client: intercept MPI_Init() to avoid nested call #14992

Merged: 5 commits into master on Aug 29, 2024

Conversation

@wiliamhuang (Contributor) commented Aug 22, 2024

We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(); daos_init() is involved in these nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init().

Features: pil4dfs

Required-githooks: true
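
For context, a minimal sketch of the interception approach, assuming an LD_PRELOAD-style wrapper and a counter named mpi_init_count as in the diff hunk reviewed below; the error handling and exact structure here are illustrative, not the PR's actual code:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdatomic.h>

static atomic_int mpi_init_count;

int
MPI_Init(int *argc, char ***argv)
{
    int (*next_mpi_init)(int *, char ***);
    int  rc;

    /* Look up the real MPI_Init() provided by the MPI library. */
    next_mpi_init = (int (*)(int *, char ***))dlsym(RTLD_NEXT, "MPI_Init");
    if (next_mpi_init == NULL)
        return 1; /* illustrative error handling only */

    /* Mark that MPI_Init() is on the stack so intercepted I/O calls
     * skip daos_init() and fall back to the dfuse mount, avoiding the
     * nested zeInit() deadlock.
     */
    atomic_fetch_add(&mpi_init_count, 1);
    rc = next_mpi_init(argc, argv);
    atomic_fetch_sub(&mpi_init_count, 1);
    return rc;
}

Intercepted I/O paths can then consult mpi_init_count before deciding whether to call daos_init().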

Before requesting gatekeeper:

  • Two review approvals have been given and any prior change requests have been resolved.
  • Testing is complete and all tests passed, or there is a reason documented in the PR why it should be force-landed and the forced-landing tag is set.
  • The Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that the user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.


Ticket title is 'deadlock in MPI application on Aurora with libpil4dfs'
Status is 'In Review'
Labels: 'intercept_lib,scrubbed_2.8'
https://daosio.atlassian.net/browse/DAOS-16365

@wiliamhuang wiliamhuang marked this pull request as ready for review August 23, 2024 14:00
@wiliamhuang wiliamhuang requested review from a team as code owners August 23, 2024 14:00
@mchaarawi (Contributor) commented:

I will test that today.

@wiliamhuang (Contributor, Author) commented:

> I will test that today.

Thank you very much!

@knard38 (Contributor) left a comment:

The overall PR looks good to me; I just have a concurrency-management concern that is not clear to me.

* libc functions. Avoid possible zeInit reentrancy/nested call.
*/

if (atomic_load_relaxed(&mpi_init_count) > 0) {
@knard38 (Contributor) commented Aug 23, 2024:

I am not sure I perfectly understand the _relaxed semantics, but for this test it would probably be better to use a stricter atomic (from my understanding of the GCC documentation).

@wiliamhuang (Contributor, Author) commented:

https://en.cppreference.com/w/cpp/atomic/memory_order
From my understanding, the atomicity of the operation is already guaranteed with "memory_order_relaxed".
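
For reference, a minimal C11 sketch of the distinction being discussed (hypothetical names; mpi_init_count here mirrors the counter in the diff): relaxed operations are still atomic, in that loads never see torn values and concurrent increments are never lost; they simply impose no ordering on surrounding memory operations.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int mpi_init_count;

/* Atomic even though relaxed: concurrent increments are never lost. */
void enter_mpi_init(void)
{
    atomic_fetch_add_explicit(&mpi_init_count, 1, memory_order_relaxed);
}

/* Atomic even though relaxed: the load always returns a whole value,
 * but it orders no other memory operations before or after it.
 */
bool inside_mpi_init(void)
{
    return atomic_load_explicit(&mpi_init_count, memory_order_relaxed) > 0;
}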

@knard38 (Contributor) commented Aug 25, 2024:

Indeed you are right; I had misunderstood the following sentence from the documentation at https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync:
__ATOMIC_RELAXED: Implies no inter-thread ordering constraints.
In the end, that documentation on the different memory models used for inter-thread synchronization (https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync) is much clearer.

However, I still have a concern with this synchronization design pattern:

Thread A:
Executes line 1147; the condition is false.
Is stopped by the scheduler.

Thread B:
Executes line 1037.
Starts executing line 1038 and is interrupted by the scheduler during the execution of next_mpi_init().

Thread A:
Executes line 1158 and following.

From my understanding we still have the race issue.
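
Schematically, the interleaving described above is a classic check-then-act race; a hypothetical condensed view (not the PR's code, and assuming the counter sketch earlier in this thread):

#include <stdatomic.h>

extern atomic_int mpi_init_count;  /* hypothetical, from the sketch above */

/* The load below is atomic, but the load and the daos_init() call that
 * follows it are not one atomic step, which is exactly the window
 * described above:
 *
 *   A: load returns 0, so A decides daos_init() is safe   (line "1147")
 *   B: increments the counter, enters real MPI_Init()     (lines "1037-1038")
 *   A: proceeds with daos_init() anyway                   (line "1158")
 */
int daos_init_is_safe(void)
{
    /* Thread B may increment mpi_init_count right after this load. */
    return atomic_load_explicit(&mpi_init_count, memory_order_relaxed) == 0;
}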

@wiliamhuang (Contributor, Author) commented Aug 26, 2024:

@knard-intel Thank you very much for your comments! I will think more about this.
The issue we encountered is in MPI applications on Aurora. The hang was caused by a deadlock in nested calls of zeInit() in the Intel Level Zero driver in the same thread.
Our goal was to avoid daos_init() being called inside MPI_Init(). All I/O requests are forwarded to dfuse.
We do not know whether we will have issues if thread A is calling daos_init() and thread B starts calling MPI_Init().
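
A hypothetical sketch of how an intercepted I/O entry point could apply this policy (illustrative only; libpil4dfs's real logic differs, and daos_inited is an assumed flag):

#include <stdatomic.h>
#include <stdbool.h>

extern atomic_int mpi_init_count;  /* from the interception sketch above */
extern bool       daos_inited;     /* hypothetical lazy-init flag */

int daos_init(void);               /* real DAOS initializer */

/* Decide whether an intercepted request may use the DAOS fast path:
 * while MPI_Init() is on the stack, skip daos_init() entirely and let
 * the request fall back to the dfuse mount.
 */
bool can_use_daos_fast_path(void)
{
    if (atomic_load_explicit(&mpi_init_count, memory_order_relaxed) > 0)
        return false;              /* forward this request to dfuse */
    if (!daos_inited && daos_init() != 0)
        return false;
    return true;
}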

@knard38 (Contributor) commented Aug 26, 2024:

Thanks for the explanation, which makes sense to me :-)

@daosbuild1 (Collaborator) commented:

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14992/5/execution/node/1417/log

@knard38 (Contributor) left a comment:

LGTM


@wiliamhuang (Contributor, Author) commented:

The only failure is a known issue: fio with libaio + fork(), due to a bug in mercury.
https://daosio.atlassian.net/browse/DAOS-16315

@mchaarawi Could you please review this PR? Thank you very much!

@wiliamhuang wiliamhuang requested a review from a team August 28, 2024 21:31
@wiliamhuang wiliamhuang added the forced-landing label (The PR has known failures or has intentionally reduced testing, but should still be landed) Aug 28, 2024
@wiliamhuang (Contributor, Author) commented:

@daos-stack/daos-gatekeeper Can we land this PR? Thank you very much!

@daltonbohning daltonbohning merged commit a5552da into master Aug 29, 2024
54 of 56 checks passed
@daltonbohning daltonbohning deleted the lei/DAOS-16365_mpi_init branch August 29, 2024 15:55
@wiliamhuang (Contributor, Author) commented:

@mchaarawi We should port this to 2.6, right?

@mchaarawi (Contributor) commented:

> @mchaarawi We should port this to 2.6, right?

Yes, please.

@wiliamhuang (Contributor, Author) commented:

> > @mchaarawi We should port this to 2.6, right?
>
> Yes, please.

Thank you! I requested the backport to release/2.6. I will create a PR once it is approved.

jolivier23 pushed a commit that referenced this pull request Aug 29, 2024
We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(); daos_init() is involved in these nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init().

Signed-off-by: Lei Huang <[email protected]>
wiliamhuang added a commit that referenced this pull request Aug 30, 2024
We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(); daos_init() is involved in these nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init().

Signed-off-by: Lei Huang <[email protected]>
mchaarawi pushed a commit that referenced this pull request Sep 3, 2024
…#15047)

We observed a deadlock in MPI applications on Aurora due to nested calls of zeInit() inside MPI_Init(); daos_init() is involved in these nested calls. This PR intercepts MPI_Init() and avoids running daos_init() inside MPI_Init().

Signed-off-by: Lei Huang <[email protected]>
Labels
forced-landing: The PR has known failures or has intentionally reduced testing, but should still be landed.