ch4/ofi: Add psm3 capability set #5864
We disabled libfabric psm3 -- mpich/src/mpid/ch4/netmod/ofi/subconfigure.m4 Line 276 in ebe7e0a
@hzhou, shall I enable psm3 and test this on jenkins to see what compile issues you're getting? |
Sure. Go ahead |
@nitbhat When you are done, use the custom test for it. For example -- #5904 (comment) The default review tests use pre-built modules, which skips the rebuild of libfabric. |
cec1284 to faab8e6
test:mpich/custom |
faab8e6 to 99baefb
test:mpich/custom |
Looks like it is working now. For reference, the original reason we disabled |
@hzhou: The build looks good and the tests seem to be working. But, is there a way I can specify to the test framework on jenkins to use the psm3 provider? (maybe with netmod:ch4:ofi:psm3?) I see that sockets is used as the provider for the tests. |
Our Jenkins only has InfiniBand cards. Does |
Yes, psm3 works with Infiniband, it is a performant alternative to sockets over IB. |
test:mpich/custom |
test:mpich/ch4/ofi |
Currently, the PSM3 provider also needs another environment variable to work correctly: PSM3_MULTI_EP=1. I'll relaunch the tests with that set if they fail, which I'm guessing they will. |
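For anyone reproducing this outside Jenkins, here is a minimal sketch of the environment described above. `PSM3_MULTI_EP=1` comes from this comment; using libfabric's `FI_PROVIDER` variable to select psm3 is an assumption about how the run is launched, and the helper names and the program path are hypothetical.

```python
import os


def psm3_env(base=None):
    """Return a copy of the environment with the psm3 settings applied.

    Assumption: the provider is selected through libfabric's FI_PROVIDER
    variable. PSM3_MULTI_EP=1 is the extra setting mentioned in this thread.
    """
    env = dict(os.environ if base is None else base)
    env["FI_PROVIDER"] = "psm3"
    env["PSM3_MULTI_EP"] = "1"
    return env


def launch_command(nprocs, program):
    """Build an mpiexec command line; the program path is hypothetical."""
    return ["mpiexec", "-n", str(nprocs), program]
```

A run would then look roughly like `subprocess.run(launch_command(2, "./a.out"), env=psm3_env())`, where `./a.out` stands in for whichever test binary is being checked.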
test:mpich/custom |
There is only one failure seen for psm3. I have seen this issue before and it happens intermittently for some tests. I've opened #5975 and will add an xfail to this PR. |
7fdb3f0 to 2e042ac
test/mpi/maint/jenkins/xfail.conf
Outdated
@@ -86,6 +86,9 @@
# Sunf90 forbid passing cray pointer as integer
* solstudio * * * /^allocmemf90/ xfail=ticket0 f90/ext/testlist

# psm3 specific failures
* * * ch4:ofi:psm3 * /^p_red .*/ xfail=issue5975 coll/testlist
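To make the shape of the new entry easier to reason about, here is a rough sketch of how such a line could be parsed and matched. It is not the Jenkins script itself. The column names (`jobname`, `compiler`, `configure`, `netmod`, `queue`), the rule that `*` matches anything while other values must match exactly, and the sample test line are all assumptions made for illustration.

```python
import re

# Assumed meaning of the five leading columns of an xfail.conf entry.
COLUMNS = ("jobname", "compiler", "configure", "netmod", "queue")

# Remainder of the entry: /regex/ xfail=<tag> <testlist path>
ENTRY_RE = re.compile(r"^/(.*)/\s+(xfail=\S+)\s+(\S+)$")


def parse_entry(line):
    """Split one entry into (conditions, compiled pattern, marker, testlist)."""
    parts = line.split(None, len(COLUMNS))
    conditions = dict(zip(COLUMNS, parts[: len(COLUMNS)]))
    match = ENTRY_RE.match(parts[len(COLUMNS)].strip())
    if match is None:
        raise ValueError("unrecognized xfail entry: " + line)
    pattern, marker, testlist = match.groups()
    return conditions, re.compile(pattern), marker, testlist


def applies(entry, job, test_line):
    """True if every non-wildcard condition equals the job's value
    and the pattern matches the given testlist line."""
    conditions, pattern, _, _ = entry
    job_ok = all(v == "*" or job.get(k) == v for k, v in conditions.items())
    return job_ok and pattern.search(test_line) is not None
```

Under these assumptions, the psm3 entry would apply to a `p_red ...` line only when the job's netmod is exactly `ch4:ofi:psm3`, so it depends on the job actually reporting that netmod string.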
@hzhou: Is this the right way to xfail a psm3-specific failure?
The xfail entry won't work against the custom tests since |
test:mpich/custom |
Okay, understood. I'll go ahead and remove that commit in that case. |
2e042ac to 4b6f1ef
test:mpich/custom |
Since |
@nitbhat Does |
LGTM
@nitbhat Approved. Could you rebase? |
4b6f1ef to fa162c4
Yes, rebased. |
test:mpich/warnings |
Pull Request Description
This PR adds a capability set for the PSM3 provider. All of the current configuration is borrowed from the PSM2 provider.
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.