gpu-margo-p2p-bw assertion on thetagpu #59
Comments
Hi Phil,
Yes, it is. It can be fixed by configuring libfabric with CUDA support.
See the info in perf-regression/theta-gpu/README.md in this PR on how to do this:
HDFGroup#4
Vailin
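For readers who build libfabric from source rather than through spack, the fix described above might look roughly like the sketch below. This is an illustration only, not the procedure from the README referenced above: the `CUDA_HOME` and `PREFIX` values are placeholder assumptions, and the commands assume they are run from the top of a libfabric source tree. The `--with-cuda` option is the libfabric configure switch for enabling CUDA memory support.

```shell
#!/bin/sh
# Sketch only: both paths below are placeholder assumptions, adjust for your system.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"   # assumed CUDA toolkit location
PREFIX="${PREFIX:-$HOME/libfabric-cuda}"    # assumed install prefix

# Run from the top of a libfabric source tree.
# --with-cuda points configure at the CUDA toolkit so GPU memory support is built in.
./configure --prefix="$PREFIX" --with-cuda="$CUDA_HOME"
make -j 8
make install
```

Margo and Mercury would then need to be built against this libfabric installation for the GPU bulk registration in the test to use it.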
From: Phil Carns
Date: Sunday, November 27, 2022 at 10:48 AM
To: mochi-hpc-experiments/mochi-tests
Cc: Vailin Choi, Mention
Subject: [mochi-hpc-experiments/mochi-tests] gpu-margo-p2p-bw assertion on thetagpu (Issue #59)
I can compile and run the thetagpu regression test with the latest version of spack after the updates in #58 and mochi-hpc-experiments/platform-configurations#18, but the test fails at runtime with the following:
gpu-margo-p2p-bw: ../perf-regression/gpu-margo-p2p-bw.cu:275: int main(int, char**): Assertion `ret == 0' failed.
(That indicates a failure of margo_bulk_create_attr().) I assume this is the libfabric problem where the libfabric library itself needs to be explicitly configured with GPU memory support?
@vchoi-hdfgroup @jhendersonHDF
Great, thanks! I'll leave this issue open for now just to track this topic; we can close it once the fix is in place here. I put a comment on that other PR; it would be great if you could contribute the new variant directly to the mochi-spack-packages repo.