cuda-memcheck errors #222
Comments
Some questions:
Yes, I used the current master (quda-0.7 release) and the invert_test application.
A single-GPU build completes; I have not yet tried MPI or QMP.
Did you run your program with MPI? Was QMP built with MPI? Is the CUDA-aware
MPI environment flag active?
I also got this kind of error, but only when using MPI and with
MV2_USE_CUDA set in MVAPICH2.
Also, if I run a non-CUDA-aware MPI program with MV2_USE_CUDA active,
cuda-memcheck reports a lot of these errors,
and of course with MV2_USE_CUDA=0 there is no CUDA
error.
I made a simple test using only cuPointerGetAttribute, passing both device and host pointers, no MPI here. If cuPointerGetAttribute is called and the pointer is not a device pointer
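For anyone who wants to try something similar, here is a rough sketch of such a probe. It is not the test described above (that code was not posted); it uses Python's ctypes against the driver library, and it assumes the library is loadable as `libcuda.so.1` and that the versioned symbols `cuCtxCreate_v2`, `cuMemAlloc_v2`, `cuMemFree_v2` and `cuCtxDestroy_v2` are the exports behind the corresponding `cuda.h` macros. `probe_pointer` is a hypothetical helper name. Running it under cuda-memcheck should show whether the host-pointer query is what gets flagged.

```python
import ctypes

# Values taken from cuda.h (driver API).
CU_POINTER_ATTRIBUTE_MEMORY_TYPE = 2
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1


def probe_pointer(cuda, address):
    """Query the memory type of `address`; returns (status, memory_type)."""
    memory_type = ctypes.c_uint(0)
    status = cuda.cuPointerGetAttribute(
        ctypes.byref(memory_type),
        ctypes.c_int(CU_POINTER_ATTRIBUTE_MEMORY_TYPE),
        ctypes.c_uint64(address),
    )
    return status, memory_type.value


def main():
    try:
        cuda = ctypes.CDLL("libcuda.so.1")  # assumed library name
    except OSError:
        print("libcuda.so.1 not found; nothing to probe")
        return
    if cuda.cuInit(0) != CUDA_SUCCESS:
        print("cuInit failed; is a GPU available?")
        return

    device = ctypes.c_int(0)
    context = ctypes.c_void_p()
    if cuda.cuDeviceGet(ctypes.byref(device), 0) != CUDA_SUCCESS:
        print("cuDeviceGet failed")
        return
    if cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device) != CUDA_SUCCESS:
        print("cuCtxCreate failed")
        return

    # One device allocation and one plain host buffer, no MPI involved.
    device_ptr = ctypes.c_uint64(0)
    if cuda.cuMemAlloc_v2(ctypes.byref(device_ptr), ctypes.c_size_t(1024)) == CUDA_SUCCESS:
        print("device pointer:", probe_pointer(cuda, device_ptr.value))
        cuda.cuMemFree_v2(device_ptr)

    host_buffer = ctypes.create_string_buffer(1024)
    print("host pointer:  ", probe_pointer(cuda, ctypes.addressof(host_buffer)))

    cuda.cuCtxDestroy_v2(context)


if __name__ == "__main__":
    main()
```

Each line prints the status code and the reported memory type, so a non-zero status on the host pointer can be compared with what cuda-memcheck reports.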
I have not checked in detail, but does that maybe explain the issue?
On Apr 22, 2015, at 14:20, nmrcardoso wrote: I made a simple test using only cuPointerGetAttribute, passing both device and host pointers, no MPI here. If cuPointerGetAttribute is called and the pointer is not a device pointer
Mathias Wagner
The cuda-memcheck errors from cuPointerGetAttribute are benign.
The cuda-memcheck utility reports CUDA_ERROR_INVALID_VALUE even though the application executes successfully; this is probably a QMP-related issue. Here is an example from a single-GPU run under cuda-memcheck (the code was built with QMP):
========= Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to "invalid argument" on CUDA API call to cuPointerGetAttribute.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 (cuPointerGetAttribute + 0x174) [0x13d374]
========= Host Frame:./tests/invert_test_orig [0xb2b9d0]
========= Host Frame:./tests/invert_test_orig [0xca660d]
========= Host Frame:./tests/invert_test_orig [0xca6051]
========= Host Frame:./tests/invert_test_orig (mca_coll_self_allreduce_intra + 0x6f) [0xb601bf]
========= Host Frame:./tests/invert_test_orig [0xac009c]
========= Host Frame:./tests/invert_test_orig [0xaa54ab]
========= Host Frame:./tests/invert_test_orig [0x30e1be]
========= Host Frame:./tests/invert_test_orig [0x2ca6a2]
========= Host Frame:./tests/invert_test_orig [0x2d282e]
========= Host Frame:./tests/invert_test_orig [0x1e6123]
========= Host Frame:./tests/invert_test_orig [0x78062e]
========= Host Frame:./tests/invert_test_orig [0x7a401]
========= Host Frame:./tests/invert_test_orig [0x32a3f]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed1d]
========= Host Frame:./tests/invert_test_orig [0x315f1]
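When a run produces many reports like the one above, it can help to tally them by error code and API call, to check whether anything other than cuPointerGetAttribute is being flagged. The following is only a sketch: the regular expression is modelled on the single "Program hit" line shown in this report, and `summarize_memcheck` is a hypothetical helper, not part of cuda-memcheck or QUDA.

```python
import re
from collections import Counter

# Pattern modelled on the "Program hit ..." line in the report above.
PROGRAM_HIT = re.compile(
    r'Program hit (\w+) \(error (\d+)\) due to "([^"]*)" on CUDA API call to (\w+)'
)


def summarize_memcheck(log_text):
    """Count cuda-memcheck API error reports per (error name, API call)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PROGRAM_HIT.search(line)
        if match:
            error_name, _code, _reason, api_call = match.groups()
            counts[(error_name, api_call)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        '========= Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to '
        '"invalid argument" on CUDA API call to cuPointerGetAttribute.\n'
        '========= Host Frame:/usr/lib64/libcuda.so.1 '
        '(cuPointerGetAttribute + 0x174) [0x13d374]\n'
    )
    for (error_name, api_call), n in summarize_memcheck(sample).items():
        print(f"{n:6d}  {error_name}  {api_call}")
```

If the only entry is CUDA_ERROR_INVALID_VALUE from cuPointerGetAttribute, the log matches the case discussed in this thread; any other entry deserves a closer look.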