cuda-memcheck errors #222

Closed
alexstrel opened this issue Apr 14, 2015 · 7 comments

Comments

@alexstrel
Member

The cuda-memcheck utility reports CUDA_ERROR_INVALID_VALUE even though the application executes successfully; this is probably a QMP-related issue. Here is an example from a single-GPU run under cuda-memcheck (the code was built with QMP):
========= Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to "invalid argument" on CUDA API call to cuPointerGetAttribute.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 (cuPointerGetAttribute + 0x174) [0x13d374]
========= Host Frame:./tests/invert_test_orig [0xb2b9d0]
========= Host Frame:./tests/invert_test_orig [0xca660d]
========= Host Frame:./tests/invert_test_orig [0xca6051]
========= Host Frame:./tests/invert_test_orig (mca_coll_self_allreduce_intra + 0x6f) [0xb601bf]
========= Host Frame:./tests/invert_test_orig [0xac009c]
========= Host Frame:./tests/invert_test_orig [0xaa54ab]
========= Host Frame:./tests/invert_test_orig [0x30e1be]
========= Host Frame:./tests/invert_test_orig [0x2ca6a2]
========= Host Frame:./tests/invert_test_orig [0x2d282e]
========= Host Frame:./tests/invert_test_orig [0x1e6123]
========= Host Frame:./tests/invert_test_orig [0x78062e]
========= Host Frame:./tests/invert_test_orig [0x7a401]
========= Host Frame:./tests/invert_test_orig [0x32a3f]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed1d]
========= Host Frame:./tests/invert_test_orig [0x315f1]

@alexstrel alexstrel added the bug label Apr 14, 2015
@alexstrel alexstrel added this to the QUDA 0.7.1 milestone Apr 14, 2015
@mathiaswagner
Member

Some questions:

  • Does this happen with QUDA 0.7? I guess invert_test_orig is the invert_test included in the 0.7 release?
  • Did you try a build without QMP?
  • Does ‘executed successfully’ mean there is no error message when run without cuda-memcheck?

@alexstrel
Member Author

Yes, I used the current master (quda-0.7 release) and the invert_test application.
I'll check the other options, i.e., pure MPI and pure single-GPU builds.
Yes, there are no errors without cuda-memcheck.

@mathiaswagner
Member

With a single-GPU build,
cuda-memcheck ./invert_test
completes without errors for me (using CUDA 7.0).

I have not yet tried MPI or QMP.
It might also help to compile with HOST_DEBUG enabled to track down the location of the error.

@nmrcardoso
Contributor

nmrcardoso commented Apr 22, 2015 via email

@nmrcardoso
Contributor

I made a simple test that only calls cuPointerGetAttribute, passing it a device pointer and a host pointer; no MPI is involved. Whenever cuPointerGetAttribute is called with a pointer that is not a device pointer,
cuda-memcheck reports an error.
This is very annoying if we want to run cuda-memcheck and there is a call to this function somewhere in the code.
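
A rough sketch of such a test follows. The original test code is not shown in this thread, so this is an assumption about its shape, written in Python with ctypes instead of C so that it runs without a compiler. The helper name `query_memory_type` is hypothetical; `cuInit` and `cuPointerGetAttribute` are driver API entry points, and the constant values are taken from cuda.h. The helper returns None when no CUDA driver or device is available.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1
CU_POINTER_ATTRIBUTE_MEMORY_TYPE = 2  # value from cuda.h


def query_memory_type(address):
    """Return the CUresult of cuPointerGetAttribute for `address`.

    Hypothetical helper: returns None if libcuda cannot be loaded
    or the driver cannot be initialised on this machine.
    """
    try:
        cuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    if cuda.cuInit(0) != CUDA_SUCCESS:
        return None

    # CUresult cuPointerGetAttribute(void *data, CUpointer_attribute attr, CUdeviceptr ptr)
    cuda.cuPointerGetAttribute.restype = ctypes.c_int
    cuda.cuPointerGetAttribute.argtypes = [
        ctypes.POINTER(ctypes.c_uint),
        ctypes.c_int,
        ctypes.c_uint64,  # assumes a 64-bit platform, where CUdeviceptr is 64 bits wide
    ]
    mem_type = ctypes.c_uint(0)
    return cuda.cuPointerGetAttribute(
        ctypes.byref(mem_type), CU_POINTER_ATTRIBUTE_MEMORY_TYPE, address
    )


# A plain host buffer that CUDA knows nothing about.
host_buffer = ctypes.create_string_buffer(16)
status = query_memory_type(ctypes.addressof(host_buffer))
print("cuPointerGetAttribute status for host pointer:", status)
```

Running a script like this under cuda-memcheck should show whether a non-success status for the host pointer is what gets logged as a "Program hit" error.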

@mathiaswagner
Member

I have not checked in detail, but does

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g0c28ed0aff848042bc0533110e45820c

maybe explain the issue?


@nmrcardoso
Contributor

The cuda-memcheck errors from cuPointerGetAttribute are benign.
cuPointerGetAttribute is used to test whether a pointer belongs to CUDA 'unified memory' or a CUDA managed memory object; however, if the pointer passed is a "non-CUDA pointer", cuda-memcheck flags this as an error.
I don't think there is a way to tell cuda-memcheck to ignore this kind of error, so just ignore it.
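
One way to "just ignore it" in practice is to post-filter a saved cuda-memcheck log. The following is a minimal sketch, not a cuda-memcheck feature: the function name `drop_benign_blocks` is made up for illustration, and the patterns assume the log layout shown in the report above (a "Program hit" header followed by "Saved host backtrace" and "Host Frame:" lines, optionally ended by a bare "=========" line).

```python
import re

# Patterns assume the cuda-memcheck log layout quoted earlier in this issue.
HEADER = re.compile(r"^=========\s+Program hit (\S+) .* on CUDA API call to (\w+)")
DETAIL = re.compile(r"^=========\s+(Saved host backtrace|Host Frame:)")
SEPARATOR = re.compile(r"^=========\s*$")


def drop_benign_blocks(lines, api="cuPointerGetAttribute",
                       error="CUDA_ERROR_INVALID_VALUE"):
    """Return `lines` without the error blocks for the given API call and error."""
    kept = []
    skipping = False
    for line in lines:
        header = HEADER.match(line)
        if header:
            # A new error block starts; skip it only if it is the benign kind.
            skipping = header.group(1) == error and header.group(2) == api
        elif skipping and not (DETAIL.match(line) or SEPARATOR.match(line)):
            # First line that is not part of the backtrace ends the skipped block.
            skipping = False
        if not skipping:
            kept.append(line)
    return kept


if __name__ == "__main__":
    import sys
    # Usage (file name is illustrative): python filter_memcheck.py < memcheck.log
    sys.stdout.writelines(drop_benign_blocks(sys.stdin))
```

Note that this only hides the blocks; any error count that cuda-memcheck prints in its summary would still include them.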
