-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve robustness of GPUDirect and fix silent errors #238
Conversation
… added debugging of communicator declaration in comm_common.cpp.
…CPU comms buffer checking using std::fill and std::copy.
…handlers for the requested number of faces.
Changes look good to me but I did not yet compile (my local build machine is down again). The comments mention possible future improvements / optimizations. I have not yet checked but it might be good to have issues for them. Automated testing would really be handy now ;-) |
The use of the macros here follows that used in malloc_quda.h, and it is used to get access to the file, function, line info string for debugging. I agree that in general macros should be avoided, but used sparingly they can be useful. |
I did get the intention and consider it to be a reasonable choice here. |
On making new issues for the improvements in the comments: I believe these have already been implemented into Justin's peer-2-peer branch, but I have yet to verify that. Once the immediate GPU_COMMS bugs have been verified fixed (issue #231) and this is merged in, one of my next tasks is to clean the peer-2-peer branch and get it merged in too (as a new feature). |
I don't think we can run any specific tests but as GPU_COMMS is anyway still experimental I will merge that in. The code looks good to me. |
Improve robustness of GPUDirect and fix silent errors
This significantly improved the error checking and robustness when using GPU_COMMS to avoid silent errors and removes a nasty hack when supporting staggered fermions. The additional error checking also applies to the non-GPU_COMMS path and in either case is only performed when HOST_DEBUG=yes.
With respect to the last point: that fact that message handlers for all numbers of faces were declared irrespective of the operator was a source of silent failures. E.g., when doing Wilson fermions, the halo region / ghost zone is of depth one, but message handlers were previously created for depths (face) 1 through 3. However, the size of the receive buffers allocated was only sized for depth 1. This is a silent error if it goes unchecked, it didn’t causes any problems when running on GPU-aware MPI (e.g., creating a MPI handle using invalid memory), but I guess this is totally undefined behaviour, and could causes nasty things to happen on GDR.