When tuning is active during multi-GPU runs, each GPU independently tunes each kernel. As a result, different GPUs can end up using different launch configurations for the final kernel launch, which ultimately makes binary reproducibility impossible. This was first discovered in #182.
While a simple global reduction over the elapsed tuning times would help in synchronous runs, it causes hangs with asynchronous algorithms such as DD, where each GPU works on a local problem and may never even enter the tuning path for a given kernel; see the sketch below.
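A minimal sketch of the problem, assuming an MPI backend and hypothetical types (`LaunchConfig`, `tuneLocally` are illustrative, not QUDA's actual tuning code): the reduction over tuning timings is a blocking collective, so it only completes if every rank reaches it.

```cpp
#include <mpi.h>

// Hypothetical stand-in for a tuned launch configuration (not QUDA's TuneParam).
struct LaunchConfig { int block; int grid; float time; };

// Each rank benchmarks its candidate configurations locally (details elided)
// and returns the locally fastest one.
LaunchConfig tuneLocally() { return {256, 1024, 0.12f}; }

LaunchConfig tuneCollectively(MPI_Comm comm)
{
  LaunchConfig best = tuneLocally();

  // Agree on the globally fastest time with a blocking collective:
  // EVERY rank in comm must reach this call for it to complete.
  float localTime = best.time, globalTime = 0.0f;
  MPI_Allreduce(&localTime, &globalTime, 1, MPI_FLOAT, MPI_MIN, comm);

  // With an asynchronous algorithm (e.g. DD), a rank that never reaches the
  // tuning path for this kernel never enters the collective, and the ranks
  // that did call MPI_Allreduce block here forever.
  // (Picking which rank's {block, grid} wins is a second, also collective, step.)
  return best;
}
```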
This also relates to the issue flagged in tune.cpp:
//FIXME: We should really check to see if any nodes have tuned a kernel that was not also tuned on node 0, since as things
// stand, the corresponding launch parameters would never get cached to disk in this situation. This will come up if we
// ever support different sub volumes per GPU (as might be convenient for lattice volumes that don't divide evenly).
We need a non-blocking solution for this; one possible direction is sketched below.
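The sketch below assumes an MPI backend; `TuneRecord`, `localCache`, and `mergeCachesOnRankZero` are hypothetical names, not QUDA API. The idea is to do no synchronization inside the tuning path at all and instead reconcile the per-rank results at a point every rank is guaranteed to reach, such as when the tune cache is saved to disk. That would also cover the tune.cpp FIXME above, since entries tuned only on non-zero ranks would no longer be lost; keeping launch configurations consistent within the first tuning run itself would still need separate handling.

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical fixed-size record for one tuned kernel (not QUDA's cache format).
struct TuneRecord {
  char  key[64];   // kernel/problem identifier
  int   block, grid;
  float time;
};

// Per-rank results, filled lazily as kernels get tuned during the run.
std::vector<TuneRecord> localCache;

// Called at a point every rank is guaranteed to reach (e.g. end of run, just
// before the tune cache is written to disk), so blocking collectives are safe
// here even though tuning itself was asynchronous.
void mergeCachesOnRankZero(MPI_Comm comm)
{
  int nRanks = 0, myRank = 0;
  MPI_Comm_size(comm, &nRanks);
  MPI_Comm_rank(comm, &myRank);

  // Gather how many records each rank holds.
  int myCount = static_cast<int>(localCache.size());
  std::vector<int> counts(nRanks);
  MPI_Gather(&myCount, 1, MPI_INT, counts.data(), 1, MPI_INT, 0, comm);

  // Gather the records themselves as raw bytes (TuneRecord is POD here).
  std::vector<int> byteCounts(nRanks), byteDispls(nRanks);
  int totalBytes = 0;
  if (myRank == 0) {
    for (int r = 0; r < nRanks; r++) {
      byteCounts[r] = counts[r] * static_cast<int>(sizeof(TuneRecord));
      byteDispls[r] = totalBytes;
      totalBytes += byteCounts[r];
    }
  }
  std::vector<char> all(totalBytes);
  MPI_Gatherv(localCache.data(), myCount * static_cast<int>(sizeof(TuneRecord)), MPI_BYTE,
              all.data(), byteCounts.data(), byteDispls.data(), MPI_BYTE, 0, comm);

  if (myRank == 0) {
    // Merge: keep one entry per key (e.g. the fastest), write the merged cache
    // to disk, and optionally broadcast it so the next run is reproducible.
    // (Merging and I/O elided.)
  }
}
```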