
Communicate tunecache during runs when tuning is active in Multi-GPU runs #199

Open
mathiaswagner opened this issue Dec 12, 2014 · 1 comment

@mathiaswagner (Member)

When tuning is active during multi-GPU runs, each GPU independently tunes each kernel. This can result in different GPUs using different launch configurations for the final kernel launch, which in turn makes binary reproducibility impossible. This was first discovered in #182.

While a simple global reduction over the elapsed time during tuning can help in synchronous runs, it will cause hangs when using asynchronous algorithms like DD, where each GPU works on a local problem and may never even launch the tuning process for a specific kernel.
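
As an illustration of why the synchronous approach breaks down, here is a minimal sketch of such a global reduction, assuming MPI and using hypothetical helper names (`reduce_candidate_time`, `select_best_candidate`) rather than QUDA's actual tuning code. It only works if every rank sweeps the same candidates in the same order, which is exactly the assumption DD violates:

```cpp
// Hypothetical sketch (not QUDA's implementation): during the tuning sweep,
// the elapsed time measured for each candidate launch configuration is
// reduced across all ranks, so every rank ranks the candidates identically
// and ends up caching the same launch parameters.
#include <mpi.h>
#include <limits>
#include <vector>

// Reduce the elapsed time of one candidate configuration across all ranks so
// that every rank sees the same number (here: the slowest GPU sets the pace).
float reduce_candidate_time(float local_time, MPI_Comm comm)
{
  float global_time = 0.0f;
  // Blocking collective: hangs if any rank never tunes this kernel.
  MPI_Allreduce(&local_time, &global_time, 1, MPI_FLOAT, MPI_MAX, comm);
  return global_time;
}

// Pick the candidate with the best globally reduced time; because every rank
// sees identical reduced timings, all ranks select the same configuration.
int select_best_candidate(const std::vector<float> &local_times, MPI_Comm comm)
{
  int   best_index = -1;
  float best_time  = std::numeric_limits<float>::max();
  for (int i = 0; i < static_cast<int>(local_times.size()); ++i) {
    float t = reduce_candidate_time(local_times[i], comm);
    if (t < best_time) { best_time = t; best_index = i; }
  }
  return best_index;
}
```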

This also relates to the issue mentioned in tune.cpp:

//FIXME: We should really check to see if any nodes have tuned a kernel that was not also tuned on node 0, since as things
//       stand, the corresponding launch parameters would never get cached to disk in this situation.  This will come up if we
//       ever support different sub volumes per GPU (as might be convenient for lattice volumes that don't divide evenly).

We need a non-blocking solution to this.
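
For illustration only, one shape such a non-blocking solution could take, again assuming MPI and hypothetical names (`PendingTuneSync`, `post_tune_sync`, `test_tune_sync`) rather than anything in QUDA: each rank posts a single `MPI_Iallreduce` after finishing its local sweep and polls it for completion, so it is never stalled while working on its local DD problem. The reduction still only completes once every rank has posted it, so ranks would keep using their locally tuned parameters in the meantime:

```cpp
// Hypothetical sketch of a non-blocking variant (assumed names; not QUDA's
// design): after finishing its local sweep, a rank posts one MPI_Iallreduce
// over the full vector of candidate timings and keeps working, checking for
// completion between kernel launches.
#include <mpi.h>
#include <vector>

struct PendingTuneSync {
  std::vector<float> local_times, global_times;
  MPI_Request        request = MPI_REQUEST_NULL;
  bool               done    = false;
};

// Post the reduction once this rank has timed all candidate configurations.
void post_tune_sync(PendingTuneSync &p, std::vector<float> times, MPI_Comm comm)
{
  p.local_times = std::move(times);
  p.global_times.assign(p.local_times.size(), 0.0f);
  MPI_Iallreduce(p.local_times.data(), p.global_times.data(),
                 static_cast<int>(p.local_times.size()),
                 MPI_FLOAT, MPI_MAX, comm, &p.request);
}

// Poll between launches; returns true once every rank has posted the
// reduction, after which all ranks can switch to the same best candidate.
bool test_tune_sync(PendingTuneSync &p)
{
  if (!p.done && p.request != MPI_REQUEST_NULL) {
    int flag = 0;
    MPI_Test(&p.request, &flag, MPI_STATUS_IGNORE);
    p.done = (flag != 0);
  }
  return p.done;
}
```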

@weinbe2 (Contributor) commented Sep 3, 2024

Update: this has been addressed in the non-DD case, but it is still relevant for DD.
