Add Aggressive Vectorization Compilation mode #72
Comments
We discussed that I'm nervous about the potential errors this may introduce, which will be hard to find and fix, particularly in the initial ports of code to Kokkos. It's something for us to consider, but we should prioritize user experience right now, IMHO.
Yeah I know ;-). I only want to open an issue to track this and to give a space to leave comments. I got some feedback from the Fluid Dynamics folks, who liked the idea of getting that mode (which is not on by default). Btw, I would be interested in an example where it actually fails; so far I couldn't contrive one (which does not mean that I believe there is none ...).
Are you asking for an example that is runnable on the GPU under "Kokkos rules", or for code that you are actually going to get in the OpenMP-to-Kokkos transition phase? These two things are not created equal.
I was thinking more about the OpenMP-to-Kokkos transition phase, but if you have an example that actually runs correctly with the Cuda backend and fails with the OpenMP backend when loops are marked with ivdep, that would be even more interesting.
After discussions today at the Kokkos meeting I pushed this change in. To enable it, either define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION to 1 or, when building with the Makefile system, pass KOKKOS_OPTIONS=aggressive_vectorization. There is no CMake option right now. It also currently affects only the OpenMP backend with the Intel compiler.
Since not all functors will be vectorizable like that, perhaps it's better to add another execution space,
In principle, Intel's interpretation of #pragma ivdep can always be added to Kokkos code, since our parallel_for kernels have to be outer-loop vectorizable, and Intel's #pragma ivdep applies only to the immediately following block, not to nested blocks. That said, this is how the current compiler behaves, and the behavior is ill-defined, so we don't want to add it by default. This compilation option would add the #pragma ivdep and thus help considerably with vectorization.
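To make the idea concrete, here is a minimal sketch of what such an opt-in could look like. It is not the actual Kokkos implementation: `sketch_range_for`, `Axpy`, and `run_axpy_sketch` are hypothetical names made up for illustration, and the extra `__INTEL_COMPILER` guard is an assumption so that other compilers never see the pragma. Only the macro name KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION comes from this thread.

```cpp
#include <cstddef>
#include <vector>

// Flag name taken from this issue; it defaults to off here, so the
// pragma is only emitted when a user explicitly opts in.
#ifndef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION 0
#endif

// Hypothetical stand-in for the per-thread range loop of a parallel_for.
// This is NOT a Kokkos API, just a sketch of where the pragma would sit.
template <class Functor>
void sketch_range_for(std::size_t begin, std::size_t end, const Functor& f) {
// Assumption: restrict the pragma to the Intel compiler, since that is
// the only compiler this thread says the mode currently affects.
#if KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION && defined(__INTEL_COMPILER)
#pragma ivdep
#endif
  for (std::size_t i = begin; i < end; ++i) {
    f(i);  // iterations must be independent, as for any parallel_for body
  }
}

// Example functor with no dependence between iterations: y[i] += a * x[i].
struct Axpy {
  double a;
  const double* x;
  double* y;
  void operator()(std::size_t i) const { y[i] += a * x[i]; }
};

// Small driver: x[i] = i, y[i] = 1, a = 2, so afterwards y[i] = 1 + 2 * i.
std::vector<double> run_axpy_sketch(std::size_t n) {
  std::vector<double> x(n), y(n, 1.0);
  for (std::size_t i = 0; i < n; ++i) x[i] = static_cast<double>(i);
  sketch_range_for(0, n, Axpy{2.0, x.data(), y.data()});
  return y;
}
```

With the flag left at 0, or on a non-Intel compiler, the loop compiles exactly as it would without the option, which matches the intent of keeping this mode off by default.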