-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomics with Serial Backend - Default should be Disable? #549
Comments
This would only be viable for a Serial-only configuration. Otherwise detected whether within a parallel region or not would swamp the cost. |
@hcedwar I think you tell people if they are using serial backend we won't compile thread safe atomics etc in there unless they use the override. There are definitely Trilinos builds which are serial-only. |
What if people use Kokkos::Serial in thread-parallel tasks? |
What Carter meant was that if we enable both OpenMP and Serial, atomics are still enabled. |
Would it make sense for Kokkos::atomic_* to be functions of the execution space? Users would write, for example,
or for an execution space instance
|
This would deviate way to much from the ISO/C++ standard, and from where the C++20 standard is headed for allowing atomic operations on non- std::atomic types. Best we can do is when Kokkos is configured with ONLY the Kokkos::Serial back-end that the atomic implementations decay to non-atomic operations. This is still high-risk because an application may be using pthreads or OpenMP unknown to Kokkos, then use Kokkos atomics, then get silent data corruption, and then blame Kokkos. Would be best to have a Kokkos configuration option that says each program process is guaranteed to be internally serial (a.k.a., MPI only for parallelism) so atomic operations should decay to non-atomic operations. Perhaps, KOKKOS_USING_SERIAL_PROCESS |
Yeah this is what I was thinking as well. And I agree with Carter that the default should be safe. So the option is to turn of atomics if and only if Serial is the only configured backend. |
Can we make that the default if only serial is configured so that all the MPI only codes running at the lab (which is quite a few right now) benefit? |
So you think the scenario Carter brought up and I talked about earlier is unlikely enough to make it worth the risk? |
Internal apps might very well enable OpenMP (the "one configuration to rule them all" approach), so in practice, it might be harder than this to turn off atomic updates. |
In general I think we should try to remove the overheads for MPI only configurations as much as possible. We are paying thousands of cycles which are not needed at all in the common (MPI only) case. My gut feeling is to say that if you enable Serial we make no guarantees unless you provide an override to force us to compile thread aware operations. |
@nmhamster wrote:
I agree :-) The issue is that the internal apps that want MPI only to perform well, may actually turn on OpenMP or Pthreads in Kokkos, even though they may not be using it. I'm not sure if we need to worry about that case, though. If they bother to turn on OpenMP, it's their fault if they don't use it. I don't think Kokkos can reasonably figure out globally, at configuration or compile time, whether or not users actually use an enabled execution space. |
Is there any way to disable atomics currently ? If not it would be nice to have the option, though I don't know about defaults. In my case, I get Valgrind false positives from Kokkos atomic assign operations in Tpetra in serial runs (it compares an uninitialized value to itself). Being able to compile without atomics would make debugging easier. |
@ibaned wrote:
If Kokkos::Serial is Tpetra's execution space, Tpetra does not use atomic_assign by default. Thus, these warnings are truly spurious. If Kokkos::OpenMP is Tpetra's execution space, but OpenMP is only using 1 thread, then Tpetra does use atomic_assign by default. |
One avenue for pursuing issue #607 |
Right now in LAMMPS and SPARTA we check if there is only a single thread and no GPUs, and if so we template out all the Kokkos atomics manually, which is a huge pain, but has to be done for performance. I wish Kokkos provided a more convenient way to do this and atomics were automatically disabled for the Serial backend. |
Moving this out of Backlog and onto an upcoming deadline will put in on our radar and get it considered again. October release is happening now, so its on the December list. |
So, the VPIC code would also like this feature. |
It still confuses me why atomic operations are not a function on / of the execution space, but whatever ;-P. |
Since the functions don't take any parameters indicating the execution space, I think they don't have a fast way to determine what space they're in which would prevent this. I suppose if they were templated on the execution space this would be trivial to implement. |
@ibaned There's also no way to implement an MPI 3 RDMA back-end without atomics being a function / templated on the execution space. Even if the standard doesn't go that way, I wish Kokkos would. The two least idiomatic parts of Kokkos for me are the C++ right now doesn't have a way to customize function attributes (e.g., to make them a function of a template parameter), so we're stuck with the macros. However, we could make atomics take an execution or memory space instance argument (that would cover the MPI RDMA case, among others). |
Yep, this is unfortunate indeed.
I see no issue with doing the following: change atomic functions to take execution space in some way, and create backwards compatibility wrappers that call |
@ibaned I would be happy to add the atomics overloads myself, if OK with the Kokkos folks. |
[kokkos#607] [kokkos#549] A few details: - Accepting volatile pointers was necessary for compatibility with existing calls which pass in volatile pointers, hence the const_cast - Special implementations of atomic_increment were needed to get equal performance in the one application I tested (it was doing its own serial special cases before). - Compilers have a harder time matching templates as opposed to overloads, so some call sites had to be modified to specify the scalar type explicitly
I've just implemented this, will open a PR shortly. |
I understand some of the background to why we have atomics enabled for serial backend but I think the default should be that atomic operations are disabled and result in standard load/stores when serial backend is used. This would assume a Kokkos CMake/make option to re-enable if needed. Given that a good number of codes are using the serial backend on MPI-only platforms still this seems an overhead which is not needed or necessary. Removing this would enable more optimization and possibly vectorization for this case.
The text was updated successfully, but these errors were encountered: