Batched Kokkos
The goal here is to port miniqmc_sync_move to a Kokkos-friendly version, i.e. one that uses Kokkos data structures and multilevel parallelism. This page documents some of the challenges, solutions, and gotchas associated with working with Kokkos.
Challenges:
- To construct Kokkos::View<T*>, T has to obey the following constraints (from the Kokkos wiki):
- T must have a default constructor and destructor.
- T must not have virtual methods.
- T's default constructor and destructor must not allocate or deallocate data, and must be thread safe.
- T's assignment operators as well as its default constructor and destructor must be marked with the KOKKOS_INLINE_FUNCTION or KOKKOS_FUNCTION macro.
This means we can't do a straightforward port of miniqmc_sync_move without significant redesign of the main objects. For example, Mover doesn't have a default constructor, and even if it did, build_els does dynamic allocation. This seems to be a "feature" of many of the objects in miniqmc and qmcpack.
- A corollary of the above constraints is that T cannot contain managed Views. This means that the current View-based solution for asynchronous moves will probably not work out of the box.
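As a rough illustration of the constraints above, here is a minimal sketch in plain C++ (no Kokkos required) that screens candidate element types at compile time. The trait `looks_view_compatible` and all four example structs are hypothetical names invented for this page, not Kokkos or miniqmc code. The check is deliberately conservative: requiring a trivial destructor is stricter than what Kokkos asks for, and standard type traits cannot see whether functions carry KOKKOS_INLINE_FUNCTION or whether a constructor allocates.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical, conservative screen for "can T be the element type of a View?".
// Not a Kokkos facility. It cannot check KOKKOS_INLINE_FUNCTION markers or
// whether the default constructor allocates.
template <class T>
struct looks_view_compatible
    : std::integral_constant<
          bool,
          std::is_default_constructible<T>::value &&   // needs a default constructor
          !std::is_polymorphic<T>::value &&            // no virtual methods
          std::is_trivially_destructible<T>::value> {  // destructor cannot free memory
};

// Plain data: passes the screen.
struct PlainWalkerData {
  double weight;
  int id;
};

// No default constructor, similar in spirit to Mover as described above.
struct MoverLike {
  explicit MoverLike(int seed) : seed_(seed) {}
  int seed_;
};

// Virtual methods are ruled out.
struct VirtualBase {
  virtual ~VirtualBase() = default;
};

// Owns heap memory, standing in for an object that allocates dynamically.
struct OwnsVector {
  std::vector<double> els;
};

static_assert(looks_view_compatible<PlainWalkerData>::value, "plain data is fine");
static_assert(!looks_view_compatible<MoverLike>::value, "no default constructor");
static_assert(!looks_view_compatible<VirtualBase>::value, "has virtual methods");
static_assert(!looks_view_compatible<OwnsVector>::value, "owns heap memory");
```

A type that fails this screen is a strong hint that it needs redesign before it can live inside a View; a type that passes still has to be checked by hand for the remaining constraints.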
Gotchas:
- View allocation needs to happen outside of Kokkos::parallel regions. It's sort of obvious in hindsight, since allowing this would be disastrous for performance.
- Kokkos allows for View slicing and generation of subviews. This is really spiffy, but our reliance on raw pointers (see spline for example) assumes a particular data layout. Placing the data in a View, constructing a subview, and passing its raw pointer into C-style functions can be a disaster when the View changes its layout based on the architecture.
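The layout gotcha can be reproduced without Kokkos. The sketch below is plain C++ that mimics the two common layouts (Kokkos::LayoutRight, the usual default on host backends, and Kokkos::LayoutLeft, the usual default for CUDA). The `Layout` enum and every function here are hypothetical stand-ins written for illustration; a real port would query the View for its strides instead of hard-coding them.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Kokkos::LayoutLeft and Kokkos::LayoutRight.
enum class Layout { Left, Right };

// Strides (in elements) of an n0 x n1 array.
// Left:  index 0 is contiguous (column-major).
// Right: index 1 is contiguous (row-major).
std::array<std::size_t, 2> strides(Layout l, std::size_t n0, std::size_t n1) {
  if (l == Layout::Left) return std::array<std::size_t, 2>{{1, n0}};
  return std::array<std::size_t, 2>{{n1, 1}};
}

// Store element (i, j) = 10*i + j at the position the layout dictates.
std::vector<double> make_data(Layout l, std::size_t n0, std::size_t n1) {
  const std::array<std::size_t, 2> s = strides(l, n0, n1);
  std::vector<double> data(n0 * n1);
  for (std::size_t i = 0; i < n0; ++i)
    for (std::size_t j = 0; j < n1; ++j)
      data[i * s[0] + j * s[1]] = 10.0 * static_cast<double>(i) + static_cast<double>(j);
  return data;
}

// C-style routine that silently assumes n contiguous values.
double sum_contiguous(const double* p, std::size_t n) {
  double s = 0.0;
  for (std::size_t k = 0; k < n; ++k) s += p[k];
  return s;
}

// Layout-aware variant that honours the stride of the sliced dimension.
double sum_strided(const double* p, std::size_t n, std::size_t stride) {
  double s = 0.0;
  for (std::size_t k = 0; k < n; ++k) s += p[k * stride];
  return s;
}
```

For a 2 x 3 array, "row 1" holds 10, 11, 12. Passing the row's start pointer to `sum_contiguous` gives 33 under the Right layout but 22 under the Left layout, because the routine walks into the neighbouring row. `sum_strided` with the layout's second stride recovers 33 in both cases.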
Instead of following the batched move algorithms laid out in miniqmc_sync_move, we are going to take a different approach. Basically, we will elevate the WaveFunction object to be a batched evaluation object and expand the associated data structures to accommodate an additional walker index. This accomplishes the following things:
- The transition to multiwalker Views amounts to adding an extra dimension to each View. This is a lot better than having Views of complicated data types, e.g. a View of WaveFunctions, where we have to worry about how this data type gets dispatched.
- Pseudopotential evaluation in check_spo_multi is disabled. This means that evaluate_v is not tested.
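The "extra walker index" idea can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `BatchedValues` and `evaluate_batched` are hypothetical names, the flat `std::vector` stands in for what would be a rank-2 View of doubles in the Kokkos version, and the stored value is a toy placeholder rather than any real wave function quantity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 View of doubles indexed (walker, electron).
// One object holds the data of every walker, instead of a container of
// per-walker objects that each own their own storage.
class BatchedValues {
 public:
  // All storage is allocated once, up front, before any loop runs.
  BatchedValues(std::size_t n_walkers, std::size_t n_els)
      : n_walkers_(n_walkers), n_els_(n_els), data_(n_walkers * n_els, 0.0) {}

  double& operator()(std::size_t w, std::size_t e) {
    assert(w < n_walkers_ && e < n_els_);
    return data_[w * n_els_ + e];  // walker is the extra, leading index
  }

  std::size_t num_walkers() const { return n_walkers_; }
  std::size_t num_els() const { return n_els_; }

 private:
  std::size_t n_walkers_;
  std::size_t n_els_;
  std::vector<double> data_;
};

// Toy batched evaluation. The outer loop over walkers and the inner loop over
// electrons are the two levels that multilevel parallelism would distribute.
void evaluate_batched(BatchedValues& v) {
  for (std::size_t w = 0; w < v.num_walkers(); ++w)
    for (std::size_t e = 0; e < v.num_els(); ++e)
      v(w, e) = 100.0 * static_cast<double>(w) + static_cast<double>(e);  // placeholder value
}
```

The element type stays a plain double, so none of the element-type constraints listed under Challenges come into play; only the rank of the container grows.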