I started testing CHIP-SPV with rocPRIM. It built fairly easily after some minor fixes, but I then ran into the need for device-side function pointers, and for passing function pointers as arguments to kernel functions. This is the basic pattern of templated algorithm skeletons, where the function to apply is passed as an argument. We could try to implement function pointers using the Intel extension. The compiler already seems to generate code that uses it on the device side; what seems to be missing is a way for the host to take the address of a device-side function, so that correct values can be passed from the host to the device.
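To make the pattern concrete, here is a minimal host-only C++ sketch of a skeleton that receives the applied function as a pointer argument. It is not rocPRIM or CHIP-SPV code: `apply_all`, `twice`, `add_one` and `unary_op` are hypothetical names, and the loop only stands in for a kernel body. It shows the indirect call itself, not the host-to-device address problem.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element-wise operations; on a GPU these would be __device__ functions.
inline float twice(float x) { return 2.0f * x; }
inline float add_one(float x) { return x + 1.0f; }

// Pointer type for the operation the skeleton accepts.
using unary_op = float (*)(float);

// Host-only stand-in for a kernel body: applies `op` to every element.
// The operation is only known at run time, so each call is indirect.
inline void apply_all(float* data, std::size_t n, unary_op op) {
    assert(op != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = op(data[i]);  // indirect call through the pointer argument
    }
}
```

A caller would pick the operation at run time, for example `apply_all(values, count, &twice)`. In the GPU case the open question is what value the host can pass for `op` so that it is a valid address on the device.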
A slightly more portable (but incomplete) approach for this particular case would be to use SPIR-V specialization: specialize the call with the function pointer(s) passed to it, and rely on constant propagation to remove the indirection. I used a similar concept in this experiment.
There was a misunderstanding on my part: CUDA/HIP don't support "universal function pointers" (which I assumed would be the semantics of __host__ __device__ functions). Indirect calls should work only through pointers taken on the device side.
Thus rocPRIM likely handles all the functor cases through template specialization, which converts them into direct calls to device-side functions.
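As an illustration of that idea, the following host-only C++ sketch templates a skeleton on the operation type, so each instantiation calls the functor's `operator()` directly and no function pointer is involved. The names `plus_offset` and `transform_in_place` are hypothetical and not taken from rocPRIM; in HIP the `operator()` would additionally be marked `__device__`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical functor carrying its own state.
struct plus_offset {
    int offset;
    int operator()(int x) const { return x + offset; }
};

// Skeleton templated on the operation type. The callee is fixed at compile
// time for each instantiation, so the call below is direct, not indirect.
template <class Op>
void transform_in_place(int* data, std::size_t n, Op op) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = op(data[i]);  // resolves to Op::operator() at compile time
    }
}
```

A call such as `transform_in_place(values, count, plus_offset{10})` passes the functor by value, so only its data members travel to the callee and no code address has to be shared between host and device.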
Keeping this open to track the __device__ side fptr implementation still.