[FEA] Establish conventions to enable fully CUDA-less builds #823
Comments
cc @wphicks @divyegala @mhoemmen @teju85 for thoughts and ideas here.
@cjnolet I would prefer being able to spell algorithms the same way regardless of where they run, and telling them whether to run on host or device based on an execution policy parameter. This would align with the C++ Standard, as well as other NVIDIA products like Thrust. "Spell the same way" means same function name, same namespace, but different overloads.

Ultimately those overloads need to call implementation details that might depend on whether a compiler or library (e.g., the CUDA runtime) is available. The library question can be handled by the same implementation strategies that C++ BLAS wrappers use. The compiler question is a bit harder. I could see users opting into this explicitly by including headers. By analogy, users need to include [...]
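A minimal sketch of the overload-based spelling described above; the policy types and `axpy` are hypothetical illustrations, not an existing RAFT API:

```cpp
#include <cstddef>

namespace raft {

// Hypothetical execution-policy tags, analogous to the C++ Standard's
// std::execution policies or Thrust's thrust::host / thrust::device.
struct host_policy {};
struct device_policy {};

// Same function name, same namespace; the policy parameter selects the overload.
// Host overload: plain C++, compilable by any host compiler without CUDA.
inline void axpy(host_policy, std::size_t n, float a, float const* x, float* y)
{
  for (std::size_t i = 0; i < n; ++i) { y[i] += a * x[i]; }
}

// Device overload: declared here, but defined in a header that users include
// only when a CUDA compiler and runtime are available.
void axpy(device_policy, std::size_t n, float a, float const* x, float* y);

}  // namespace raft
```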
@mhoemmen I should have been more clear in my description of this problem. Just to clarify, I am not suggesting we should use [...]

It might help if I back up for a moment and provide some background. When we first started building these primitives, we never even considered the question "Can I build without CUDA even being installed on my system?" As a result, we've created a lot of headers which are spread out across the library and which all assume at least the CUDA runtime is installed and available. Some of the headers [...] There are several headers spread across the library that don't actually require the CUDA runtime [...]

There just doesn't seem to be a good way today to tell users "Here's the list of RAFT headers which are safe to use in a public API and which don't require the CUDA runtime to be installed at all." @teju85 started adding [...]
I think this is probably worth an offline discussion, but I'll go ahead and give a brief description of how I'm handling this in the FIL CPU-GPU interop work. There, I distinguish between "consumable" and "implementation" headers, which is a little different from "public" and "detail." Consumable headers may still not be part of the public API, but they are intended to be consumed by other code, whereas implementation headers should never be included directly.

As an example, here is a consumable header which is used for copying both CPU and GPU memory chunks. Note that it is within a [...] directory. Within the consumable header, we conditionally include the GPU implementation header based on whether or not the corresponding compile flag is defined.

I don't know that we would want to adopt this across all of RAFT. There are some parts of RAFT which just don't make sense without access to the CUDA runtime API, and there the additional layer of indirection may not offer anything useful. The nice thing about this mechanism, though, is that if we decide later that a particular header does need to provide CPU/GPU interop, the existing header just becomes the "consumable" header, and its functionality can be split out into implementation headers without requiring any downstream or internal changes whatsoever. So we could progressively apply this mechanism as needed, organically growing the functionality that is available with or without CUDA.
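A minimal sketch of the consumable/implementation split, assuming a hypothetical `ENABLE_GPU` compile flag and hypothetical file names; this is illustrative, not the actual FIL code:

```cpp
// copy.hpp -- the "consumable" header. Other code includes this one and spells
// the call identically whether or not GPU support was compiled in.
#pragma once
#include <cstddef>
#include <cstring>
#include <stdexcept>

#ifdef ENABLE_GPU
#include "detail/copy_gpu.hpp"  // "implementation" header; never included directly
#endif

namespace interop {

enum class memory_type { host, device };

inline void copy_bytes(void* dst, void const* src, std::size_t n,
                       memory_type dst_type, memory_type src_type)
{
  if (dst_type == memory_type::host && src_type == memory_type::host) {
    std::memcpy(dst, src, n);  // CPU-only path: needs no CUDA at all
  } else {
#ifdef ENABLE_GPU
    detail::copy_bytes_gpu(dst, src, n);  // hypothetical wrapper over cudaMemcpy
#else
    throw std::runtime_error{"this build was compiled without GPU support"};
#endif
  }
}

}  // namespace interop
```

A CUDA-less build compiles this header unchanged; only builds that define the flag ever see the GPU implementation header.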
Thanks @wphicks! The [...]

As for the larger discussion about RAFT's conventions for documenting the existing non-CUDA compilable files and making them easier to find, I think some solutions are becoming clearer as I work through this PR to move things around for 22.10. It's a draft at the moment, but I've been attempting to document them in that PR as they become clear.
Fantastic! This sounds like a great improvement and well-aligned with other CPU-GPU interop work.
The only thing I'd want us to think about a little is ensuring that this does not lock us into a design that requires knowing whether data is on device or host. One key element that can substantially simplify parts of our design is allowing some code to be agnostic to that. It may be that the determination of device/host was made upstream of a particular function call, or that we have objects that fully encapsulate device-specific behavior, but in both cases it can be useful to explicitly say "We don't care!" when it comes to the memory type. In general, that shouldn't be an issue, but we should be wary of interop mechanisms that assume we must rely on overloads to make that distinction.
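One hypothetical way to express that agnosticism (a sketch, not a concrete proposal): a buffer type that records where its data lives, so consuming code needs neither host/device overloads nor any branch of its own:

```cpp
#include <cstddef>

namespace interop {

enum class memory_type { host, device };

// The host/device decision is made once, at construction time; afterwards the
// buffer fully encapsulates it.
struct buffer {
  void* data;
  std::size_t size;
  memory_type where;
};

// One spelling, no host/device overloads: this function is agnostic to where
// either buffer lives and defers that detail to the copy routine behind it,
// which may branch on `where` internally.
void clone_into(buffer const& src, buffer& dst);

}  // namespace interop
```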
This issue has been labeled [...]
Currently there are 3 types of build environments in which a user might use RAFT headers:

1. `gcc` & `nvcc` (CUDA runtime and device compiler both available)
2. `gcc` & !`nvcc` (CUDA runtime available, but no device compiler)
3. `gcc` & !`nvcc` (no CUDA runtime installed at all)
For the most part, RAFT currently supports the first two build environments and provides separate `hpp` and `cuh` header extensions to distinguish code which only expects the CUDA runtime (`hpp`) from code which also needs to be compiled with `nvcc` (`cuh`); see the sketch below.

There is a need to support the third environment for Triton workloads, and it's currently a challenge for users to determine which headers don't require the CUDA runtime. It's even challenging for us, as the RAFT developers, to list those files and to explain coherent rules for finding them.
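To make the `hpp`/`cuh` distinction concrete, a hedged illustration with hypothetical file contents (not actual RAFT headers):

```cpp
// example.hpp -- expects only the CUDA runtime: a plain host compiler such as
// gcc can build this, provided cudart is installed and linked.
#include <cstddef>
#include <cuda_runtime_api.h>

inline void* device_alloc(std::size_t bytes)
{
  void* ptr = nullptr;
  cudaMalloc(&ptr, bytes);
  return ptr;
}

// example.cuh -- contains device code (a kernel definition), so it can only
// be compiled by nvcc.
__global__ void fill(float* out, std::size_t n, float value)
{
  std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { out[i] = value; }
}
```

The third build environment has no counterpart in this sketch: it needs headers that include neither of the above, i.e., no CUDA headers and no device code.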
We should start thinking about what these conventions might look like. For example, should we go a step further than just separating the header extensions and also separate the namespaces, to denote CUDA-runtime code from non-CUDA code (e.g. `raft::core::runtime` / `raft::distance::runtime` vs `raft::core::device` / `raft::core::host`)?

This issue also relates to #806.
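Purely as an illustration of the namespace idea (none of these namespaces or functions exist today; the split across separate headers is assumed):

```cpp
#include <cstddef>
#include <cuda_runtime_api.h>  // would only be pulled in by the runtime header

namespace raft::core {

// Would live in a header that is safe to include with no CUDA installed.
namespace host {
void normalize(float* data, std::size_t n);
}

// Would live in a header that requires cudart (though not nvcc).
namespace runtime {
void normalize(float* data, std::size_t n, cudaStream_t stream);
}

}  // namespace raft::core
```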