[FEA] All primitives should be hiding implementation details and exposing only public API #330
Comments
@cjnolet thanks for bringing this up. I agree with each and every proposed point here! The only thing that I would like to do differently is that I would like to specify a standard contract, as you mention, for every RAFT public primitive that we have right now. This would involve things like (and likely more):
Once we have a contract ready for all existing prims, we can enforce it for new prims. But going backwards would mean having to refactor for all prims that come in at the same time. Thus, I propose that we implement all your proposals for existing primitives first, and let that be the standard that needs to be met for any new primitive that comes into RAFT. |
Just to provide some background on this issue, I started working on #314 and realized as I was going through the files in the codebase that in many instances I was having trouble determining which functions were actually prims and which were internal, such as helper functions. In fact, I'm familiar with most of these prims and I know that if I'm having this trouble then consumers who are unfamiliar with them are going to be having an even tougher time. I definitely agree we should do the things you are proposing as well, but I think starting by establishing which things are actually public prims will make it even easier to do the things you are proposing. In just about all cases, I've managed to be able to do this for the prims being consumed by cuml, and since we've already been using the With risk of the scope creeping up further than it needs to for this, what I'm proposing here can also be done right now within raft itself as a first step, without breaking any existing APIs or causing any updates to consumers downstream. This should make updating public APIs much easier, since it would actually establish which things are, in fact, part of the public APIs. From my perspective, this doesn't remove anything nor break anything, and it only adds clarity to what's already here. Do you mind elaborating on your number 3? |
@cjnolet you are right, we should definitely not extend the scope of this issue. I would also love to just separate our public and detail APIs first as that makes it easier to change in the future. I am only worried about not having a consistent public API before exposing public primitives, and then going ahead and breaking that API. As for number 3, many of our primitives have |
I wholeheartedly agree with the concept here. I know this is not related to just this issue but we need to start treating the public APIs across all of RAPIDS as if they are actually used by others and change them with great care (e.g. adding new methods and deprecating old methods rather than arbitrarily changing method signatures in a destructive way). Our own internal usage has gotten complex enough to need this, and we're starting to see others using our components. The approach is very reasonable, although I would say that we should try and put some urgency on actually making these changes. Perhaps something like:
|
That timeline looks very reasonable to me and I agree that we should use a deprecation strategy to update the public APIs, especially now that RAFT is being used by multiple different libraries. Again, not wanting to stretch the scope of the public API discussion too far, but I'm seeing the last two items in your list as additions to the public API, rather than updates, for the purposes of maintaining backwards compatibility. Perhaps we could choose a version (maybe 22.06?) to remove the original public API functions. So maybe something like:
(you might have been implying this in your original list but I'm just hoping to make it more explicit) |
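As a rough illustration of the "add new, deprecate old, remove later" approach discussed above, here is a minimal C++ sketch. The `mylib` namespace and both `sum` overloads are hypothetical and are not RAFT APIs; the sketch assumes the standard C++14 `[[deprecated]]` attribute is an acceptable way to flag the old signature.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

namespace mylib {  // hypothetical library, not RAFT

// New public API: added alongside the old one rather than replacing it.
inline double sum(std::vector<double> const& values)
{
  return std::accumulate(values.begin(), values.end(), 0.0);
}

// Old public API: kept for a deprecation window and forwarded to the new one,
// so existing consumers keep compiling (with a warning) until it is removed.
[[deprecated("use mylib::sum(std::vector<double> const&) instead")]]
inline double sum(double const* data, std::size_t n)
{
  return sum(std::vector<double>(data, data + n));
}

}  // namespace mylib
```

Consumers that still call the pointer-based overload get a compiler warning rather than a build break, which gives them time to work the update into their own timelines before the old function is removed.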
@cjnolet @ChuckHastings just to follow for clarity, what do you both mean by |
Since hiding the private bits is straightforward and doesn't break any current API compatibility, I was taking it to mean that we would be making sure new prims do this. Ideally, they could also do some of the API and consistency cleanup, like removing the explicit What I like about the proposed deprecation strategy is that it can be done as an intermediate step and gives consumers an opportunity to work the API updates into their timelines. That way we're not having to orchestrate multiple projects to do everything all at once. |
@divyegala Your corrections are what I implied and was (in retrospect) ambiguous about. What I mean by "all new prims doing this in 21.12" is that any new primitives created starting in version 21.12 or anytime after that would use the new paradigm for hiding implementation details. We should make checking for hidden implementation details part of the criteria for approving a new primitive. |
Adding a checklist so that we can tackle this in several smaller PRs rather than one monstrous PR:
The goal here is for any new primitives to expose a header file with the public API that invokes the private API in the detail namespace. The public API should all be accessible through |
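The public/detail split described above could look roughly like the following. This is only a sketch in plain host C++: the `mylib` namespace, `scale`, and `scale_impl` are hypothetical names chosen for illustration, and in a real project the two parts would normally live in separate headers (a public one and one under a `detail` directory).

```cpp
#include <cstddef>
#include <vector>

namespace mylib {  // hypothetical library, not RAFT
namespace detail {

// Implementation detail: not part of the stable API and free to change.
template <typename T>
void scale_impl(T* data, std::size_t n, T factor)
{
  for (std::size_t i = 0; i < n; ++i) { data[i] *= factor; }
}

}  // namespace detail

// Public API: a thin wrapper whose signature is the part kept stable.
template <typename T>
void scale(std::vector<T>& values, T factor)
{
  detail::scale_impl(values.data(), values.size(), factor);
}

}  // namespace mylib
```

Because consumers only ever call `mylib::scale`, the body of `detail::scale_impl` can be rewritten or replaced without touching downstream code.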
Just updating this thread as it's now been a couple releases. PRs have been opened for the remaining bits and it looks like this issue should be able to be closed by 22.04. |
Addresses #330 Authors: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Corey J. Nolet (https://github.com/cjnolet) URL: #383
@cjnolet @divyegala I haven't come across this issue and the associated PRs before, but now that PR #383 has been merged, it has hit me. I'll explain the issue a bit more generally:
I can understand the need to separate some internal implementation details within RAFT from the What has happened specifically: with PR #383 So what would be nicer is to make it a bit clearer what RAFT actually provides / what each part of RAFT provides:
Another example where this is problematic: It would be nice to somehow clarify what RAFT or its parts actually provide within the README or doxygen docs, so that these issues can be avoided in the future. |
To echo further on @MatthiasKohl's second half of the comment... In cuML, we try to follow this naming convention of headers to clearly tell the users whether a header file is include-able from a .cpp or should it be a .cu. For eg: after the recent refactor, the I'm NOT saying that we shouldn't hide implementation details inside a |
TLDR; RAFT is supposed to be all three things you listed, however it never occurred to me that folks would want to compile it without a CUDA-enabled compiler. The purpose of hiding implementation details here is to make it very explicit which APIs can be invoked by users so that we can keep them lightweight, flexible and most importantly, stable. When we had originally moved the primitives over from cuml, there were device functions in files w/ host functions, some of the device functions were being invoked externally to the files and it was very hard for users to determine which functions were intended to be invoked just by looking at the code itself. By exposing the public API as thin wrappers around the implementations, we more clearly separate the two and make very obvious which functions we intend to keep stable. With that stability also comes the opportunity for more consistency across the public APIs, and we're planning to make heavy use of the I recently pulled the cusparse macros into a separate file called |
@teju85 @MatthiasKohl The more I've thought about this, the more I've realized that I've been operating under the assumption that a CUDA-enabled compiler will always be used to build RAFT's primitives. Since there are so many custom kernels and things like Thrust and CUB in our code, it never even occurred to me that we would want to compile some of the code in If we want to provide the ability for some of RAFT's APIs to be built without a CUDA-enabled compiler then it could make sense to use the filename extensions to separate the two. We might want to think about it a little more, though: my real hesitation here is that when I'm in the process of building an algorithm, I'm going to constantly have to be looking at the docs to see which extension to use for different primitives. Since we know all the files can be compiled w/ @harrism, I recall there might be a performance implication here but I'm having trouble with the details. Does |
@cjnolet Thank you for the detailed answers here! To add a bit more detail about the non-CUDA-enabled compiler:
The latter 3 of those core components can all be compiled with a non-CUDA-enabled compiler and I think that this is a good thing, since it simply allows more flexibility and it allows RAFT to be used for projects that only interact with the CUDA runtime from host, without any device functions. I believe that More importantly, I think that it makes sense to clearly define the scope of each header or section/folder in RAFT because it hints to users what they can expect from that header or folder, and this is where the file extension can help additionally. For example, regardless of the compiler issue, I assumed that the cublas macros are useful in any project using cublas, since they are similar to By separating the scope clearly, being able to use a non-CUDA-enabled compiler for large parts of RAFT will be a side-effect rather than a stated goal, and I'm not asking for RAFT to be compiled by a non-CUDA-enabled compiler as much as possible, but to clearly state and separate the scope of headers / folders. |
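To make the point about host-only headers concrete, here is a sketch of a status-checking macro that contains no device code and so builds with any C++ compiler. Everything in it is an assumption made for illustration: `fake_status`, `fake_library_call`, and `FAKE_TRY` are hypothetical stand-ins, not the actual RAFT or cuBLAS macros or types.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a library status code; a real project would use
// the status enum of the library it wraps.
enum class fake_status { success = 0, failure = 1 };

// Hypothetical host-only library call; no device code is involved.
inline fake_status fake_library_call(bool should_succeed)
{
  return should_succeed ? fake_status::success : fake_status::failure;
}

// Host-only checking macro: evaluates the call once and throws on failure.
#define FAKE_TRY(call)                                                \
  do {                                                                \
    fake_status const status_ = (call);                               \
    if (status_ != fake_status::success) {                            \
      throw std::runtime_error(std::string("call failed: ") + #call); \
    }                                                                 \
  } while (0)
```

A header like this has no need for a CUDA-enabled compiler, which is the kind of scope a host-only file extension could signal to users.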
I don't think so. I think what you are thinking of is that NVCC is slow, period. Therefore, in libcudf, for example, we avoid .cu files when we don't need them. For example if there are no This is one of the advantages of using We aim for most libcudf tests and benchmarks to be in .cpp files (since the libcudf APIs they test are not header-only). |
I've opened #524 to start addressing the issues w/ the header file extensions. |
While the primitives in cuml only needed to worry about cuml algorithms as consumers, hiding implementation details wasn't particularly important. However, now that the primitives in raft are beginning to find more consumers downstream, we should start doing this across the board for all primitives. There are several benefits to doing this and the only drawback that I can think of is that it's going to require an initial refactor to clean up the current primitives (which I propose we do incrementally).
Benefits:
hpp, which puts less burden on consumers to know which one to use. I think we can maintain backwards compatibility with this for a while, maybe with a warning, and then cut out the cuh files from the public API altogether at some point.
Tagging the usual suspects for thoughts. @teju85 @divyegala @ChuckHastings @tfeher @wphicks My proposal would be that we start doing this for all new prims as of 21.12 and then iteratively move the implementation details for existing prims (note that we should be able to do this initially without breaking the public APIs at all, so long as the consumers are all actually using public APIs downstream).