
[FEA] cuML to expose a "proper" CUDA API #92

Closed
teju85 opened this issue Jan 14, 2019 · 13 comments
Labels
feature request

Comments

@teju85
Member

teju85 commented Jan 14, 2019

Is your feature request related to a problem? Please describe.
We currently do not expose the following from our C/C++ API:

  1. cudaStream_t
  2. cublasHandle_t and cusolverDnHandle_t
  3. custom memory allocators

The advantages of exposing these are:

  1. performance
  2. tighter control over job scheduling and resource allocation from the wrapping library itself
  3. consistency with other CUDA libraries, meaning a shallower ramp-up curve for our users

Describe the solution you'd like
One solution would be to:

  1. expose a cumlHandle_t structure (just like cudnn/cublas/cufft/cusolver)
  2. give users the ability to set and get the above handles/streams/allocators
  3. make all of cuML's exposed methods accept this object
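For illustration, the call pattern would mirror the other CUDA libraries. A minimal sketch (all names here are hypothetical; none of these symbols exist in cuML yet):

```cpp
// Hypothetical sketch of the proposed call pattern, following
// cuBLAS/cuDNN conventions; not an actual cuML interface.
cumlHandle_t handle;
cumlCreateHandle(&handle);      // create the library context
cumlSetStream(handle, stream);  // attach a user-owned cudaStream_t
// every cuML entry point would then take the handle first, e.g.:
// cumlDbscanFit(handle, ...);
cumlDestroyHandle(handle);
```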

Describe alternatives you've considered
There are no alternatives currently.

Additional context
None.

Note
Just like #77, I'm mostly filing this issue so that it doesn't slip away. Please feel free to set its priority accordingly, @datametrician @dantegd.

@teju85 added the "? - Needs Triage" and "feature request" labels Jan 14, 2019
@teju85
Member Author

teju85 commented Feb 5, 2019

Some more concrete descriptions of what needs to be done...

  1. Define a cumlError_t enum and a corresponding cumlGetErrorString method.

  2. Define a cumlHandle_t struct.
    Users are expected NOT to depend on the internals of this structure! An example definition could be:

```cpp
struct cumlHandle {
  cublasHandle_t cublas;
  cusolverDnHandle_t cusolverDn;
  DeviceAllocator alloc;
};
typedef struct cumlHandle* cumlHandle_t;
```
  3. Define interfaces on this struct, the usual ones just like the other CUDA libraries (they must be "extern C"'d, obviously). A combined usage sketch follows this list.

```cpp
cumlError_t cumlCreateHandle(cumlHandle_t*);
cumlError_t cumlDestroyHandle(cumlHandle_t);
cumlError_t cumlSetStream(cumlHandle_t, cudaStream_t);  // should set the same stream for both cublas and cusolverDn, at least for now!
cumlError_t cumlGetStream(cumlHandle_t, cudaStream_t*);
cumlError_t cumlSetAllocator(cumlHandle_t, AllocFunctor, DeallocFunctor);
// this allocator is to be used internally in cuml and/or ml-prims for working with temporary workspaces
```
  4. Expose these interfaces in the Cython world as well.

  5. As a POC, pick one of the existing algos (pca, tsvd, or dbscan) and update its interface to accept cumlHandle_t.
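Putting the pieces above together, a hedged end-to-end sketch (illustrative names and error codes only; nothing here is final):

```cpp
#include <stdio.h>
#include <cuda_runtime.h>

// Illustrative error codes; the real enum would be richer.
typedef enum { CUML_SUCCESS = 0, CUML_ERROR_UNKNOWN = 1 } cumlError_t;

// Proposed API from the list above, repeated here so the sketch is
// self-contained.
typedef struct cumlHandle* cumlHandle_t;
const char* cumlGetErrorString(cumlError_t);
cumlError_t cumlCreateHandle(cumlHandle_t*);
cumlError_t cumlDestroyHandle(cumlHandle_t);
cumlError_t cumlSetStream(cumlHandle_t, cudaStream_t);

// Hypothetical caller-side flow:
void example(cudaStream_t stream) {
  cumlHandle_t handle;
  cumlError_t err = cumlCreateHandle(&handle);
  if (err != CUML_SUCCESS) {
    fprintf(stderr, "cuml error: %s\n", cumlGetErrorString(err));
    return;
  }
  cumlSetStream(handle, stream);  // same stream for cublas and cusolverDn
  // ... run pca/tsvd/dbscan etc. with this handle ...
  cumlDestroyHandle(handle);
}
```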

Note: the 'Allocator' part can only work after PR #167 is done!

Vinay Deshpande (@vinaydes) will be picking this up as a "starter" task for his onboarding onto cuML.

@oyilmaz-nvidia
Contributor

@teju85 Exposing library handles, streams, etc. is a pretty good idea. We should definitely include this in the coming versions.

@oyilmaz-nvidia
Contributor

@teju85 We might not need cumlAlloc or its variations, because we use cuDF for these kinds of operations.

@teju85
Member Author

teju85 commented Feb 6, 2019

I see your point regarding the alloc/free functions. Those were added for use by cuml and/or ml-prims, wherever a function needs temporary allocations.

Your point makes sense. Let's not expose these two methods; instead, the custom allocator will be used purely internally inside cuml and ml-prims.
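To make the "internal only" idea concrete, here is a minimal sketch of what such an allocator interface could look like (hypothetical; not actual cuml/ml-prims code):

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical internal-only allocator interface for temporary
// workspaces; stream-aware so allocations can be tied to the
// handle's stream.
struct DeviceAllocator {
  virtual void* allocate(std::size_t bytes, cudaStream_t stream) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes, cudaStream_t stream) = 0;
  virtual ~DeviceAllocator() = default;
};

// Default fallback using plain cudaMalloc/cudaFree.
struct CudaDeviceAllocator : DeviceAllocator {
  void* allocate(std::size_t bytes, cudaStream_t) override {
    void* ptr = nullptr;
    cudaMalloc(&ptr, bytes);
    return ptr;
  }
  void deallocate(void* ptr, std::size_t, cudaStream_t) override {
    cudaFree(ptr);
  }
};
```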

@teju85
Member Author

teju85 commented Feb 6, 2019

@oyilmaz-nvidia @vinaydes I've updated the interface proposal above based on this feedback.

@cjnolet
Member

cjnolet commented Feb 6, 2019

I was going to recommend some global handle for tracking workspace allocations. I'm 100% for this.

@teju85
Member Author

teju85 commented Feb 6, 2019

@cjnolet Let's try to avoid such global vars as much as possible.

@cjnolet
Member

cjnolet commented Feb 6, 2019

I misread the description on this issue. I was thinking along the lines of #186, which would also be good to standardize in the CUDA API.

@teju85
Member Author

teju85 commented Feb 6, 2019

Agreed, @cjnolet. I just tagged you on that issue and mentioned the same thing. The allocator being discussed here should simplify workspace allocation logic by a lot.

@dantegd
Member

dantegd commented Feb 6, 2019

@teju85 Regarding the allocation, the idea works perfectly, and it also allows us to start using RMM for the allocations that cuDF doesn't handle. @oyilmaz-nvidia In fact we don't always depend on cuDF for allocation: our Python APIs accept host NumPy arrays that are then transferred to the GPU with Numba, which will change to use RMM.

I also absolutely love the proposed cuml_handle. Regarding error handling: since Python is our main end-user interface for the time being, exceptions will give us a more robust error-reporting infrastructure there. In general we are not limiting cuML to a C API (the C API can be a wrapper around the C++ one, which it currently mostly is), so at least at the C++ level we can raise exceptions, which is what RMM and cuDF are moving to soon. (Hope I was clear in that explanation, it's early here.)
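A minimal sketch of that layering, with hypothetical names (the C entry point catches at the language boundary and returns a code, while C++/Python callers can rely on exceptions):

```cpp
// Hypothetical error enum, matching the proposal above.
typedef enum { CUML_SUCCESS = 0, CUML_ERROR_UNKNOWN = 1 } cumlError_t;

namespace ML {
// Stand-in for a C++ algorithm entry point that reports failures by
// throwing exceptions.
void dbscanFit() { /* ... actual work ... */ }
}  // namespace ML

// The C API is a thin wrapper around the C++ one: translate exceptions
// into error codes at the boundary (exceptions must not cross extern "C").
extern "C" cumlError_t cumlDbscanFit(void) {
  try {
    ML::dbscanFit();
    return CUML_SUCCESS;
  } catch (...) {
    return CUML_ERROR_UNKNOWN;
  }
}
```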

@dantegd
Member

dantegd commented Feb 6, 2019

@teju85 Now that it's a little later in the day, I just wanted to clarify: in my comment above I am trying to say that I favor both having the error codes and throwing exceptions. Languages that can link to C++ libraries get the benefit of exceptions; languages without C++ support only get the cumlError_t support.

@teju85
Member Author

teju85 commented Feb 12, 2019

Fair enough, makes sense @dantegd.

@jirikraus and @vinaydes, there's a request in issue #207 to add a random_state as a way to reproduce ML algo results. I was thinking of adding a cumlError_t cumlSetSeed(uint64_t); method to the proposed list as well. What do you folks think?
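A hedged sketch of the idea (hypothetical; whether the seed lives on the handle, and the exact signature, are open questions):

```cpp
// Hypothetical: storing the seed on the handle would let every RNG used
// by the algorithms be seeded, making runs reproducible.
void reproducible_run() {
  cumlHandle_t handle;
  cumlCreateHandle(&handle);
  cumlSetSeed(handle, 1234ULL);  // assumes a handle-taking variant of
                                 // the proposed cumlSetSeed
  // two identical fit() calls through this handle should now produce
  // identical results
  cumlDestroyHandle(handle);
}
```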

@jirikraus
Contributor

Just explicitly adding a reference to the PR addressing this: #247.

jirikraus added a commit to jirikraus/cuml that referenced this issue Mar 8, 2019
dantegd added a commit that referenced this issue Mar 12, 2019
[REVIEW] Proposal for "proper" C/C++ API (issue #92)
@dantegd dantegd closed this as completed May 10, 2019
Salonijain27 pushed a commit to Salonijain27/cuml that referenced this issue Jan 22, 2020
[REVIEW] No random dataset for tsvd and pca algorithm