
unsupported getri_batch/getrf_batch for Nvidia #229

Closed
Soujanyajanga opened this issue Sep 22, 2022 · 9 comments
Labels: help wanted (tasks, issues, or features that could be implemented and contributed to the project)
Soujanyajanga commented Sep 22, 2022

As per the oneMKL LAPACK domain, the APIs getri_batch/getrf_batch are not implemented for the Nvidia backend:

void geqrf_batch(sycl::queue &queue, std::int64_t m, std::int64_t n,
                 sycl::buffer<std::complex<float>> &a, std::int64_t lda, std::int64_t stride_a,
                 sycl::buffer<std::complex<float>> &tau, std::int64_t stride_tau,
                 std::int64_t batch_size, sycl::buffer<std::complex<float>> &scratchpad,
                 std::int64_t scratchpad_size) {
    throw unimplemented("lapack", "geqrf_batch"); // <-- unimplemented
}

void getri_batch(sycl::queue &queue, std::int64_t n, sycl::buffer<std::complex<float>> &a,
                 std::int64_t lda, std::int64_t stride_a, sycl::buffer<std::int64_t> &ipiv,
                 std::int64_t stride_ipiv, std::int64_t batch_size,
                 sycl::buffer<std::complex<float>> &scratchpad,
                 std::int64_t scratchpad_size) {
    throw unimplemented("lapack", "getri_batch"); // <-- unimplemented
}

Can you please let us know when support for these APIs will be available?

We have implemented a workaround using SYCL interop; would you be interested in this?
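
For illustration, this is what a caller currently sees on the CUDA backend (our sketch, not from the thread; it assumes a queue bound to a CUDA device and buffers/sizes set up as in the getri_batch signature above):

    try {
        oneapi::mkl::lapack::getri_batch(queue, n, a, lda, stride_a, ipiv,
                                         stride_ipiv, batch_size, scratchpad,
                                         scratchpad_size);
    } catch (const oneapi::mkl::unimplemented &e) {
        std::cerr << "not implemented for this backend: " << e.what() << '\n';
    }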

AerialMantis commented Sep 27, 2022

Hi @Soujanyajanga, thanks for raising this issue. I can give you an update on the progress of these operations for the Nvidia backend.

For getrf_batch, Nvidia supports an equivalent to getrf but not to getrf_batch, so we need to implement it by manually batching the regular getrf implementation. There is a pull request open right now which does this: #209.

For getri_batch, however, Nvidia does have an equivalent, but it is provided in cuBLAS rather than cuSOLVER, which means supporting it would require some changes to the Nvidia backend. Unfortunately we don't have any immediate plans to do this, but we could incorporate it into our future roadmap.

Edit: I originally stated that getri_batch was not provided by Nvidia; it is in fact provided, but in cuBLAS rather than cuSOLVER.
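
For illustration, a minimal sketch of the manual-batching approach described above (our reconstruction under stated assumptions, not the actual code in #209; getrf_batch_emulated is a hypothetical name, while the oneapi::mkl::lapack calls are the standard oneMKL API): loop over the strided batch and run the non-batched getrf on a sub-buffer view of each matrix.

    #include <sycl/sycl.hpp>
    #include <oneapi/mkl/lapack.hpp>
    #include <complex>

    void getrf_batch_emulated(sycl::queue &queue, std::int64_t m, std::int64_t n,
                              sycl::buffer<std::complex<float>> &a, std::int64_t lda,
                              std::int64_t stride_a, sycl::buffer<std::int64_t> &ipiv,
                              std::int64_t stride_ipiv, std::int64_t batch_size) {
        // One scratchpad sized for a single factorization, reused across the batch.
        std::int64_t scratch_size =
            oneapi::mkl::lapack::getrf_scratchpad_size<std::complex<float>>(queue, m, n, lda);
        sycl::buffer<std::complex<float>> scratch{sycl::range<1>(scratch_size)};
        for (std::int64_t i = 0; i < batch_size; ++i) {
            // Sub-buffers view the i-th matrix and pivot vector of the batch
            // (assumes the strides satisfy SYCL's sub-buffer alignment rules).
            sycl::buffer<std::complex<float>> a_i(a, sycl::id<1>(i * stride_a),
                                                  sycl::range<1>(stride_a));
            sycl::buffer<std::int64_t> ipiv_i(ipiv, sycl::id<1>(i * stride_ipiv),
                                              sycl::range<1>(stride_ipiv));
            oneapi::mkl::lapack::getrf(queue, m, n, a_i, lda, ipiv_i, scratch, scratch_size);
        }
    }

A real backend implementation would likely batch at the native cuSOLVER level rather than through sub-buffers, but the control flow is the same: one non-batched factorization per matrix in the batch.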

AidanBeltonS (Contributor) commented Sep 27, 2022

> We have implemented a workaround using SYCL interop; would you be interested in this?

I have a quick question: what native function have you been using as your workaround? As far as I am aware, getri does not have a native cuSOLVER equivalent.

AidanBeltonS (Contributor) commented

I have managed to answer my own question: cuSOLVER does not implement getri, but cuBLAS does:
https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-getribatched

I think this is something we can support, but it may take a bit of additional work in the backend to get the appropriate cuBLAS handles, etc.
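
For reference, the entry point linked above has the following prototype in the cuBLAS documentation (single-precision complex variant):

    cublasStatus_t cublasCgetriBatched(cublasHandle_t handle, int n,
                                       const cuComplex *const Aarray[], int lda,
                                       const int *PivotArray,
                                       cuComplex *const Carray[], int ldc,
                                       int *infoArray, int batchSize);

Note that it takes arrays of device pointers rather than a stride, and it writes the inverses out of place into Carray, so mapping oneMKL's strided, in-place getri_batch onto it would also require building the pointer arrays and copying the results back.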

Soujanyajanga (Author) commented

> > We have implemented a workaround using SYCL interop; would you be interested in this?
>
> I have a quick question: what native function have you been using as your workaround? As far as I am aware, getri does not have a native cuSOLVER equivalent.

For getri_batch, the CUDA equivalent API is cublasCgetriBatched. We have integrated the CUDA API with SYCL interop as a workaround. Are you interested in this approach?

AerialMantis commented

@Soujanyajanga yes, I think this is the approach we would take. If you can share your workaround, that could be useful, thanks. I've added this to our roadmap, so someone will take a look at it.

Soujanyajanga (Author) commented

> @Soujanyajanga yes, I think this is the approach we would take. If you can share your workaround, that could be useful, thanks. I've added this to our roadmap, so someone will take a look at it.

Here is the workaround implemented using SYCL interop:

    // dpct::get_default_queue() is the DPC++ Compatibility Tool helper
    // returning the default SYCL queue.
    static sycl::queue *handle;
    error = (handle = &dpct::get_default_queue(), 0);

    // ... (creating/adjusting the parameters for the CUDA API)

    cublasStatus_t err;
    cublasHandle_t handle_cuda;
    CUstream streamId = sycl::get_native<sycl::backend::cuda>(*handle);
    err = cublasCreate(&handle_cuda);
    err = cublasSetStream(handle_cuda, streamId);
    err = cublasCgetriBatched(handle_cuda, n, (cuFloatComplex **)A_array, n, dipiv,
                              (cuFloatComplex **)Ainv_array, n, dinfo_array, batch);
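
As a side note, the snippet above creates a fresh cuBLAS handle on every call, never destroys it, and does not order the cuBLAS call against other work on the SYCL queue. One way to address this (a hedged sketch of our own, not code from this thread; it reuses the variable names from the snippet above) is to run the call inside a host_task:

    handle->submit([&](sycl::handler &cgh) {
        cgh.host_task([=](sycl::interop_handle ih) {
            // Run the cuBLAS call on the queue's native stream so it is
            // ordered with other submissions on the queue.
            CUstream stream = ih.get_native_queue<sycl::backend::cuda>();
            cublasHandle_t h;
            cublasCreate(&h);
            cublasSetStream(h, stream);
            cublasCgetriBatched(h, n, (cuFloatComplex **)A_array, n, dipiv,
                                (cuFloatComplex **)Ainv_array, n, dinfo_array, batch);
            // getriBatched is asynchronous: synchronize before destroying the
            // handle. A production backend would cache one handle per queue.
            cudaStreamSynchronize(stream);
            cublasDestroy(h);
        });
    });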

@mkrainiuk added the "help wanted" label on Sep 4, 2024
JackAKirk (Contributor) commented
@hdelan should this issue be closed?

hdelan (Contributor) commented Oct 3, 2024

@JackAKirk I think so. I don't have the permissions to close the issue, but maybe @Rbiessy can.

Rbiessy (Contributor) commented Oct 3, 2024

Thanks for catching this. It looks like all the issues mentioned have been addressed, so I am closing this.

Rbiessy closed this as completed on Oct 3, 2024.