Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARSE] Add support for cuSPARSE backend #527

Merged
merged 44 commits into from
Oct 29, 2024

Conversation

Rbiessy
Copy link
Contributor

@Rbiessy Rbiessy commented Jul 4, 2024

Description

Add support for the cuSPARSE backend.

Fixes #25

Depends on #500. All of the changes specific to cuSPARSE are in the last commit.

Rendered documentation: docs.zip

Checklist

All Submissions

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Aug 29, 2024

I have pushed the changes to merge with the develop branch, apply relevant feedback from the previous sparse PR and worked on adding support for the enqueue_native_command extension.
I have tested this branch with 2024.2 + CUDA 12.5 (not using the extension): log_oneapi_2024_2_cuda_12_5.txt
and a 2025.0 RC + CUDA 12.2: log_oneapi_2025_0_cuda_12_2.txt

@gajanan-choudhary gajanan-choudhary added feature A request to add a new feature backend A request to enable new implementation behind API labels Sep 10, 2024
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Oct 7, 2024

FYI we're holding on this PR for a little bit. We have found some tests failing when running on A100, in particular if the machine is busy or multiple tests are run in parallel. There seem to be race conditions which we don't understand yet.

@spencerpatty
Copy link

Do we need to test with more modern version of CUDA libs, for instance, oneMath Interfaces project currently tests with 11.8 and 12.0 SDKs, but do we need latest SDK (12.6) to also be tested? Are there features of 12.6 that we exploit that are different from say 12.0 ?

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Oct 15, 2024

Do we need to test with more modern version of CUDA libs, for instance, oneMath Interfaces project currently tests with 11.8 and 12.0 SDKs, but do we need latest SDK (12.6) to also be tested? Are there features of 12.6 that we exploit that are different from say 12.0 ?

The oneAPI Core Team at Codeplay is testing oneMKL Interface with our CUDA plugins and CUDA 12.4. I am personally using 12.5 for the test logs attached here.

Note that currently the minimum CUDA version set to use the cuSPARSE backend is CUDA 12.2, see the related discussion here: https://github.com/oneapi-src/oneMKL/pull/527/files#r1750927886. The internal CI would need to be bumped to that version at least to test cuSPARSE!
There is an important feature in CUDA 12.4 to enable cusparseSpMV_preprocess. Other than this using more recent CUDA versions mostly fixes some issues for the features that we support.

@Rbiessy Rbiessy requested review from a team as code owners October 25, 2024 13:30
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Oct 25, 2024

@oneapi-src/onemkl-sparse-write I have updated the PR with fixes needed and conflicts fixed. To summarize:

  • The main issue we have observed is that cuSPARSE seems to require that the same CUStream is used across all the steps of an operation. This worked well with @gajanan-choudhary's suggestion to cache CUStream and cusparseHandle so this is now done in c5ba2c4, 1f5c80c and most importantly 58f08c9
    • This had more consequences with the extension used in 2025.0 RC which are fixed in 956ae49 and 43428ca
  • We also found some dependencies issues when running multiple tests in parallel which are fixed in c0eae1e, 2149e39 and bad6bfb. The last commit was also needed for rocSPARSE. It seems the backends need the data to be available on the device before the optimize step is run. I've not seen this well documented in the backend documentations. I'll look into clarifying this in our specification. We clarified this in our spec [oneMKL][spblas] Restrict features not supported by any backends oneAPI-spec#542 (comment).
  • We hit a cuSPARSE bug which is now workaround in 6318d53 (more details in the link added in this commit)
  • I found some issues with the CT example, fixed in 2888232 and the README output was updated in b4f553c
  • 97eef5c is a small improvement to clarify which symbols are part of the public library and which are implementation details.

New test logs:

@gajanan-choudhary let me know if you have any concerns with these changes.

Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes look fine to me. Feel free to merge in. Thanks for the great work, I'm sure this PR took monumental effort and patience on your end!

@Rbiessy Rbiessy merged commit c8dc9a9 into uxlfoundation:develop Oct 29, 2024
9 checks passed
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Oct 29, 2024

Thanks all for the reviews!

@Rbiessy Rbiessy deleted the romain/cusparse branch October 29, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend A request to enable new implementation behind API feature A request to add a new feature Sparse BLAS domain Sparse BLAS domain issue/request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CUDA] Add support for cuSparse
5 participants