
cuda : refactor into multiple files #6269

Merged · 5 commits · Mar 25, 2024

Conversation

slaren
Collaborator

@slaren slaren commented Mar 24, 2024

The main goal is to make it easier to work on the CUDA backend and improve compilation times.

@ggerganov ggerganov added the high priority Very important issue label Mar 24, 2024
@slaren
Collaborator Author

slaren commented Mar 24, 2024

If you have any suggestions to improve the organization of the files, the naming scheme or anything else, please let me know, I am sure it could be improved.

Other than that, the only thing left to do is fix the HIP and Makefile builds.

@slaren slaren marked this pull request as ready for review March 24, 2024 15:38
@airMeng
Collaborator

airMeng commented Mar 25, 2024

I thought we had previously agreed to unify all functions into one file (#3965 (comment)); has the design changed now? Could you share more details?

@slaren
Collaborator Author

slaren commented Mar 25, 2024

There was a brief discussion about this in #5434. The compilation time of the CUDA backend had increased to the point that it was complicating working on it. Personally, I also prefer working on multiple smaller files rather than one large file, but I don't know if everybody agrees with that.

@airMeng
Collaborator

airMeng commented Mar 25, 2024

> There was a brief discussion about this in #5434. The compilation time of the CUDA backend was increasing to the point that it was complicating working on it. Personally, I also prefer to work on multiple smaller files than in one large file, but I don't know if everybody agrees about that.

Will you summarize this in a document, for example "guidelines for adding new backends"? I see some backends are still WIP, like #6035, and following the new design might be easier for them.

Owner

@ggerganov ggerganov left a comment


I like how the files are organized. We should also take the opportunity to deprecate the LLAMA_CUBLAS option with the more meaningful LLAMA_CUDA. But not super important, and probably better for a follow-up PR

@airMeng Starting with single-file implementations remains the preferred option. We can probably add a dummy backend implementation to serve as a starting point for new backend development - might be easier to keep it up-to-date compared to a document

@slaren slaren merged commit ae1f211 into master Mar 25, 2024
58 checks passed
@slaren slaren deleted the sl/cuda-refactor-files branch March 25, 2024 12:50
@anhnami

anhnami commented Mar 25, 2024

I apologize for nagging but LLAMA_CUDA_F16 is broken again.

ggml-cuda/dmmv.cu(768): error: identifier "to_fp16_cuda_t" is undefined
          const to_fp16_cuda_t to_fp16_cuda = ggml_get_to_fp16_cuda(src1->type);
                ^

ggml-cuda/dmmv.cu(768): error: identifier "ggml_get_to_fp16_cuda" is undefined
          const to_fp16_cuda_t to_fp16_cuda = ggml_get_to_fp16_cuda(src1->type);
                                              ^

2 errors detected in the compilation of "ggml-cuda/dmmv.cu".

@slaren
Collaborator Author

slaren commented Mar 25, 2024

Should be fixed in #6298

@ikawrakow
Contributor

Considering the extremely long ggml-cuda.cu compilation times, I was pleasantly surprised by this refactoring. Until I had to adapt PR #6302 to it: touching vecdotq.cuh, which one needs to do all the time when developing new quants or optimizing dot product kernels, leads to the very same extremely long compilation as before (this time of mmq.cu).

@slaren
Collaborator Author

slaren commented Mar 25, 2024

The build time is dominated by mmvq.cu, and to a lesser degree by mmq.cu, so as you say, this will not help much if you are changing these files. It should be possible to split each quant type to a different file as well to improve this.

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024