cuda : refactor into multiple files #6269
If you have any suggestions to improve the organization of the files, the naming scheme or anything else, please let me know, I am sure it could be improved. Other than that, the only thing left to do is fix the HIP and |
I thought we were previously supposed to keep all functions in one file (#3965 (comment)). Has the design changed? Could you share more details?
There was a brief discussion about this in #5434. The compilation time of the CUDA backend was increasing to the point that it was complicating work on it. Personally, I also prefer working on multiple smaller files rather than one large file, but I don't know if everybody agrees about that.
Will you summarize it in a document, for example "guidelines for adding new backends"? I see some backends are WIP, like #6035, and following the new design might be easier for them.
I like how the files are organized. We should also take the opportunity to deprecate the LLAMA_CUBLAS option in favor of the more meaningful LLAMA_CUDA. But not super important, and probably better for a follow-up PR.
@airMeng Starting with single-file implementations remains the preferred option. We can probably add a dummy backend implementation to serve as a starting point for new backend development. That might be easier to keep up to date than a document.
I apologize for nagging but LLAMA_CUDA_F16 is broken again.
Should be fixed in #6298.
Considering the extremely long |
The build time is dominated by |
The main goal is to make it easier to work on the CUDA backend and improve compilation times.