This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Discussion: Cortex.cpp Engine Dependencies Architecture (e.g. CUDA Toolkit) #1046
Comments
From my perspective, we should download the CUDA toolkit separately. We support multiple engines, cortex.llamacpp and cortex.tensorrt-llm, and both need the CUDA toolkit to run. CUDA is backward compatible, so we only need the latest CUDA toolkit version that is supported by the installed NVIDIA driver version.
Edit: I just checked the CUDA compatibility matrix, and it is incorrect that CUDA is always backward compatible.
Related ticket: https://github.com/janhq/cortex/issues/1047
Edit 2: The above image shows forward compatibility between CUDA and the NVIDIA driver version.
So yes, CUDA is backward compatible within a CUDA major release.
I'm referring to this table to check compatibility between the driver and the toolkit.
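The driver/toolkit check discussed here could be sketched roughly as below. This is only a minimal sketch, not how Cortex does it: the function names are hypothetical, and the `MIN_LINUX_DRIVER` table is an assumption holding the minimum Linux driver versions for CUDA 11.x and 12.x minor-version compatibility. Those numbers should be verified against NVIDIA's compatibility table before anyone relies on them.

```python
from typing import Optional, Tuple

# ASSUMPTION: minimum Linux driver version per CUDA major release, as listed in
# NVIDIA's minor-version compatibility table. Verify before relying on these.
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '525.60.13' into (525, 60, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def newest_supported_cuda_major(driver_version: str) -> Optional[int]:
    """Return the newest CUDA major release the driver can run, or None."""
    driver = parse_version(driver_version)
    supported = [
        major
        for major, minimum in MIN_LINUX_DRIVER.items()
        if driver >= minimum
    ]
    return max(supported) if supported else None
```

Since compatibility only holds within a major release, the lookup is keyed by major version rather than assuming the latest toolkit always works.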
Can I verify my understanding of the issue:
Decision
My initial thoughts
This will be disk-space inefficient. However, the alternative seems to be dependency hell, which I think is even worse.
Folder Structure
That said, I am open to all ideas, especially @vansangpfiev's.
If the disk-space inefficiency is acceptable, I think we can go with option 1.
Thanks @vansangpfiev and @dan-homebrew. I'm confirming that we agree on the following:
Question 2: Storing CUDA dependencies under corresponding engines.
Caveats:
Additional thought
/.cortex
    /deps
        /cuda
            cuda-11.5 or whatever versioning
    /engines
        /cortex.llamacpp
            /bin
        /cortex.tensorrt-llm
            /bin
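To make the proposed layout concrete, here is a small sketch of how paths could be resolved under it. The helper names are hypothetical, and the `cuda-<version>` folder naming is just the placeholder from the tree above ("or whatever versioning"), not a settled convention.

```python
from pathlib import PurePosixPath


def shared_cuda_dir(root: PurePosixPath, cuda_version: str) -> PurePosixPath:
    """Shared CUDA dependencies live under <root>/deps/cuda/cuda-<version>."""
    return root / "deps" / "cuda" / f"cuda-{cuda_version}"


def engine_bin_dir(root: PurePosixPath, engine: str) -> PurePosixPath:
    """Each engine keeps its binaries under <root>/engines/<engine>/bin."""
    return root / "engines" / engine / "bin"


# Hypothetical root; the real location of .cortex depends on the install.
root = PurePosixPath("/home/user/.cortex")
print(shared_cuda_dir(root, "11.5"))
print(engine_bin_dir(root, "cortex.llamacpp"))
```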
@0xSage, here are my thoughts. Please correct me if I'm wrong, @nguyenhoangthuan99 @vansangpfiev.
For 3, I think we can handle maintenance and updates through versioning: generate a file (for example, version.txt) for each release that holds metadata for the engine version and the CUDA version. We will then update the CUDA dependencies only if needed.
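A rough sketch of that versioning idea follows. The format of version.txt has not been decided in this thread, so the `key=value` layout and the `engine` / `cuda` key names below are assumptions made purely for illustration, as are the function names.

```python
from typing import Dict


def parse_version_file(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    meta: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        meta[key.strip()] = value.strip()
    return meta


def needs_cuda_update(installed: Dict[str, str], release: Dict[str, str]) -> bool:
    """CUDA dependencies are re-downloaded only when the CUDA version changed."""
    return installed.get("cuda") != release.get("cuda")


# Hypothetical contents of an installed version.txt and a newer release's file.
installed = parse_version_file("engine=0.1.25\ncuda=12.0\n")
release = parse_version_file("engine=0.1.26\ncuda=12.0\n")
print(needs_cuda_update(installed, release))
```

With this approach an engine-only bump would skip the large CUDA download, which matters given how big the dependency archive is.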
@vansangpfiev @namchuai @0xSage Quick responses:
Per-Engine Dependencies
I also agree with @vansangpfiev: let's co-locate all CUDA dependencies with the engine folder. Simple > Complex, especially since model files are >4 GB.
Updating Engines
I also think we need to think through the CLI and API commands:
Naming
I wonder whether it would be better for us to have clearer naming for Cortex engines:
This articulates the concept of Cortex engines more clearly. Hopefully, with a clear API, the community can also step in to help build backends. We would need to reason through
Motivation
Do we package the CUDA toolkit with the engine?
Yes? Then we will have to do the same for llamacpp, tensorrt-llm, and onnx.
No? Then it will be downloaded separately.
Folder structure (e.g., if the user has llamacpp and tensorrt installed at the same time)?
Resources
Llamacpp release
Currently we download the toolkit dependency from:
https://catalog.jan.ai/dist/cuda-dependencies/<version>/<platform>/cuda.tar.gz
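For reference, filling in that URL template might look like the sketch below. Only the template itself comes from this issue; the function name and the example `version` / `platform` values are hypothetical, since the accepted values are not listed here.

```python
URL_TEMPLATE = (
    "https://catalog.jan.ai/dist/cuda-dependencies/"
    "{version}/{platform}/cuda.tar.gz"
)


def cuda_dependency_url(version: str, platform: str) -> str:
    """Fill the <version> and <platform> placeholders of the download URL."""
    return URL_TEMPLATE.format(version=version, platform=platform)


# ASSUMPTION: "12.0" and "linux" are placeholder values for illustration only.
print(cuda_dependency_url("12.0", "linux"))
```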
cc @vansangpfiev @nguyenhoangthuan99 @dan-homebrew
Update sub-tasks: