planning: Cortex Model Compatibility API #1108
Comments
Note: This should be driven by the Cortex team, with Jan UI as one of the task items. I think this is part of a larger "Hardware Detection, Config and Recommendations".

This is also being discussed in janhq/jan#1089; let's link both issues. We will need to scope this to something less ambiguous.

Shifting to Sprint 21 to allow the team to focus on Model Folder execution in Sprint 20.

To calculate the total memory buffer required for a model, first break it into parts:
Model weight
KV cache
The KV cache is calculated as follows:
quant_bit for kv_cache has 3 modes (f16 = 16 bits, q8_0 = 8 bits, q4_0 = 4.5 bits)
Buffer for preprocessing the prompt
The buffer for preprocessing prompts is related to
When we do not load all
the default
We also need to reserve an extra 100 MiB to 200 MiB of RAM for some small buffers during processing.
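The breakdown above can be sketched in code. The KV cache formula did not survive in this thread, so the sketch below assumes the commonly used estimate for transformer KV caches (one K and one V tensor per layer, each of size context length × KV heads × head dimension). The function names, parameter names, and the choice to pass the prompt buffer size in as a plain input are all hypothetical and are not taken from Cortex.

```python
# Bits per element for each KV cache mode, as listed in the comment above.
KV_QUANT_BITS = {"f16": 16.0, "q8_0": 8.0, "q4_0": 4.5}

MIB = 1024 * 1024


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, quant="f16"):
    """Estimate KV cache size in bytes (assumed formula, not from the source)."""
    bits = KV_QUANT_BITS[quant]
    # Factor 2: one key tensor and one value tensor per layer.
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return int(elements * bits / 8)


def estimate_total_bytes(model_weight_bytes, kv_bytes, prompt_buffer_bytes,
                         reserve_mib=200):
    """Sum the parts named above, plus the 100-200 MiB reserve for small buffers."""
    return model_weight_bytes + kv_bytes + prompt_buffer_bytes + reserve_mib * MIB


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
    kv = kv_cache_bytes(n_layers=32, n_ctx=4096, n_kv_heads=8, head_dim=128)
    print(kv / MIB, "MiB of KV cache at f16")
```

Using the upper end of the reserve (200 MiB) as the default keeps the estimate conservative; a caller could pass `reserve_mib=100` for the lower bound.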
API documentation
GET /v1/models
Response
{
"data" : [
{
"model": "model_1",
...
"recommendation": {
"cpu_mode": {
"ram": number
},
"gpu_mode": [{
"ram": number,
"vram": number,
"ngl": number,
"context_length": number,
"recommend_ngl": number
}]
}
}
]
}
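As an illustration of how a client might consume the `recommendation` field, here is a minimal sketch that picks one `gpu_mode` entry that fits the available memory. The selection rule (prefer the highest `ngl` that fits), the function name, and the assumption that `ram` and `vram` use the same unit as the values the caller passes in are all hypothetical; the thread does not specify units or how the entries should be ranked.

```python
def pick_gpu_recommendation(model_entry, free_ram, free_vram):
    """Return the gpu_mode entry with the highest ngl that fits, or None.

    Assumes `ram`/`vram` in the response share the unit of free_ram/free_vram.
    """
    options = model_entry["recommendation"]["gpu_mode"]
    fitting = [
        opt for opt in options
        if opt["ram"] <= free_ram and opt["vram"] <= free_vram
    ]
    if not fitting:
        return None  # Caller may fall back to cpu_mode.
    return max(fitting, key=lambda opt: opt["ngl"])


if __name__ == "__main__":
    # Made-up sample shaped like the response above; values are illustrative only.
    sample = {
        "model": "model_1",
        "recommendation": {
            "cpu_mode": {"ram": 8000},
            "gpu_mode": [
                {"ram": 2000, "vram": 4000, "ngl": 16,
                 "context_length": 4096, "recommend_ngl": 16},
                {"ram": 1000, "vram": 8000, "ngl": 33,
                 "context_length": 4096, "recommend_ngl": 33},
            ],
        },
    }
    print(pick_gpu_recommendation(sample, free_ram=4000, free_vram=6000))
```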
CLI Documentation: Get model list information
If no flag is specified, display only the model ID.
Goal
model.yaml
(GET /models and GET /model/<model_id>)
Related Issues
settings.json #1140
Original Post
Specs
https://www.notion.so/jan-ai/Hardware-Detection-and-Recommendations-b04bc3109c2846d58572415125e0a9a5?pvs=4
Key user stories
Design
https://www.figma.com/design/DYfpMhf8qiSReKvYooBgDV/Jan-App-(3rd-version)?node-id=5115-60038&t=OgzCw09qXKxZj3DC-4