
planning: Cortex handling Engine Variants #1453

Closed
2 tasks
Tracked by #1416
dan-menlo opened this issue Oct 13, 2024 · 1 comment
dan-menlo (Contributor) commented Oct 13, 2024

Goal

Tasklist

  • API Design
  • CLI Design

Questions

Scenario

  • Some users have both Nvidia and AMD GPUs in the same machine
    • Jan already supports Vulkan
    • Under the hood, this requires us to switch from llama-cuda-avx2 to llama-vulkan
    • llama.cpp alone has 18 variants at the moment
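The switching described above can be sketched as a small selection routine. This is purely illustrative (not Cortex's actual logic), and the `pick_variant` name and the variant-string layout are assumptions loosely modeled on llama.cpp's release naming:

```python
# Hypothetical sketch: choosing a llama.cpp build variant from detected
# hardware. Prefers CUDA on Nvidia, falls back to Vulkan for other GPUs
# (e.g. AMD), and to a CPU/AVX2 build when no GPU is present.
def pick_variant(version: str, os_arch: str, gpus: list) -> str:
    if "nvidia" in gpus:
        backend = "cuda-cu11.7.1"  # illustrative CUDA build tag
    elif gpus:
        backend = "vulkan"         # AMD-only machines use Vulkan
    else:
        backend = "avx2"
    return f"llama-{version}-bin-{os_arch}-{backend}"

print(pick_variant("b3912", "win-x64", ["amd"]))
# llama-b3912-bin-win-x64-vulkan
```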

Cortex needs an elegant way to handle different engine versions + variants without confusing the user. From my naive perspective, there are two key approaches:

Option 1: Every engine is versioned, and maintains a list of variants that it can use

  • Engines are versioned, and each version has several variants that can be chosen from
    • CLI: we would support an nvm-like use command
    • API: the /engines endpoint would expose a use action
> cortex engines get llama.cpp
{
    "version": "b3919",
    ...
}

> cortex engines llama.cpp variants list
llama-b3912-bin-win-hip-x64-gfx1030
llama-b3912-bin-win-cuda-cu11.7.1-x64

> cortex engines llama.cpp use llama-b3912-bin-win-cuda-cu11.7.1
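The Option 1 model behind these commands can be sketched as one record per engine that carries a version, a variant list, and a currently selected variant. The `Engine` class and `use` method names here are illustrative, not Cortex's actual API:

```python
# Hypothetical sketch of Option 1: each engine is versioned and owns its
# variants; `use` selects one, mirroring the nvm-like CLI command above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Engine:
    name: str
    version: str
    variants: List[str] = field(default_factory=list)
    selected: Optional[str] = None

    def use(self, variant: str) -> None:
        # Reject variants the engine does not ship, so the user
        # cannot select a build that was never installed.
        if variant not in self.variants:
            raise ValueError(f"unknown variant: {variant}")
        self.selected = variant

llama = Engine(
    name="llama.cpp",
    version="b3919",
    variants=[
        "llama-b3912-bin-win-hip-x64-gfx1030",
        "llama-b3912-bin-win-cuda-cu11.7.1-x64",
    ],
)
llama.use("llama-b3912-bin-win-cuda-cu11.7.1-x64")
print(llama.selected)
```

One nice property of this shape: `cortex engines list` stays short (one row per engine), and the variant explosion is contained inside each engine record.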

Option 2: Every engine version/variant is a first-class Engine citizen

  • We treat every single engine version/variant as a first-class engine citizen (e.g. llama-b3919-avx-cuda)
    • Users will basically run models using a specific engine variant/version
    • cortex engines list will show a massive long list of engines
  • I don't think this is doable, tbh
> cortex engines list

llama.cpp-b3919-cuda
llama.cpp-b3821-vulkan
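Under Option 2, every tool that touches engine names has to parse version and variant back out of a flat identifier. A minimal sketch, assuming the `engine-version-variant` naming shown above (the `split_engine_id` helper is hypothetical):

```python
# Hypothetical sketch of Option 2's cost: each version/variant pair is its
# own engine name, so listings grow as versions x variants (18 llama.cpp
# variants per release, per the issue) and consumers must parse names.
def split_engine_id(engine_id: str):
    """Split e.g. 'llama.cpp-b3919-cuda' into (engine, version, variant)."""
    engine, version, variant = engine_id.rsplit("-", 2)
    return engine, version, variant

print(split_engine_id("llama.cpp-b3919-cuda"))
# ('llama.cpp', 'b3919', 'cuda')
```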
@dan-menlo dan-menlo added this to Menlo Oct 13, 2024
@dan-menlo dan-menlo converted this from a draft issue Oct 13, 2024
@dan-menlo dan-menlo changed the title epic: Cortex handling Engine versions and variants? epic: Cortex handling Engine variants Oct 13, 2024
@dan-menlo dan-menlo changed the title epic: Cortex handling Engine variants epic: Cortex handling Engine Variants Oct 13, 2024
@dan-menlo dan-menlo moved this to Investigating in Menlo Oct 13, 2024
@dan-menlo dan-menlo added this to the v1.0.2 milestone Oct 14, 2024
@freelerobot freelerobot moved this from Investigating to Planning in Menlo Oct 15, 2024
@dan-menlo dan-menlo changed the title epic: Cortex handling Engine Variants planning: Cortex handling Engine Variants Oct 19, 2024
dan-menlo (Contributor, Author) commented

Closing into #1416

@github-project-automation github-project-automation bot moved this from Planning to Review + QA in Menlo Oct 24, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Oct 25, 2024
@gabrielle-ong gabrielle-ong removed this from the v1.0.2 milestone Nov 12, 2024