
planning: Cortex handling Engine Variants #1453

Closed
2 tasks
Tracked by #1416
dan-menlo opened this issue Oct 13, 2024 · 1 comment
dan-menlo (Contributor) commented Oct 13, 2024

Goal

Tasklist

  • API Design
  • CLI Design

Questions

Scenario

  • Some users have both Nvidia and AMD GPUs in the same machine
    • Jan already supports Vulkan
    • Under the hood, this requires us to switch from llama-cuda-avx2 to llama-vulkan
    • llama.cpp alone has 18 variants at the moment
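The switching described above can be sketched as a small selection routine. This is purely illustrative (not Cortex's actual logic), and the `pick_variant` name and the variant-string layout are assumptions loosely modeled on llama.cpp's release naming:

```python
# Hypothetical sketch: choosing a llama.cpp build variant from detected
# hardware. Prefers CUDA on Nvidia, falls back to Vulkan for other GPUs
# (e.g. AMD), and to a CPU/AVX2 build when no GPU is present.
def pick_variant(version: str, os_arch: str, gpus: list) -> str:
    if "nvidia" in gpus:
        backend = "cuda-cu11.7.1"  # illustrative CUDA build tag
    elif gpus:
        backend = "vulkan"         # AMD-only machines use Vulkan
    else:
        backend = "avx2"
    return f"llama-{version}-bin-{os_arch}-{backend}"

print(pick_variant("b3912", "win-x64", ["amd"]))
# llama-b3912-bin-win-x64-vulkan
```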

Cortex needs an elegant way to handle different engine versions + variants without confusing the user. From my naive perspective, there are two key approaches:

Option 1: Every engine is versioned, and maintains a list of variants that it can use

  • Engines are versioned, and each version has several variants that can be chosen from
    • CLI: we would support an nvm-like use command
    • API: the /engines endpoint would expose a use action
> cortex engines get llama.cpp
{
    "version": "b3919",
    ...
}

> cortex engines llama.cpp variants list
llama-b3912-bin-win-hip-x64-gfx1030
llama-b3912-bin-win-cuda-cu11.7.1-x64

> cortex engines llama.cpp use llama-b3912-bin-win-cuda-cu11.7.1
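The Option 1 model behind these commands can be sketched as one record per engine that carries a version, a variant list, and a currently selected variant. The `Engine` class and `use` method names here are illustrative, not Cortex's actual API:

```python
# Hypothetical sketch of Option 1: each engine is versioned and owns its
# variants; `use` selects one, mirroring the nvm-like CLI command above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Engine:
    name: str
    version: str
    variants: List[str] = field(default_factory=list)
    selected: Optional[str] = None

    def use(self, variant: str) -> None:
        # Reject variants the engine does not ship, so the user
        # cannot select a build that was never installed.
        if variant not in self.variants:
            raise ValueError(f"unknown variant: {variant}")
        self.selected = variant

llama = Engine(
    name="llama.cpp",
    version="b3919",
    variants=[
        "llama-b3912-bin-win-hip-x64-gfx1030",
        "llama-b3912-bin-win-cuda-cu11.7.1-x64",
    ],
)
llama.use("llama-b3912-bin-win-cuda-cu11.7.1-x64")
print(llama.selected)
```

One nice property of this shape: `cortex engines list` stays short (one row per engine), and the variant explosion is contained inside each engine record.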

Option 2: Every engine version/variant is a first-class Engine citizen

  • We treat every single engine version/variant as a first-class engine citizen (e.g. llama-b3919-avx-cuda)
    • Users will basically run models using a specific engine variant/version
    • cortex engines list will show a massive long list of engines
  • I don't think this is doable, tbh
> cortex engines list

llama.cpp-b3919-cuda
llama.cpp-b3821-vulkan
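Under Option 2, every tool that touches engine names has to parse version and variant back out of a flat identifier. A minimal sketch, assuming the `engine-version-variant` naming shown above (the `split_engine_id` helper is hypothetical):

```python
# Hypothetical sketch of Option 2's cost: each version/variant pair is its
# own engine name, so listings grow as versions x variants (18 llama.cpp
# variants per release, per the issue) and consumers must parse names.
def split_engine_id(engine_id: str):
    """Split e.g. 'llama.cpp-b3919-cuda' into (engine, version, variant)."""
    engine, version, variant = engine_id.rsplit("-", 2)
    return engine, version, variant

print(split_engine_id("llama.cpp-b3919-cuda"))
# ('llama.cpp', 'b3919', 'cuda')
```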
@dan-menlo dan-menlo added this to Menlo Oct 13, 2024
@dan-menlo dan-menlo converted this from a draft issue Oct 13, 2024
@dan-menlo dan-menlo changed the title epic: Cortex handling Engine versions and variants? epic: Cortex handling Engine variants Oct 13, 2024
@dan-menlo dan-menlo changed the title epic: Cortex handling Engine variants epic: Cortex handling Engine Variants Oct 13, 2024
@dan-menlo dan-menlo moved this to Investigating in Menlo Oct 13, 2024
@dan-menlo dan-menlo added this to the v1.0.2 milestone Oct 14, 2024
@freelerobot freelerobot moved this from Investigating to Planning in Menlo Oct 15, 2024
@dan-menlo dan-menlo changed the title epic: Cortex handling Engine Variants planning: Cortex handling Engine Variants Oct 19, 2024
dan-menlo (Contributor, Author) commented

Closing into #1416

@github-project-automation github-project-automation bot moved this from Planning to Review + QA in Menlo Oct 24, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Oct 25, 2024
@gabrielle-ong gabrielle-ong removed this from the v1.0.2 milestone Nov 12, 2024