
feat: cortex run model(:gguf) #1076

Closed · 5 tasks · Tracked by #3614
vansangpfiev opened this issue Aug 21, 2024 · 13 comments · Fixed by #1351

@vansangpfiev (Contributor) commented Aug 21, 2024

Goals

  • cortex run should load and run the model
  • Cortex should run the model with relevant parameters
  • Cortex should display informative error messages if it fails

Tasklist

  • cortex run model:llamacpp
  • Ensure the dylib issue is fixed (bug: libengine.dylib not found #953)
  • Decision: should we uncouple engine install? (e.g. an error message directing the user to install the engine with a separate command)
  • Informative error messages for major failure modes (e.g. OOM)
  • Tests
  • Docs
@vansangpfiev vansangpfiev self-assigned this Aug 22, 2024
@vansangpfiev (Contributor, Author) commented Aug 29, 2024

run is a chain command; it will do the following (a minimal sketch follows below):

  • models pull if the model does not exist
  • engines install if the engine does not exist
  • models start
  • chat
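
For illustration only, a minimal sketch of that chain with stubbed-out steps; this is not the actual cortex.cpp code, and every function name here is hypothetical:

```cpp
// Hypothetical sketch of the `run` chain -- not the actual cortex.cpp code.
// All names are illustrative; the real steps would call model/engine services.
#include <iostream>
#include <string>

bool ModelExists(const std::string&) { return false; }   // stub
bool EngineExists(const std::string&) { return false; }  // stub
void PullModel(const std::string& m) { std::cout << "models pull " << m << "\n"; }
void InstallEngine(const std::string& e) { std::cout << "engines install " << e << "\n"; }
void StartModel(const std::string& m) { std::cout << "models start " << m << "\n"; }
void Chat(const std::string& m) { std::cout << "chat with " << m << "\n"; }

void RunCommand(const std::string& model, const std::string& engine) {
  if (!ModelExists(model)) PullModel(model);         // models pull if missing
  if (!EngineExists(engine)) InstallEngine(engine);  // engines install if missing
  StartModel(model);                                 // models start
  Chat(model);                                       // chat
}

int main() { RunCommand("tinyllama:gguf", "llamacpp"); }
```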

@imtuyethan imtuyethan transferred this issue from another repository Sep 2, 2024
@freelerobot freelerobot added the category: model running, category: model management, and P1: important labels Sep 6, 2024
@freelerobot freelerobot changed the title from "cortex run" to "feat: cortex run" Sep 6, 2024
@dan-menlo dan-menlo added and removed the category: model management label Sep 6, 2024
@dan-menlo dan-menlo changed the title from "feat: cortex run" to "feat: cortex run for llama.cpp" Sep 8, 2024
@dan-menlo dan-menlo changed the title from "feat: cortex run for llama.cpp" to "feat: cortex run model:gguf" Sep 8, 2024
@dan-menlo dan-menlo added the type: epic label Sep 8, 2024
@freelerobot (Contributor) commented:

see: #1303

@dan-menlo dan-menlo changed the title from "feat: cortex run model:gguf" to "feat: cortex run model(:gguf)" Sep 26, 2024
@gabrielle-ong (Contributor) commented:

(screenshot attached)

Can I check:

  • I've downloaded tinyllama:gguf
  • cortex run tinyllama asks me to choose the variant -> I chose gguf
  • it re-initiates the download of tinyllama:gguf

@dan-menlo (Contributor) commented Sep 26, 2024

@vansangpfiev @0xSage Can I ask about our understanding of cortex run? I thought we had decided on run being a non-interactive command, to optimize for DevOps engineers.

I thought the previous Cortex JS implementation had the following logic:

# Run command
cortex run tinyllama (runs in background, advises user to use `cortex chat` for interactive)
cortex run tinyllama --chat (for interactive chat)
cortex chat tinyllama (shorthand for run and --chat, for interactive shell)

# Chat completion
cortex chat completion tinyllama -m <message>

The idea was that if you are running Cortex in a docker container, you would not want to have an interactive chat shell. This would be similar to docker run, i.e. runs in the background.
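
To make the docker-run analogy concrete, a minimal sketch of the dispatch, assuming a hypothetical CLI entry point (not the actual cortex.cpp code):

```cpp
// Hypothetical sketch of the docker-run-style split -- not the actual CLI code.
// `run` loads the model and returns; `--chat` additionally attaches a shell.
#include <iostream>
#include <string>
#include <vector>

void StartModelInBackground(const std::string& model) {
  std::cout << model << " model started successfully. Use `cortex chat "
            << model << "` for interactive chat shell.\n";
}

void InteractiveChat(const std::string& model) {
  std::cout << "entering interactive chat with " << model << "\n";
}

int main(int argc, char** argv) {
  std::vector<std::string> args(argv + 1, argv + argc);
  if (args.empty()) return 1;                 // usage: <model> [--chat]
  const std::string model = args[0];
  const bool chat = args.size() > 1 && args[1] == "--chat";
  StartModelInBackground(model);              // always load the model first
  if (chat) InteractiveChat(model);           // attach a shell only when asked
}
```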

I'm open to either path - however, I think we need to think from a UX point of view on this.

@vansangpfiev (Contributor, Author) commented Sep 26, 2024

(Quoting @gabrielle-ong above:)

  • I've downloaded tinyllama:gguf
  • cortex run tinyllama asks me to choose the variant -> I chose gguf
  • it re-initiates the download of tinyllama:gguf

I think the model id is tinyllama:gguf, so when you run cortex run tinyllama, we don't have tinyllama as a model id yet.
You need to run cortex run tinyllama:gguf instead.

@vansangpfiev (Contributor, Author) commented Sep 26, 2024

(Quoting @dan-menlo above:) "Can I ask about our understanding of cortex run? I thought we had decided on run being a non-interactive command, to optimize for DevOps engineers. [...]"

@dan-homebrew I'm good with the old CortexJS design. Let me change the implementation.
However, I would like to confirm:

  • Will cortex chat also be a command chain, i.e. one that includes cortex start and cortex models start?
  • We don't have a cortex chat completion command, correct?

Actually, having two commands for the same task does not make sense to me:

cortex run tinyllama --chat (for interactive chat)
cortex chat tinyllama (shorthand for run and --chat, for interactive shell)

@dan-menlo (Contributor) commented Sep 27, 2024

@vansangpfiev This is how I see it:

cortex run

  • This is similar to docker run, i.e. runs it in background
  • It should have a success/error message
  • It should advise the user to use chat for interactive shell
# Run command
> cortex run tinyllama 

tinyllama model started successfully. Use `cortex chat tinyllama` for interactive chat shell. 

cortex chat

  • This is syntactic sugar on top of cortex run --chat
  • You should be able to re-use the logic from cortex run (i.e. don't reimplement; see the sketch below)
  • Thus, it is also a command chain that includes cortex start and cortex models start
# Same thing
cortex chat tinyllama (shorthand for `cortex run tinyllama --chat`)
cortex run tinyllama --chat (re-uses the logic of cortex run)
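
A minimal sketch of the sugaring, with hypothetical function names, to show the single shared code path (not the actual implementation):

```cpp
// Hypothetical sketch of the sugaring -- not the actual implementation.
// One shared chain, two entry points, so nothing is reimplemented.
#include <iostream>
#include <string>

// The existing `run` chain: cortex start, models start, optional shell.
void RunChain(const std::string& model, bool interactive) {
  std::cout << "cortex start; models start " << model << "\n";
  if (interactive) std::cout << "entering chat shell\n";
}

void RunCommand(const std::string& model, bool chat_flag) {
  RunChain(model, /*interactive=*/chat_flag);  // `cortex run [--chat]`
}

void ChatCommand(const std::string& model) {
  RunCommand(model, /*chat_flag=*/true);       // `cortex chat` == `run --chat`
}

int main() {
  RunCommand("tinyllama", false);  // background run
  ChatCommand("tinyllama");        // interactive sugar
}
```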

cortex chat completion

  • This is in reference to @0xSage's previous post, where Cortex could do a single inference command with chat -m
  • Her rationale was that DevOps and QA may use it to do single test outputs (imho this is not common)
  • Since we have already implemented it, I recommend this be a cortex chat completion command, in line with the API call
cortex chat completion tinyllama -m <message> 
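
For illustration, a minimal sketch of how the one-shot command could map onto the OpenAI-compatible chat completions request body; the endpoint path follows the usual /v1/chat/completions convention, and the host/port in the comment is an assumption:

```cpp
// Hypothetical sketch: the one-shot CLI maps onto an OpenAI-compatible
// chat completions request. JSON is built by hand (no escaping) for brevity.
#include <iostream>
#include <string>

std::string BuildChatCompletionBody(const std::string& model,
                                    const std::string& message) {
  return "{\"model\":\"" + model + "\","
         "\"messages\":[{\"role\":\"user\",\"content\":\"" + message + "\"}]}";
}

int main() {
  // `cortex chat completion tinyllama -m "write a poem"` would POST this body
  // to the local server's /v1/chat/completions endpoint (host/port assumed).
  std::cout << BuildChatCompletionBody("tinyllama", "write a poem") << "\n";
}
```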

cc @gabrielle-ong: Please note, you will be updating Docs for this
cc @0xSage: FYI

@vansangpfiev (Contributor, Author) commented:

Got it. Let me change the implementation.

@vansangpfiev (Contributor, Author) commented:

@dan-homebrew The new cortex chat completion command is ambiguous: we can't tell whether the user wants to send a message to a model or to chat with a model named completion, and syntax error handling for this case is quite complicated. I think we can change that command to cortex chat-completion. What do you think?
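
A minimal sketch of the ambiguity, assuming the parser would have to consult the installed-model list (names are illustrative, not the actual parser):

```cpp
// Hypothetical sketch of the ambiguity: the same argv has two valid parses,
// so resolution would need the installed-model list. Names are illustrative.
#include <iostream>
#include <set>
#include <string>

bool IsInstalledModel(const std::string& id) {
  static const std::set<std::string> models = {"tinyllama:gguf"};  // stub
  return models.count(id) > 0;
}

int main() {
  const std::string token = "completion";  // from `cortex chat completion ...`
  if (IsInstalledModel(token))
    std::cout << "parse as: interactive chat with a model named `completion`\n";
  else
    std::cout << "parse as: chat-completion subcommand (a guess)\n";
  // A dedicated `chat-completion` token removes the guesswork entirely.
}
```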

@freelerobot (Contributor) commented:

(Quoting @vansangpfiev's cortex chat-completion proposal above.)

Yes, sorry to block on this, chat-completion makes a lot of sense 🙏

@dan-menlo (Contributor) commented Sep 27, 2024

(Quoting @vansangpfiev's cortex chat-completion proposal above.)

@vansangpfiev @0xSage I think the correct way to do this is to overload the cortex chat command:

# Chat (interactive)
cortex chat tinyllama
cortex run tinyllama --chat (underlying command)

# Chat (single message)
cortex chat tinyllama -m <message>

I am not very much in favor of creating an additional CLI method for chat-completions; I would prefer to just simplify.
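
A minimal sketch of the overloaded command, with hypothetical helpers, where the presence of -m selects the single-message path:

```cpp
// Hypothetical sketch of the overloaded `chat` command -- not the actual code.
// No -m flag opens an interactive shell; -m <message> does a single round trip.
#include <iostream>
#include <string>
#include <vector>

void InteractiveShell(const std::string& model) {
  std::cout << "interactive chat with " << model << "\n";
}

void SingleMessage(const std::string& model, const std::string& message) {
  std::cout << "one-shot completion on " << model << ": " << message << "\n";
}

int main(int argc, char** argv) {
  std::vector<std::string> args(argv + 1, argv + argc);
  if (args.empty()) return 1;         // usage: <model> [-m <message>]
  const std::string model = args[0];
  for (size_t i = 1; i < args.size(); ++i) {
    if (args[i] == "-m" && i + 1 < args.size()) {
      SingleMessage(model, args[i + 1]);  // `-m` selects the one-shot path
      return 0;
    }
  }
  InteractiveShell(model);
}
```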

@dan-menlo (Contributor) commented Sep 27, 2024

I think the model id is tinyllama:gguf, so when you run cortex run tinyllama, we don't have tinyllama as a model id yet.

You need to run cortex run tinyllama:gguf instead

@vansangpfiev This is probably not the correct UX. We should have a "main" branch equivalent:

cortex run tinyllama
cortex run llama3.2 (pulls :main)

Let me discuss with @gabrielle-ong. We will likely have a main branch on the Cortex Model repo to support this use case, and will have a spec ready by Monday.

In the long term, this will likely be replaced by something that automatically detects the user's hardware and picks the correct engine (sketched below).
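
A minimal sketch of that default-variant resolution; the :main fallback name is an assumption, since the actual branch naming was still to be specced at this point:

```cpp
// Hypothetical sketch of default-variant resolution -- the :main fallback name
// is an assumption; the actual branch naming was still to be specced.
#include <iostream>
#include <string>

std::string ResolveModelId(const std::string& requested) {
  if (requested.find(':') != std::string::npos) return requested;  // explicit
  return requested + ":main";  // bare id falls back to the default branch
}

int main() {
  std::cout << ResolveModelId("tinyllama") << "\n";       // -> tinyllama:main
  std::cout << ResolveModelId("tinyllama:gguf") << "\n";  // unchanged
  // Longer term, this fallback could probe the host hardware (GPU, OS/arch)
  // and pick the engine-appropriate variant automatically.
}
```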

@vansangpfiev vansangpfiev moved this from Review + QA to In Progress in Menlo Sep 30, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Review + QA in Menlo Sep 30, 2024
@gabrielle-ong (Contributor) commented:

QA:
✅ Mac
✅ Windows
✅ Ubuntu

Commands as listed in #1351:

  1. cortex run tinyllama:gguf -> runs in background, advises user to use cortex chat for interactive
  2. cortex run tinyllama:gguf --chat -> interactive chat
  3. cortex chat tinyllama:gguf -> interactive chat
  4. cortex chat tinyllama:gguf -m "write a poem"

Clarified: There is no cortex chat completion command.

To be followed by #1362 - don't require :gguf in the model name
(which requires changes to the Hugging Face repo branches, out of scope of this implementation issue)

@gabrielle-ong gabrielle-ong added this to the v1.0.0 milestone Oct 3, 2024