
epic: Cortex Model Repo supports Default Model download #1418

Closed
dan-menlo opened this issue Oct 4, 2024 · 7 comments · Fixed by #1430
@dan-menlo
Contributor

dan-menlo commented Oct 4, 2024

Goal

  • Cortex's Built-in Libraries have a "Cortex Model repo" format
  • Cortex Model Repos are a critical data structure to support cortex pull <model> and cortex run <model>

High-level Structure

  • Cortex model repos are Git repos
  • Git repo has branches, which hold different "versions" of the model (i.e. quantization, engine etc)
    • cortex run <model>:<branch> pulls a specific version
    • cortex run <model> pulls a default version
  • Git repo's "main branch" holds README and metadata, to allow user to navigate other branches
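
As a sketch of this structure, resolving a `model[:branch]` argument into a repo and branch might look like the following. The helper is hypothetical; it assumes model repos live under the cortexso Hugging Face org, as listed later in this thread.

```python
# Sketch: resolve "model[:branch]" into a (repo URL, branch) pair.
# The "cortexso" org comes from this thread; the helper itself is hypothetical.

def resolve_model(arg: str, default_branch: str = "main") -> tuple:
    """Split 'mistral:7b-gguf' into a repo URL and a branch name."""
    model, _, branch = arg.partition(":")
    repo = f"https://huggingface.co/cortexso/{model}"
    # No explicit branch means "pull the default version"
    return repo, branch or default_branch

print(resolve_model("mistral:7b-gguf"))
print(resolve_model("mistral"))
```

With this shape, `cortex run <model>:<branch>` maps to a specific branch, and `cortex run <model>` falls back to whatever the repo designates as the default.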

Decisions

Decision 1: Default Model Download

We need to decide which model version cortex pull <model> and cortex run <model> will pull.

Option 1: main branch

  • We have so far used the main branch to hold our recommended version
  • We merge the recommended version, e.g. 3b-gguf, into the main branch

However, I do not think this is correct long term:

  • Longer-term, I think default model selection will be algorithmic
    • e.g. Hardware detection -> pull most "optimized" model
  • main branch is non-descriptive as a branch name
    • e.g. main could hold 3b-gguf, but user is unaware
    • It is better to be upfront that we default to 3b-gguf
  • I think the main branch requires more work to manage longer-term
    • Switching the main branch takes time - i.e. merges, Git issues, etc.
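
To make the "algorithmic default" idea concrete, here is a minimal sketch of hardware-aware selection: pick the largest variant that fits available memory. The branch names and sizes below are illustrative placeholders, not real measurements.

```python
# Sketch of algorithmic default selection (hypothetical; sizes are placeholders).
# Pick the biggest quantized variant that fits available RAM.

BRANCHES = {           # branch -> approximate download size in GB (illustrative)
    "3b-gguf": 2.0,
    "7b-gguf": 4.5,
    "70b-gguf": 40.0,
}

def pick_default(available_ram_gb: float) -> str:
    """Return the largest variant that fits, or the smallest as a fallback."""
    fitting = [b for b, size in BRANCHES.items() if size <= available_ram_gb]
    if not fitting:
        return min(BRANCHES, key=BRANCHES.get)
    return max(fitting, key=BRANCHES.get)

print(pick_default(8.0))   # -> 7b-gguf
```

A static `default` field (Option 2 below) can later be superseded by a function like this without changing the repo format.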

It is also incorrect to compare our approach to Ollama's. Ollama uses a tag-based system similar to Docker's, where latest is a pointer to 3b. It is difficult to replicate this in a straightforward UX in Git (i.e. tags are not very visible from the main page).

Option 2: main branch has metadata.yaml

  • I would like to instead propose a metadata.yaml approach to defining Default Model downloads
  • This can be superseded in the future with an algorithmic approach to selecting Default Model
  • metadata.yaml is also used to generate the CLI UX for cortex pull or cortex run

How it works

  • Cortex Model Repo's main branch will hold a few files (see below)
  • The v1 of metadata.yaml will be very simple
# File system
metadata.yaml
README.md
# metadata.yaml
version: 1
name: mistral
default: 3b-gguf
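
A client-side sketch of how cortex might read this file to pick the default branch. A real implementation would use a proper YAML library; the v1 file is simple enough that key/value splitting suffices for illustration.

```python
# Sketch: read the default branch from metadata.yaml on the main branch.
# The parser is a deliberately minimal stand-in for a real YAML library.

METADATA = """\
version: 1
name: mistral
default: 3b-gguf
"""

def parse_metadata(text: str) -> dict:
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a key: value pair
            fields[key.strip()] = value.strip()
    return fields

meta = parse_metadata(METADATA)
print(meta["default"])   # branch that `cortex pull mistral` would resolve to
```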

In the future, metadata.yaml can become more expressive, allowing fine-grained control of the CLI UX, e.g. sections for 3b, 7b, or by engine.

Furthermore, we can use metadata.yaml as a data structure to hold information about the different Model versions.

  • Can expand to include MMLU scores
  • Can expand to include file sizes
  • This can all be automated in the future through CI/CD on the Git repo
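
A possible future shape of the file, extending the v1 example above. Everything beyond the `default` field is speculative, and the numeric values are placeholders to be filled in by CI/CD.

```yaml
# metadata.yaml (hypothetical v2 -- fields beyond "default" are illustrative,
# and all numeric values are placeholders)
version: 2
name: mistral
default: 3b-gguf
branches:
  3b-gguf:
    engine: llama-cpp
    size_gb: 0.0    # populated by CI
    mmlu: 0.0       # populated by CI
  7b-gguf:
    engine: llama-cpp
    size_gb: 0.0    # populated by CI
    mmlu: 0.0       # populated by CI
```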
@dan-menlo dan-menlo added this to Menlo Oct 4, 2024
@dan-menlo dan-menlo converted this from a draft issue Oct 4, 2024
@dan-menlo dan-menlo moved this from Investigating to Scheduled in Menlo Oct 4, 2024
@dan-menlo dan-menlo changed the title epic: cortex run vs cortex pull UX epic: Cortex Model Repo format Oct 4, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Model Repo format epic: Cortex Model Repo supports Default Model download Oct 4, 2024
@namchuai
Collaborator

namchuai commented Oct 4, 2024

I agree with Option 2. The current approach can't hold a default model for other engines, and it causes duplication between the main branch and a specific branch (3b, for example).

I think we should go with Option 2 before releasing because it's more future-proof.

@gabrielle-ong
Contributor

gabrielle-ong commented Oct 4, 2024

I'll update the 35 cortexso repos with metadata.yml on the main branch.

@nguyenhoangthuan99 Can I get help on the recommended branches for these 35 models? (listed from https://huggingface.co/cortexso)

Local models

cortexso/llama3.2 (3b-gguf-q4-km)
cortexso/mistral-nemo (12b-gguf-q4-ks)
cortexso/llama3 (8b-gguf-q4-ks)
cortexso/llama3.1 (8b-gguf-q4-ks)
cortexso/nomic-embed-text-v1 (main) - rarely used - 3 downloads last month

Future updates to default (when CI runs to update branches)

cortexso/tinyllama (1b-gguf) (future 1b-gguf-q4-ks)
cortexso/mistral (7b-gguf) (future 7b-gguf-q4-ks)
cortexso/phi3 (mini-gguf) (future mini-gguf-q4-ks)
cortexso/gemma2 (2b-gguf) (future 2b-gguf-q4-ks)
cortexso/openhermes-2.5 (7b-gguf) (future 7b-gguf-q4-ks)
cortexso/mixtral (7x8b-gguf) (future 7x8b-gguf-q4-ks)
cortexso/yi-1.5 (34b-gguf) (future 34b-gguf-q4-ks)
cortexso/aya (12.9b-gguf) (future 12.9b-gguf-q4-ks)
cortexso/codestral (22b-gguf) (future 22b-gguf-q4-ks)
cortexso/command-r (35b-gguf) (future 35b-gguf-q4-ks)
cortexso/gemma (7b-gguf) (future 7b-gguf-q4-ks)
cortexso/qwen2 (7b-gguf) (future 7b-gguf-q4-ks)

Remote models (no need for metadata.yml, as we don't have GGUF files for these models)

cortexso/NVIDIA-NIM
cortexso/gpt-4o-mini
cortexso/open-router-auto
cortexso/groq-mixtral-8x7b-32768
cortexso/groq-gemma-7b-it
cortexso/groq-llama3-8b-8192
cortexso/groq-llama3-70b-8192
cortexso/claude-3-5-sonnet-20240620
cortexso/gpt-3.5-turbo
cortexso/gpt-4o
cortexso/martian-model-router
cortexso/cohere-command-r-plus
cortexso/cohere-command-r
cortexso/mistral-large-latest
cortexso/mistral-small-latest
cortexso/claude-3-haiku-20240307
cortexso/claude-3-sonnet-20240229
cortexso/claude-3-opus-20240229

@gabrielle-ong
Contributor

Question: is it metadata.yaml or metadata.yml?
We are currently using model.yml, so I think it should be .yml for consistency.

@namchuai
Collaborator

namchuai commented Oct 4, 2024

@gabrielle-ong, I agree with metadata.yml

If possible, please start with cortexso/llama3.2. Thank you!

@gabrielle-ong
Contributor

Thanks @namchuai and @nguyenhoangthuan99! Added to llama3.2 and working down the list.

@gabrielle-ong
Contributor

Created all the metadata.yml files in the list.
Categorized the list above and noted Alex on future changes to the default branch for some models -
let me know when the CI is run to add the new branches for those models.

@github-project-automation github-project-automation bot moved this from Scheduled to Review + QA in Menlo Oct 4, 2024
@gabrielle-ong
Contributor

Thanks @james and @nguyenhoangthuan99!
