This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Discussion: Cortex.cpp Model Loading and Inference Errors #1091

Closed
freelerobot opened this issue Sep 4, 2024 · 4 comments

freelerobot commented Sep 4, 2024

At the moment we fail silently with a generic "model failed to load" message, and users have to send us their logs.

Can we enumerate the potential reasons a model fails to load, and discuss how to handle each one?

Goal:

  1. Graceful failures
  2. Predefined errors
  3. Though there are endless possible errors, let's apply the Pareto principle: 80% of our bugs come from 20% of the common model-loading failure modes

Examples

  1. Model won't fit in RAM/VRAM
  2. Another model is running... other edge cases & race conditions
  3. Wrong model format (i.e. unsupported runtime)
  4. Version conflicts (e.g. in the trt-llm engine scenario)
  5. Missing model.yaml, template, key input/configs
  6. Corrupted or missing model binaries
  7. Incompatible hardware. See

Questions:

  1. What are the other common issues?
  2. We support various engines, but should we standardize failure modes? Doing so would let us offer better DX/UX down the road.
  3. What are the various ways that llamacpp, trtllm, directml currently handle errors? Do they have a predefined, neat list we can adopt?

Related issues:

@freelerobot freelerobot added this to Menlo Sep 4, 2024
@freelerobot freelerobot converted this from a draft issue Sep 4, 2024
@freelerobot freelerobot changed the title Discussion: Cortex.cpp Model Loading Common Errors Discussion: Cortex.cpp Model Failed to Load Graceful Failure Sep 4, 2024
@dan-menlo
Contributor

@0xSage I recommend we expand this to

  • Error handling for Model Loading
  • Error handling for Model Running

@freelerobot freelerobot changed the title Discussion: Cortex.cpp Model Failed to Load Graceful Failure Discussion: Cortex.cpp Model Loading Graceful Failures Sep 4, 2024
@freelerobot freelerobot changed the title Discussion: Cortex.cpp Model Loading Graceful Failures Discussion: Cortex.cpp Model Orchestration Errors Sep 4, 2024

freelerobot commented Sep 4, 2024

Example:

Model Loading

| Error Code | Error Message | Failover (if any) |
| --- | --- | --- |
| InsufficientMemory | "The model is too big for your (V)RAM" | - |

Model Running

| Error Code | Error Message | Failover (if any) |
| --- | --- | --- |
| ContextExceeded | "Your input exceeded the model's context window" | - |

@dan-menlo dan-menlo changed the title Discussion: Cortex.cpp Model Orchestration Errors Discussion: Cortex.cpp Model Loading and Inference Errors Sep 5, 2024
@dan-menlo
Contributor

@0xSage @vansangpfiev I am renaming this discussion to "Model Loading and Inference Errors"

@dan-menlo
Contributor

This bug report could have been more informative with better logs from cortex.cpp:
janhq/jan#3552

@janhq janhq locked and limited conversation to collaborators Sep 5, 2024
@dan-menlo dan-menlo converted this issue into discussion #1110 Sep 5, 2024
@github-project-automation github-project-automation bot moved this from Need Investigation to Completed in Menlo Sep 5, 2024
@dan-menlo dan-menlo moved this from Completed to Discontinued in Menlo Sep 6, 2024

