
bug: run llama3:tensorrt-llm leads to "cortex.llamacpp engine not found" #1020

Closed
Tracked by #1152
freelerobot opened this issue Aug 15, 2024 · 3 comments
Assignees
Labels
engine: tensorrt-llm wontfix This will not be worked on

Comments

@freelerobot
Contributor

freelerobot commented Aug 15, 2024

Describe the bug

  1. install cortex
  2. start server
  3. cortex run llama3:tensorrt-llm --chat
  4. NOTE: tensorrt-LLM branch doesn't exist in llama3 hf repo
  5. model successfully downloads, but the binary is empty; there is still a model.yaml
  6. When running, I get this error:
(base) PS C:\Windows\System32> cortex run llama3:tensorrt-llm --chat
√ Dependencies loaded in 862ms
√ API server is online
√ Model found
Downloading engine...
 ████████████████████████████████████████ 100% | ETA: 0s | 100/100
× 500 status code (no body)
Last errors:
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
- Loading model...
20240815 15:29:47.151000 UTC 10740 INFO  CPU instruction set: fpu = 1| mmx = 1| sse = 1| sse2 = 1| sse3 = 1| ssse3 = 1| sse4_1 = 1| sse4_2 = 1| pclmulqdq = 1| avx = 1| avx2 = 1| avx512_f = 1| avx512_dq = 1| avx512_ifma = 1| avx512_pf = 0| avx512_er = 0| avx512_cd = 1| avx512_bw = 1| has_avx512_vl = 1| has_avx512_vbmi = 1| has_avx512_vbmi2 = 1| avx512_vnni = 1| avx512_bitalg = 1| avx512_vpopcntdq = 1| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 1| f16c = 1| - server.cc:288
20240815 15:29:47.151000 UTC 10740 ERROR Could not load engine: Could not load library "C:\Users\n\cortex/engines/cortex.llamacpp/engine.dll"
The specified module could not be found.

 - server.cc:299
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
...
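The "Could not load library … engine.dll" error above means the engine binary was never installed at the expected path. A minimal pre-flight check could catch this before the `/start` request is issued. This is a sketch only: the helper names are hypothetical (not part of Cortex), and the `engines/<engine>/engine.dll` layout is assumed from the log line above.

```python
from pathlib import Path

def engine_library_path(cortex_home: str, engine: str = "cortex.llamacpp") -> Path:
    # Layout assumed from the error log: <cortex_home>/engines/<engine>/engine.dll
    # (Windows naming; other platforms would use .so/.dylib).
    return Path(cortex_home) / "engines" / engine / "engine.dll"

def check_engine_installed(cortex_home: str, engine: str = "cortex.llamacpp") -> bool:
    # A missing engine.dll is exactly what produces the
    # "The specified module could not be found." failure above.
    return engine_library_path(cortex_home, engine).is_file()
```

With a check like this, the CLI could report "engine not installed" up front instead of a bare 500 from the model-start endpoint.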

It turns out it somehow downloaded an empty model instead of just failing.

Ah, I see the issue: tensorrt-llm is an invalid tag (so cortex.so/models is badly wrong),
and cortex run llama3:tensorrt-llm downloaded a default empty model.
There's no HF repo branch called tensorrt-llm.
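The silent fallback is the core bug: a tag that matches no branch should fail loudly instead of downloading a default. A sketch of that validation, under stated assumptions: `resolve_branch` is a hypothetical helper, and in real use `available_branches` would come from the Hugging Face refs API (it is passed in here so the check works offline).

```python
def resolve_branch(requested_tag: str, available_branches: list[str]) -> str:
    """Return the branch to download, or raise instead of silently
    falling back to a default (which is how the empty model was created)."""
    if requested_tag not in available_branches:
        raise ValueError(
            f"branch '{requested_tag}' does not exist; "
            f"available: {sorted(available_branches)}"
        )
    return requested_tag
```

Here `resolve_branch("tensorrt-llm", ["main", "gguf"])` would raise immediately, surfacing the bad tag to the user rather than producing an empty download.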

(base) PS C:\Users\n\cortex\models> cat .\llama3-tensorrt-llm.yaml
files:
  - C:\Users\n\cortex\models\llama3-tensorrt-llm\.gitattributes
model: llama3:tensorrt-llm
name: llama3:tensorrt-llm
stop: []
stream: true
max_tokens: 4096
frequency_penalty: 0.7
presence_penalty: 0.7
temperature: 0.7
top_p: 0.7
ctx_len: 4096
ngl: 100
engine: cortex.llamacpp
id: llama3:tensorrt-llm
created: 1723735451386
object: model
owned_by: ''
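Note the `files` list above contains only a `.gitattributes` file and no weights at all. A second guard could reject such a model.yaml at generation time. This is a hypothetical sanity check, not Cortex code; the set of weight extensions is an assumption.

```python
from pathlib import PureWindowsPath

# Assumed set of extensions that count as actual model weights.
WEIGHT_SUFFIXES = {".gguf", ".bin", ".safetensors", ".engine"}

def has_model_weights(files: list[str]) -> bool:
    # PureWindowsPath handles the backslash paths from the YAML above;
    # a bare .gitattributes entry has no weight suffix and is rejected.
    return any(PureWindowsPath(f).suffix.lower() in WEIGHT_SUFFIXES for f in files)
```

Applied to the `files` list in this model.yaml, the check returns False, so the invalid file would never be written.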

Specs:

  • Windows, RTX 4070, latest CUDA/NVIDIA drivers
  • cortex v0.5.0 - 44
@freelerobot freelerobot added the type: bug Something isn't working label Aug 15, 2024
@freelerobot freelerobot changed the title bug: Running a trtllm model fails on "cortex.llamacpp engine not found" bug: run llama3:tensorrt-llm leads to "cortex.llamacpp engine not found" Aug 15, 2024
@louis-menlo
Contributor

There is an engine-init issue where it looks for an incorrect binary, but fixing that does not fully cover the case above. We still need to check why it links to .gitattributes and generates an invalid YAML file.

@imtuyethan imtuyethan moved this to Planning in Menlo Sep 2, 2024
@imtuyethan imtuyethan moved this from Planning to Scheduled in Menlo Sep 2, 2024
@dan-menlo
Contributor

@vansangpfiev I am reassigning this to the Cortex team - if this issue does not exist for the C++ implementation, you can proceed to close this ticket

@gabrielle-ong
Contributor

Deprecated due to TensorRT-LLM not supporting Desktop
Parent issue: #1742

@github-project-automation github-project-automation bot moved this from Scheduled to Review + QA in Menlo Nov 28, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Nov 28, 2024
@gabrielle-ong gabrielle-ong added wontfix This will not be worked on engine: tensorrt-llm and removed type: bug Something isn't working engine: tensorrt-llm category: engine management Related to engine abstraction labels Nov 28, 2024
@gabrielle-ong gabrielle-ong moved this from Completed to Discontinued in Menlo Nov 28, 2024
5 participants