
bug: run llama3:tensorrt-llm leads to "cortex.llamacpp engine not found" #1020

Closed
Tracked by #1152
freelerobot opened this issue Aug 15, 2024 · 3 comments
Assignees
Labels
engine: tensorrt-llm wontfix This will not be worked on

Comments

@freelerobot
Contributor

freelerobot commented Aug 15, 2024

Describe the bug

  1. install cortex
  2. start server
  3. cortex run llama3:tensorrt-llm --chat
  4. NOTE: tensorrt-LLM branch doesn't exist in llama3 hf repo
  5. model successfully downloads, but the binary is empty; there is still a model.yaml
  6. When running, I get this error:
(base) PS C:\Windows\System32> cortex run llama3:tensorrt-llm --chat
√ Dependencies loaded in 862ms
√ API server is online
√ Model found
Downloading engine...
 ████████████████████████████████████████ 100% | ETA: 0s | 100/100
× 500 status code (no body)
Last errors:
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
- Loading model...
20240815 15:29:47.151000 UTC 10740 INFO  CPU instruction set: fpu = 1| mmx = 1| sse = 1| sse2 = 1| sse3 = 1| ssse3 = 1| sse4_1 = 1| sse4_2 = 1| pclmulqdq = 1| avx = 1| avx2 = 1| avx512_f = 1| avx512_dq = 1| avx512_ifma = 1| avx512_pf = 0| avx512_er = 0| avx512_cd = 1| avx512_bw = 1| has_avx512_vl = 1| has_avx512_vbmi = 1| has_avx512_vbmi2 = 1| avx512_vnni = 1| avx512_bitalg = 1| avx512_vpopcntdq = 1| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 1| f16c = 1| - server.cc:288
20240815 15:29:47.151000 UTC 10740 ERROR Could not load engine: Could not load library "C:\Users\n\cortex/engines/cortex.llamacpp/engine.dll"
The specified module could not be found.

 - server.cc:299
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
...
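The "Could not load library … engine.dll" error above means the engine binary was never installed at the expected path. A minimal pre-flight check could catch this before the `/start` request is issued. This is a sketch only: the helper names are hypothetical (not part of Cortex), and the `engines/<engine>/engine.dll` layout is assumed from the log line above.

```python
from pathlib import Path

def engine_library_path(cortex_home: str, engine: str = "cortex.llamacpp") -> Path:
    # Layout assumed from the error log: <cortex_home>/engines/<engine>/engine.dll
    # (Windows naming; other platforms would use .so/.dylib).
    return Path(cortex_home) / "engines" / engine / "engine.dll"

def check_engine_installed(cortex_home: str, engine: str = "cortex.llamacpp") -> bool:
    # A missing engine.dll is exactly what produces the
    # "The specified module could not be found." failure above.
    return engine_library_path(cortex_home, engine).is_file()
```

With a check like this, the CLI could report "engine not installed" up front instead of a bare 500 from the model-start endpoint.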

It turns out it somehow downloaded an empty model instead of just failing.

Ah, I see the issue: tensorrt-llm is an invalid tag (so cortex.so/models is badly wrong),
and cortex run llama3:tensorrt-llm downloaded a default empty model.
There's no HF repo branch called tensorrt-llm.
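The silent fallback is the core bug: a tag that matches no branch should fail loudly instead of downloading a default. A sketch of that validation, under stated assumptions: `resolve_branch` is a hypothetical helper, and in real use `available_branches` would come from the Hugging Face refs API (it is passed in here so the check works offline).

```python
def resolve_branch(requested_tag: str, available_branches: list[str]) -> str:
    """Return the branch to download, or raise instead of silently
    falling back to a default (which is how the empty model was created)."""
    if requested_tag not in available_branches:
        raise ValueError(
            f"branch '{requested_tag}' does not exist; "
            f"available: {sorted(available_branches)}"
        )
    return requested_tag
```

Here `resolve_branch("tensorrt-llm", ["main", "gguf"])` would raise immediately, surfacing the bad tag to the user rather than producing an empty download.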

(base) PS C:\Users\n\cortex\models> cat .\llama3-tensorrt-llm.yaml
files:
  - C:\Users\n\cortex\models\llama3-tensorrt-llm\.gitattributes
model: llama3:tensorrt-llm
name: llama3:tensorrt-llm
stop: []
stream: true
max_tokens: 4096
frequency_penalty: 0.7
presence_penalty: 0.7
temperature: 0.7
top_p: 0.7
ctx_len: 4096
ngl: 100
engine: cortex.llamacpp
id: llama3:tensorrt-llm
created: 1723735451386
object: model
owned_by: ''
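Note the `files` list above contains only a `.gitattributes` file and no weights at all. A second guard could reject such a model.yaml at generation time. This is a hypothetical sanity check, not Cortex code; the set of weight extensions is an assumption.

```python
from pathlib import PureWindowsPath

# Assumed set of extensions that count as actual model weights.
WEIGHT_SUFFIXES = {".gguf", ".bin", ".safetensors", ".engine"}

def has_model_weights(files: list[str]) -> bool:
    # PureWindowsPath handles the backslash paths from the YAML above;
    # a bare .gitattributes entry has no weight suffix and is rejected.
    return any(PureWindowsPath(f).suffix.lower() in WEIGHT_SUFFIXES for f in files)
```

Applied to the `files` list in this model.yaml, the check returns False, so the invalid file would never be written.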

Specs:

  • Windows, RTX 4070, latest CUDA/NVIDIA drivers
  • cortex v0.5.0 - 44
@freelerobot freelerobot added the type: bug Something isn't working label Aug 15, 2024
@freelerobot freelerobot changed the title bug: Running a trtllm model fails on "cortex.llamacpp engine not found" bug: run llama3:tensorrt-llm leads to "cortex.llamacpp engine not found" Aug 15, 2024
@louis-menlo
Contributor

There is an engine-init issue where it looks for an incorrect binary, but fixing that does not fully cover the case above. We still need to check why it links to .gitattributes and generates an invalid YAML file.

@imtuyethan imtuyethan moved this to Planning in Menlo Sep 2, 2024
@imtuyethan imtuyethan moved this from Planning to Scheduled in Menlo Sep 2, 2024
@dan-menlo
Contributor

@vansangpfiev I am reassigning this to the Cortex team - if this issue does not exist for the C++ implementation, you can proceed to close this ticket

@gabrielle-ong
Contributor

Deprecated due to TensorRT-LLM not supporting Desktop
Parent issue: #1742

@github-project-automation github-project-automation bot moved this from Scheduled to Review + QA in Menlo Nov 28, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Nov 28, 2024
@gabrielle-ong gabrielle-ong added wontfix This will not be worked on engine: tensorrt-llm and removed type: bug Something isn't working engine: tensorrt-llm category: engine management Related to engine abstraction labels Nov 28, 2024
@gabrielle-ong gabrielle-ong moved this from Completed to Discontinued in Menlo Nov 28, 2024
5 participants