TensorRT-LLM version: v0.13.0
GPU: A100
Convert script:
```bash
python3 convert_checkpoint.py \
    --model_dir /model/Qwen2.5-14B-Instruct \
    --output_dir /model/trt_engines/Qwen2.5-14B-Instruct \
    --dtype float16 \
    --smoothquant 0.5 \
    --per_channel \
    --per_token \
    --tp_size 2 \
    --pp_size 1 \
    --calib_dataset /app/datasets/cnn_dailymail/train
```
ERROR:
I changed lines 300 and 301 in `tensorrt_llm/models/qwen/convert.py` to:
```python
k_split = torch.split(k, k.shape[-1] // tp_size, dim=-1)
v_split = torch.split(v, v.shape[-1] // tp_size, dim=-1)
```
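For illustration, here is a self-contained sketch of what that split does. The shapes are my assumptions for Qwen2.5-14B's GQA layout, not values taken from `convert.py`:

```python
import torch

# Illustrative sketch of the changed lines, not the actual convert.py code.
# Assumed shapes: Qwen2.5-14B uses GQA with 8 KV heads of head_dim 128,
# so the K/V projection output dim is 1024 while hidden_size is 5120.
tp_size = 2
hidden_size = 5120
kv_dim = 8 * 128  # num_kv_heads * head_dim (assumed)

k = torch.randn(hidden_size, kv_dim)
v = torch.randn(hidden_size, kv_dim)

# Split the K/V projection weights along the output (last) dimension so
# each tensor-parallel rank owns an equal slice of the KV heads.
k_split = torch.split(k, k.shape[-1] // tp_size, dim=-1)
v_split = torch.split(v, v.shape[-1] // tp_size, dim=-1)

for rank in range(tp_size):
    # With tp_size=2, each rank gets a [5120, 512] slice of K and V.
    print(rank, k_split[rank].shape, v_split[rank].shape)
```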
With this change I can convert the checkpoint successfully, but I get a new error when I try to build the engine with `trtllm-build`.
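For reference, the build step is along these lines; the exact flags below are illustrative placeholders rather than a verified reproduction of my command:

```bash
trtllm-build \
    --checkpoint_dir /model/trt_engines/Qwen2.5-14B-Instruct \
    --output_dir /model/trt_engines/Qwen2.5-14B-Instruct/engine \
    --gemm_plugin float16
```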
Who can help me?
Hi @liu21yd, thank you for your feedback. This is indeed a bug; we will fix it in the next release. Until then, you can try this hotfix: #2370.
Were you able to get the expected inference speedup for Qwen2.5-14B-Instruct with TensorRT-LLM?
In my case, throughput went from 16 tokens/s with llama.cpp to 65 tokens/s with TensorRT-LLM, roughly a 4x speedup.