
Error when quantizing Qwen2.5-14B-Instruct with SmoothQuant #2319

Open
liu21yd opened this issue Oct 11, 2024 · 3 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

Comments


liu21yd commented Oct 11, 2024

TensorRT-LLM version: v0.13.0
GPU: A100

Convert script:

python3 convert_checkpoint.py \
                        --model_dir /model/Qwen2.5-14B-Instruct \
                        --output_dir /model/trt_engines/Qwen2.5-14B-Instruct \
                        --dtype float16 \
                        --smoothquant 0.5 \
                        --per_channel \
                        --per_token \
                        --tp_size 2 \
                        --pp_size 1 \
                        --calib_dataset /app/datasets/cnn_dailymail/train

ERROR:
[screenshot of the error traceback attached to the issue]

I changed lines 300 and 301 in tensorrt_llm/models/qwen/convert.py to:

        # Split the K and V weights along the last dimension into tp_size shards
        k_split = torch.split(k, k.shape[-1] // tp_size, dim=-1)
        v_split = torch.split(v, v.shape[-1] // tp_size, dim=-1)
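
For context, this change shards the K and V projection weights along their last dimension, one equal slice per tensor-parallel rank. A minimal, self-contained sketch of that splitting logic (the shapes below are illustrative placeholders, not necessarily the real Qwen2.5-14B dimensions):

import torch

# Placeholder shapes for illustration only.
hidden_size = 5120
num_kv_heads = 8
head_dim = 128
tp_size = 2

# K/V projection weights laid out as [hidden_size, num_kv_heads * head_dim]
k = torch.randn(hidden_size, num_kv_heads * head_dim)
v = torch.randn(hidden_size, num_kv_heads * head_dim)

# Shard along the last (output) dimension so each tensor-parallel rank
# receives an equal slice, mirroring the fix above.
k_split = torch.split(k, k.shape[-1] // tp_size, dim=-1)
v_split = torch.split(v, v.shape[-1] // tp_size, dim=-1)

for rank, (k_rank, v_rank) in enumerate(zip(k_split, v_split)):
    print(f"rank {rank}: k {tuple(k_rank.shape)}, v {tuple(v_rank.shape)}")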

With that change I can convert the checkpoint successfully, but I get a new error when I try to build the engine with trtllm-build.
[screenshot of the trtllm-build error attached to the issue]
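
The exact build command is not shown in the thread; a typical trtllm-build invocation for this converted checkpoint would look roughly like the following (the output path and plugin setting here are assumptions):

trtllm-build \
    --checkpoint_dir /model/trt_engines/Qwen2.5-14B-Instruct \
    --output_dir /model/trt_engines/Qwen2.5-14B-Instruct/engine \
    --gemm_plugin float16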

Who can help me?

Superjomn added the bug and triaged labels on Oct 16, 2024
jershi425 (Collaborator) commented

Hi @liu21yd, thank you for your feedback. This is indeed a bug. We will fix this in the next release. Before that, you can try this hot fix: #2370.

a2382625920 commented

Were you able to run inference for the Qwen2.5-14B-Instruct model in TensorRT-LLM with the expected acceleration?


Wonder-donbury commented Dec 2, 2024

> Were you able to run inference for the Qwen2.5-14B-Instruct model in TensorRT-LLM with the expected acceleration?

In my case, throughput went from 16 tokens/s with llama.cpp to about 65 tokens/s with TensorRT-LLM, roughly a 4x speedup.
