System Info

GPU: NVIDIA L20

Who can help?

No response

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction

I am trying to quantize CodeQwen1.5 7B Chat to FP8 using a modified version of the example quantization script (quantization/quantize.py), roughly as in the command sketch below.
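The exact invocation is not included in the report; based on the model path and the float16 warning in the log below, it was presumably of this shape (flag names as in the TensorRT-LLM quantization example; the output directory is a hypothetical placeholder):

python quantization/quantize.py \
    --model_dir /mnt/models/CodeQwen1.5-7B-Chat \
    --dtype float16 \
    --qformat fp8 \
    --output_dir ./codeqwen1.5-7b-chat-fp8  # hypothetical output path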
Expected behavior

The outer quantize.py calls quantize_and_export(), which is defined in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/quantization/quantize_by_modelopt.py. Its get_tokenizer helper should automatically read the tokenizer from my model_dir and set the pad_token as well as the eos_token.
actual behavior
But it failed to set the pad_token:
[07/16/2024-13:46:30] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
[07/16/2024-13:46:30] [TRT-LLM] [I] Starting TensorRT-LLM init.
[TensorRT-LLM][INFO] Set logger level by INFO
[07/16/2024-13:46:30] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Initializing model from /mnt/models/CodeQwen1.5-7B-Chat
[07/16/2024-13:47:14] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:28<00:00, 7.20s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /mnt/models/CodeQwen1.5-7B-Chat
Traceback (most recent call last):
  File "quantization/quantize.py", line 90, in <module>
    quantize_and_export(
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 289, in quantize_and_export
    tokenizer = get_tokenizer(model_dir,
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 147, in get_tokenizer
    assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
AssertionError: Pad token for qwen cannot be set!
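My assumption about the root cause (not verified in the report): CodeQwen1.5 ships a much smaller vocabulary than Qwen1.5, so the token id 151643 that get_tokenizer hard-codes for qwen models does not exist, convert_ids_to_tokens() yields None, and the final assert fires. A snippet like this can check that:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/mnt/models/CodeQwen1.5-7B-Chat", trust_remote_code=True
)
print(len(tok))                           # vocabulary size
print(tok.convert_ids_to_tokens(151643))  # None (or an error) if the id is out of range
print(tok.eos_token)                      # the token a fallback could reuse as pad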
additional notes
I commented out all the lines except the AutoTokenizer.from_pretrained() call to get this case working:
def get_tokenizer(ckpt_path, max_seq_length=2048, model_type=None):
    print(f"Initializing tokenizer from {ckpt_path}")
    tokenizer = AutoTokenizer.from_pretrained(
        ckpt_path,
        model_max_length=max_seq_length,
        padding_side="left",
        trust_remote_code=True,
    )
    # if model_type and model_type == "qwen":
    #     # qwen use token id 151643 as pad and eos tokens
    #     tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
    #     tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)
    # # can't set attribute 'pad_token' for "<unk>"
    # if tokenizer.pad_token != "<unk>":  # nosec B105
    #     tokenizer.pad_token = tokenizer.eos_token
    # if tokenizer.pad_token is None:
    #     tokenizer.pad_token = tokenizer.eos_token
    # assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
    return tokenizer
I know that commenting these lines out will certainly affect other models' conversion, so it seems get_tokenizer itself needs a fix to support CodeQwen1.5. One possible direction is sketched below.
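A minimal sketch of such a fix (my suggestion, not an official patch; it assumes the failure comes from convert_ids_to_tokens(151643) returning None because CodeQwen1.5's vocabulary is smaller than Qwen1.5's): inside get_tokenizer, only apply the hard-coded Qwen pad id when it actually resolves to a token, and otherwise fall back to the tokenizer's own eos_token:

    if model_type and model_type == "qwen":
        # Qwen/Qwen1.5 use token id 151643 as pad and eos tokens, but
        # CodeQwen1.5 has a smaller vocabulary, so the id may not resolve.
        qwen_pad = tokenizer.convert_ids_to_tokens(151643)  # None if missing
        if qwen_pad is not None:
            tokenizer.pad_token = qwen_pad
            tokenizer.eos_token = qwen_pad
    if tokenizer.pad_token is None:
        # Fall back to the model's own eos token (CodeQwen1.5 lands here).
        tokenizer.pad_token = tokenizer.eos_token
    assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
    return tokenizer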