[Feature] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat #1953

Closed · 1 of 4 tasks
Yuchen-Cao opened this issue Jul 16, 2024 · 2 comments
Labels: bug (Something isn't working), functionality issue


Yuchen-Cao commented Jul 16, 2024

System Info

GPU NVIDIA L20

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am trying to quantize CodeQwen1.5 7B Chat to FP8 using a modified version of the example quantization script:

python quantization/quantize.py --model_dir /mnt/models/CodeQwen1.5-7B-Chat \
                                --dtype float16 \
                                --qformat fp8 \
                                --kv_cache_dtype fp8 \
                                --output_dir /mnt/trt_models/codeqwen1.5_7b_checkpoint_1gpu_fp8_fp8kv \
                                --calib_size 512 \
                                --calib_dataset /mnt/dataset/cnn_dailymail

Expected behavior

The top-level quantize.py runs quantization via quantize_and_export(), which is defined in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/quantization/quantize_by_modelopt.py

get_tokenizer should automatically load the tokenizer from my model_dir and set the pad_token as well as the eos_token.

Actual behavior

However, get_tokenizer failed to set the pad_token:

[07/16/2024-13:46:30] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
[07/16/2024-13:46:30] [TRT-LLM] [I] Starting TensorRT-LLM init.
[TensorRT-LLM][INFO] Set logger level by INFO
[07/16/2024-13:46:30] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Initializing model from /mnt/models/CodeQwen1.5-7B-Chat
[07/16/2024-13:47:14] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:28<00:00,  7.20s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /mnt/models/CodeQwen1.5-7B-Chat
Traceback (most recent call last):
  File "quantization/quantize.py", line 90, in <module>
    quantize_and_export(
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 289, in quantize_and_export
    tokenizer = get_tokenizer(model_dir,
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 147, in get_tokenizer
    assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
AssertionError: Pad token for qwen cannot be set!
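
For reference, the failure can be reproduced with just the tokenizer step. Below is a minimal sketch (reusing the model path from the command above and mirroring the model_type == "qwen" branch of get_tokenizer); the assumption is that id 151643 does not exist in CodeQwen1.5's vocabulary, so convert_ids_to_tokens() yields None and the pad_token never gets set:

from transformers import AutoTokenizer

# Minimal sketch of only the failing tokenizer step, outside of quantize.py.
tokenizer = AutoTokenizer.from_pretrained(
    "/mnt/models/CodeQwen1.5-7B-Chat",
    model_max_length=2048,
    padding_side="left",
    trust_remote_code=True,
)
# get_tokenizer hard-codes Qwen's pad/eos id 151643; for CodeQwen1.5 this id
# presumably does not resolve, so the lookup returns None and the assert fails.
tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
assert tokenizer.pad_token is not None, "Pad token for qwen cannot be set!"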

Additional notes

I commented out everything except the AutoTokenizer.from_pretrained() call to get this case working:

def get_tokenizer(ckpt_path, max_seq_length=2048, model_type=None):
    print(f"Initializing tokenizer from {ckpt_path}")
    tokenizer = AutoTokenizer.from_pretrained(
        ckpt_path,
        model_max_length=max_seq_length,
        padding_side="left",
        trust_remote_code=True,
    )
    # if model_type and model_type == "qwen":
    #     # qwen use token id 151643 as pad and eos tokens
    #     tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
    #     tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)

    # # can't set attribute 'pad_token' for "<unk>"
    # if tokenizer.pad_token != "<unk>":  # nosec B105
    #     tokenizer.pad_token = tokenizer.eos_token
    # if tokenizer.pad_token is None:
    #     tokenizer.pad_token = tokenizer.eos_token
    # assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"

    return tokenizer

I know that commenting out these lines will certainly affect other models' conversions. It seems this function needs a fix to support CodeQwen1.5; one possible direction is sketched below.
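
Only a sketch on my side, not a proposed final patch: keep the Qwen special case, but apply the hard-coded id only when it actually exists in the loaded vocabulary, and otherwise fall back to the tokenizer's own eos_token:

from transformers import AutoTokenizer

def get_tokenizer(ckpt_path, max_seq_length=2048, model_type=None):
    print(f"Initializing tokenizer from {ckpt_path}")
    tokenizer = AutoTokenizer.from_pretrained(
        ckpt_path,
        model_max_length=max_seq_length,
        padding_side="left",
        trust_remote_code=True,
    )

    if model_type == "qwen" and tokenizer.pad_token is None:
        # Qwen tokenizers use id 151643 as pad/eos; only apply it when the id
        # is valid for the loaded vocabulary (it is not for CodeQwen1.5).
        if 151643 < len(tokenizer):
            pad_candidate = tokenizer.convert_ids_to_tokens(151643)
            if pad_candidate is not None:
                tokenizer.pad_token = pad_candidate
                tokenizer.eos_token = pad_candidate

    # Generic fallback: reuse eos_token as pad_token when none is defined.
    if tokenizer.pad_token is None and tokenizer.eos_token is not None:
        tokenizer.pad_token = tokenizer.eos_token

    assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
    return tokenizer

This would keep the existing behavior for Qwen-family models that really use id 151643, while letting CodeQwen1.5 fall back to its own eos_token.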

Yuchen-Cao added the bug label Jul 16, 2024
Yuchen-Cao changed the title from [Quantization] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat to [Bug] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat Jul 16, 2024
Yuchen-Cao changed the title from [Bug] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat to [Feature] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat Jul 16, 2024
QiJune (Collaborator) commented Jul 16, 2024

@Tracin Could you please have a look? Thanks

Tracin (Collaborator) commented Jul 17, 2024

@Yuchen-Cao Thanks! We have fixed this.
