Cloning into 'Qwen2-0.5B-Instruct'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 33 (delta 12), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (33/33), 3.60 MiB | 6.54 MiB/s, done.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024073000
0.12.0.dev2024073000
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for ccdv/cnn_dailymail contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ccdv/cnn_dailymail
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
calibrating model: 0%| | 0/512 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
calibrating model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:20<00:00, 24.44it/s]
Weights loaded. Total time: 00:00:18
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 309, in <module>
main()
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 301, in main
convert_and_save_hf(args)
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 228, in convert_and_save_hf
QWenForCausalLM.quantize(args.model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 380, in quantize
convert.quantize(hf_model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1207, in quantize
safetensors.torch.save_file(
File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 284, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 480, in _flatten
raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.vocab_embedding.weight', 'lm_head.weight'}].
A potential way to correctly save your model is to use `save_model`.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
System Info
GPU Type: A6000
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
git clone https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
python3 ./convert_checkpoint.py --model_dir ./Qwen2-0.5B-Instruct --output_dir ./tllm_checkpoint_1gpu_sq --dtype float16 --smoothquant 0.5
Expected behavior
Successfully convert and save model checkpoints
Actual behavior
The conversion fails with the RuntimeError shown in the traceback above: safetensors refuses to save the shared tensors transformer.vocab_embedding.weight and lm_head.weight.
Additional notes
transformers version: 4.42.4
TensorRT-LLM version: 0.12.0.dev2024073000