convert qwen2-0.5b-instruct failed when using smoothquant #2087

Closed

ReginaZh opened this issue Aug 5, 2024 · 2 comments
Assignees
Labels
bug (Something isn't working), waiting for feedback

Comments

ReginaZh (Contributor) commented Aug 5, 2024

System Info

GPU Type: A6000

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

git clone https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
python3 ./convert_checkpoint.py --model_dir ./Qwen2-0.5B-Instruct --output_dir ./tllm_checkpoint_1gpu_sq --dtype float16 --smoothquant 0.5

Expected behavior

Successfully convert and save model checkpoints

actual behavior

Cloning into 'Qwen2-0.5B-Instruct'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 33 (delta 12), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (33/33), 3.60 MiB | 6.54 MiB/s, done.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024073000
0.12.0.dev2024073000
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for ccdv/cnn_dailymail contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ccdv/cnn_dailymail
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
calibrating model:   0%|                                                                                                                              | 0/512 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
calibrating model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:20<00:00, 24.44it/s]
Weights loaded. Total time: 00:00:18
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 309, in <module>
    main()
  File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 301, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 228, in convert_and_save_hf
    QWenForCausalLM.quantize(args.model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 380, in quantize
    convert.quantize(hf_model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1207, in quantize
    safetensors.torch.save_file(
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 284, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 480, in _flatten
    raise RuntimeError(
RuntimeError:
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.vocab_embedding.weight', 'lm_head.weight'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
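For context, the error comes from safetensors' shared-storage check: in Qwen2-0.5B-Instruct the lm_head weight is tied to the input embedding, so the checkpoint dict passed to `safetensors.torch.save_file` contains two keys backed by the same storage. Below is a minimal sketch of the failure and one possible local workaround (cloning the tied tensor before saving); the shapes are illustrative and this is not the actual convert.py code:

```python
import torch
import safetensors.torch

# Two dict entries backed by the same storage, as happens with tied embeddings.
emb = torch.randn(151936, 896)  # vocab_size x hidden_size (illustrative values)
weights = {
    "transformer.vocab_embedding.weight": emb,
    "lm_head.weight": emb,  # tied weight -> shared memory with the embedding
}

try:
    safetensors.torch.save_file(weights, "rank0.safetensors")
except RuntimeError as err:
    print(err)  # "Some tensors share memory ..."

# Possible local workaround: break the aliasing before saving.
weights["lm_head.weight"] = weights["lm_head.weight"].clone()
safetensors.torch.save_file(weights, "rank0.safetensors")
```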

additional notes

transformers version: 4.42.4
TensorRT-LLM version: 0.12.0.dev2024073000
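
As a quick check of the tied-embedding configuration that appears to trigger this, one can inspect the HF config (assuming the model has already been cloned into ./Qwen2-0.5B-Instruct as in the reproduction steps):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("./Qwen2-0.5B-Instruct")
print(cfg.tie_word_embeddings)  # expected to print True for the 0.5B variant
```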

@ReginaZh ReginaZh added the bug Something isn't working label Aug 5, 2024

github-actions bot commented Sep 5, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the stale label Sep 5, 2024
@lfr-0531 lfr-0531 self-assigned this Sep 6, 2024
lfr-0531 (Collaborator) commented Sep 6, 2024

It has been fixed. Could you try again on the main branch?
