Cloning into 'Qwen2-0.5B-Instruct'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 33 (delta 12), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (33/33), 3.60 MiB | 6.54 MiB/s, done.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024073000
0.12.0.dev2024073000
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for ccdv/cnn_dailymail contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ccdv/cnn_dailymail
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
calibrating model: 0%| | 0/512 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
calibrating model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:20<00:00, 24.44it/s]
Weights loaded. Total time: 00:00:18
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 309, in <module>
main()
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 301, in main
convert_and_save_hf(args)
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 228, in convert_and_save_hf
QWenForCausalLM.quantize(args.model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 380, in quantize
convert.quantize(hf_model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1207, in quantize
safetensors.torch.save_file(
File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 284, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 480, in _flatten
raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.vocab_embedding.weight', 'lm_head.weight'}].
A potential way to correctly save your model is to use `save_model`.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
System Info
GPU Type: A6000
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
git clone https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
python3 ./convert_checkpoint.py --model_dir ./Qwen2-0.5B-Instruct --output_dir ./tllm_checkpoint_1gpu_sq --dtype float16 --smoothquant 0.5
Expected behavior
Successfully convert and save model checkpoints
Actual behavior
The conversion fails with the RuntimeError shown in the traceback above: safetensors refuses to save the shared tensors transformer.vocab_embedding.weight and lm_head.weight.
Additional notes
transformers version: 4.42.4
TensorRT-LLM version: 0.12.0.dev2024073000