
Qwen2-1.5B-Instruct convert_checkpoint.py failed #2388

Open

1994 opened this issue Oct 29, 2024 · 3 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers), stale

Comments

@1994 commented Oct 29, 2024

System Info

  • CPU: x86_64
  • GPU: A10 (24G)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I executed the convert script inside this Docker image:

nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3

Command:

python3 convert_checkpoint.py --model_dir <Qwen2-1.5B-Instruct_PATH> --output_dir <Qwen2-1.5B-Instruct_PATH>/tllm --dtype float16

Model file:

https://huggingface.co/Qwen/Qwen2-1.5B-Instruct

Exception stack:

[10/29/2024-17:08:19] [TRT-LLM] [W] Found pynvml==11.5.3 and cuda driver version 470.182.03. Please use pynvml>=11.5.0 and cuda driver>=526 to get accurate memory usage.
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
229it [00:02, 93.96it/s] 
Traceback (most recent call last):
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 303, in <module>
    main()
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 295, in main
    convert_and_save_hf(args)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 251, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 258, in execute
    f(args, rank)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 241, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 427, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 357, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 278, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 391, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0x7f778f992050>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 453, in __del__
    self.release()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 450, in release
    release_gc()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 471, in release_gc
    torch.cuda.ipc_collect()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 901, in ipc_collect
    _lazy_init()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 330, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
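The first traceback is the root cause; the second (`DeferredCudaCallError`) is only cleanup noise from `__del__`. A minimal sketch (not the actual TensorRT-LLM source) of the failure mode in `linear.py`'s `postprocess`: when the weights loader finds no tensor for a key (for example `lm_head.weight` in a checkpoint that ties word embeddings), it hands `None` to `postprocess`, and `None.to(...)` raises the `AttributeError` shown above. The guard below is hypothetical:

```python
# Sketch of the failing code path; `postprocess` and the guard are
# illustrative, not the real TensorRT-LLM implementation.

def postprocess(weights, dtype):
    # The failing line effectively does: weights.to(dtype).
    # Hypothetical defensive guard: return early for a missing weight
    # instead of crashing.
    if weights is None:
        return None
    return weights.to(dtype)

# Reproducing the crash without the guard:
try:
    None.to("float16")
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'to'
```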

Expected behavior

The conversion completes successfully.

Actual behavior

The conversion fails with the exception above.

Additional notes

TensorRT-LLM 0.13.0

@1994 1994 added the bug Something isn't working label Oct 29, 2024
@yatoooon commented

Same issue here.

@hello-11 hello-11 added the triaged Issue has been triaged by maintainers label Oct 30, 2024
@nv-guomingz (Collaborator) commented

Hi @1994, you may try applying the hot fix below for the Qwen2 1.5B model with the latest code base.

[Attached image: screenshot of the proposed hot fix]
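The patch itself is only visible in the attached screenshot, so the following is an assumption about its shape, not the confirmed change: Qwen2-1.5B-Instruct sets `tie_word_embeddings: true` in its `config.json`, meaning the checkpoint ships no separate `lm_head.weight`, and a fix would fall back to the shared embedding weight. The `load_lm_head` helper and key names below are illustrative:

```python
# Hypothetical helper sketching a tied-embedding fallback; not the actual
# TensorRT-LLM hot fix.

def load_lm_head(checkpoint, tie_word_embeddings):
    """Return the lm_head weight, reusing the embedding weight when the
    model ties word embeddings (as Qwen2-0.5B/1.5B reportedly do)."""
    weight = checkpoint.get("lm_head.weight")
    if weight is None and tie_word_embeddings:
        weight = checkpoint.get("model.embed_tokens.weight")
    return weight

# A tied-embedding checkpoint: only the embedding tensor is present.
ckpt = {"model.embed_tokens.weight": object()}
assert load_lm_head(ckpt, tie_word_embeddings=True) is ckpt["model.embed_tokens.weight"]
```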

@nv-guomingz (Collaborator) commented

@1994, do you still have any further issues or questions? If not, we'll close this soon.


4 participants