[Bug]: Qwen2-VL AssertionError: assert "factor" in rope_scaling. #8281

Closed · 1 task done
zhangxi1997 opened this issue Sep 9, 2024 · 25 comments · Fixed by #7905
Labels: bug (Something isn't working)

Comments

@zhangxi1997 commented Sep 9, 2024

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.28.1
Libc version: glibc-2.35

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.134-008.7.kangaroo.al8.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
Nvidia driver version: 470.199.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.0.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 57 bits virtual
Byte Order:                      Little Endian
CPU(s):                          12
On-line CPU(s) list:             0-11
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) Processor @ 2.90GHz
CPU family:                      6
Model:                           106
Thread(s) per core:              1
Core(s) per socket:              12
Socket(s):                       1
Stepping:                        6
BogoMIPS:                        5800.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd avx512vbmi umip pku avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm md_clear arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       576 KiB (12 instances)
L1i cache:                       384 KiB (12 instances)
L2 cache:                        15 MiB (12 instances)
L3 cache:                        48 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Vulnerable
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.45.0.dev0
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.0.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.68                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
[conda] pyzmq                     26.2.0                   pypi_0    pypi
[conda] torch                     2.4.0                    pypi_0    pypi
[conda] torchvision               0.19.0                   pypi_0    pypi
[conda] transformers              4.45.0.dev0              pypi_0    pypi
[conda] transformers-stream-generator 0.0.5                    pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.0@32e7db25365415841ebc7c4215851743fbb1bad1
vLLM Build Flags:
CUDA Archs: 5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    mlx5_0  CPU Affinity    NUMA Affinity
GPU0     X      PHB     0-11            N/A
mlx5_0  PHB      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    ModelType, get_vllm_engine, get_default_template_type,
    get_template, inference_vllm
)

model_type = ModelType.qwen2_vl_2b_instruct
model_id_or_path = '/hub/qwen/Qwen2-VL-2B-Instruct'
llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)

llm_engine.generation_config.max_new_tokens = 256

images = ['1.jpg']
request_list = [{'query': 'Describe this screenshot.', 'images': images}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")

Running this script raises the following error:

# python infer_qwen2vl_vllm.py 
[INFO:swift] Successfully registered `/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
[INFO:swift] Loading the model using model_dir: /hub/qwen/Qwen2-VL-2B-Instruct
Traceback (most recent call last):
  File "/xxxx/xxxx/swift/infer_qwen2vl_vllm.py", line 12, in <module>
    llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/xxxx/swift/swift/llm/utils/vllm_utils.py", line 103, in get_vllm_engine
    llm_engine = llm_engine_cls.from_engine_args(engine_args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/miniconda3/envs/qwen2-vl/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 535, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/miniconda3/envs/qwen2-vl/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 792, in create_engine_config
    model_config = ModelConfig(
                   ^^^^^^^^^^^^
  File "/xxxx/miniconda3/envs/qwen2-vl/lib/python3.11/site-packages/vllm/config.py", line 222, in __init__
    self.max_model_len = _get_and_verify_max_len(
                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/miniconda3/envs/qwen2-vl/lib/python3.11/site-packages/vllm/config.py", line 1738, in _get_and_verify_max_len
    assert "factor" in rope_scaling
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
zhangxi1997 added the bug label on Sep 9, 2024
@DarkLight1337 (Member) commented Sep 9, 2024

Qwen2-VL is not supported yet. Please note that Qwen-VL in the list of supported models refers to version 1.

@zhangxi1997 (Author)

> Qwen2-VL is not supported yet. Please note that Qwen-VL in the list of supported models refers to version 1.

Thanks a lot, and looking forward to the support!

@zhangfan-algo

Is Qwen2-VL supported now?

@DarkLight1337 (Member)

Please check the status of PR #7905

@Joker-sad

The official docs now say that qwen2-vl is supported, so why do I still get the same error?

@DarkLight1337 (Member)

It is caused by a bug in transformers v4.45. For now, you'll have to either downgrade to a lower version of vLLM that doesn't use transformers v4.45, or build vLLM from source to use our patched Qwen2VL config which doesn't have this problem.
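For reference, here is a rough sketch of those two options. The exact vLLM version to pin is not stated in this thread, so treat the pin below as a placeholder and verify it against your transformers requirement:

# Option 1: pin an older vLLM release that does not require transformers v4.45
# (version shown is only an example; check which release matches your setup)
pip install "vllm==0.5.5"

# Option 2: build vLLM from source to pick up the patched Qwen2VL config
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .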

@Joker-sad

I did the following. I created a new environment:
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
git clone https://github.com/vllm-project/vllm.git
cd vllm
python python_only_dev.py
python -m vllm.entrypoints.openai.api_server --model /home/py/ycc/Qwen/qwen/Qwen2-VL-72B-Instruct-gptq-int4 --port 9991 --gpu-memory-utilization 0.9
The first run worked, but the process died because of insufficient GPU memory.
After switching to the 7B model it would no longer start, and I got:
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal
RuntimeError: Engine process failed to start
Repeating these steps did not help.

@DarkLight1337 (Member)

> I did the following. I created a new environment: [...]

Can you run collect_env.py and post the output?

@DarkLight1337 (Member)

You should install vLLM using pip install -e . after cloning the repository. The wheel install is not necessary if you're already cloning the repo.

@youkaichao (Member)

> You should install vLLM using pip install -e . after cloning the repository. The wheel install is not necessary if you're already cloning the repo.

Correction: that is the Python-only dev workflow, and it is correct. See https://docs.vllm.ai/en/latest/getting_started/installation.html#python-only-build-without-compilation

@RickyL-2000

For those who don't want to upgrade vLLM and still hit this problem, the following code works for me:

from vllm import LLM  # import added for completeness

model = LLM(
    model=model_path,  # path to the local Qwen2-VL checkpoint
    rope_scaling={
        "mrope_section": [16, 24, 24],
        "rope_type": "mrope",
        "type": "mrope",
    },
)

I'm using Qwen2-VL 7B.

@lxiaoxiaoxing

> For those who don't want to upgrade vLLM and still hit this problem, the following code works for me: [...]

How can I pass these parameters when starting the server with python -m vllm.entrypoints.openai.api_server?

@DarkLight1337 (Member)

You can pass --rope-scaling as a JSON string.
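For example (a sketch only; the model path is a placeholder, and the rope_scaling values are the ones suggested earlier in this thread):

python -m vllm.entrypoints.openai.api_server \
    --model /path/to/Qwen2-VL-7B-Instruct \
    --rope-scaling '{"mrope_section": [16, 24, 24], "rope_type": "mrope", "type": "mrope"}'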

@lxiaoxiaoxing

> You can pass --rope-scaling as a JSON string.

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 495, in <module>
    asyncio.run(run_server(args))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 462, in run_server
    async with build_async_engine_client(args) as async_engine_client:
  File "/usr/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 108, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 130, in build_async_engine_client_from_engine_args
    if (model_is_embedding(engine_args.model, engine_args.trust_remote_code,
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 71, in model_is_embedding
    return ModelConfig(model=model_name,
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 222, in __init__
    self.max_model_len = _get_and_verify_max_len(
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 1738, in _get_and_verify_max_len
    assert "factor" in rope_scaling
AssertionError
The qwen2-vl model is still reporting this error. My versions:
torch 2.4.0+cu118
transformers 4.45.0
vllm 0.6.0+cu118
xformers 0.0.27.post2+cu118

@DarkLight1337 (Member)

Please show the command you used.

@lxiaoxiaoxing

> Please show the command you used.

python -m vllm.entrypoints.openai.api_server --served-model-name vlmodel --model /model --dtype=half --gpu-memory-utilization 0.9 --max_model_len 1 --rope-scaling '{"mrope_section": [16,24, 24], "rope_type": "mrope", "type": "mrope"}'

@DarkLight1337 (Member)

Can you manually update the config.json of your model?

@DarkLight1337 (Member)

Otherwise it would be best to just upgrade vLLM.
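A minimal example of that, assuming a standard pip-managed environment:

pip install --upgrade vllm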

@hm123450

> Can you manually update the config.json of your model?

Hello, I ran into the same problem, and for various reasons I have to use 0.6.1 (or is there another version that supports CUDA 11.8?). What do you mean by updating config.json? Where should I change it?

@DarkLight1337 (Member)

I mean that you go to the location where the model is downloaded on your machine and edit the config.json file directly.

@hm123450

> I mean that you go to the location where the model is downloaded on your machine and edit the config.json file directly.

Yeah, I edited it. You can see I added "rope_type": "mrope", but it still doesn't work. What am I missing?

{
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_token_id": 151654,
  "image_token_id": 151655,
  "video_token_id": 151656,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vision_config": {
    "depth": 32,
    "embed_dim": 1280,
    "mlp_ratio": 4,
    "num_heads": 16,
    "in_chans": 3,
    "hidden_size": 3584,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "temporal_patch_size": 2
  },
  "rope_scaling": {
    "type": "mrope",
    "rope_type": "mrope",
    "mrope_section": [
      16,
      24,
      24
    ]
  },
  "vocab_size": 152064
}

@DarkLight1337 (Member)

I think you just have to update vLLM then. I have no time to debug this in an older version.

@hm123450

> I think you just have to update vLLM then. I have no time to debug this in an older version.

May I ask how to install a newer vLLM with CUDA 11.8? I tried installing vllm-0.6.4 with pip install -e . and it automatically installs torch 2.5 built against CUDA 12. (By the way, python=3.10.)

@DarkLight1337 (Member) commented Dec 15, 2024

You can try installing torch manually: https://pytorch.org/get-started/previous-versions/

and follow this section to keep your installed version.

@youkaichao might have more context on this.
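As a rough sketch of that suggestion (the versions below are examples for CUDA 11.8; verify them against the PyTorch previous-versions page):

# Example only: install a cu118 build of torch first
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu118
# then follow the vLLM docs section mentioned above so the source build
# reuses this torch instead of pulling in a CUDA 12 build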

@hm123450

> You can try installing torch manually: https://pytorch.org/get-started/previous-versions/ and follow this section to keep your installed version.

OK, I will try it out. Thank you very much! You are so kind and friendly!
