
ImportError: cannot import name '_expand_mask' from 'transformers.models.clip.modeling_clip' #184

Open
qiuchen001 opened this issue Jul 20, 2024 · 4 comments


qiuchen001 commented Jul 20, 2024

Scenario:
CLI inference

Command:
CUDA_VISIBLE_DEVICES=0 python3 -m videollava.serve.cli --model-path "/root/Video-LLaVA-7B" --file "/root/videos/8132-207209040_small.mp4" --load-4bit

Error output:
[2024-07-21 04:02:21,967] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/root/.conda/envs/video-llava/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/root/.conda/envs/video-llava/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/root/Video-LLaVA/videollava/__init__.py", line 1, in <module>
    from .model import LlavaLlamaForCausalLM
  File "/root/Video-LLaVA/videollava/model/__init__.py", line 1, in <module>
    from .language_model.llava_llama import LlavaLlamaForCausalLM, LlavaConfig
  File "/root/Video-LLaVA/videollava/model/language_model/llava_llama.py", line 26, in <module>
    from ..llava_arch import LlavaMetaModel, LlavaMetaForCausalLM
  File "/root/Video-LLaVA/videollava/model/llava_arch.py", line 21, in <module>
    from .multimodal_encoder.builder import build_image_tower, build_video_tower
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/builder.py", line 3, in <module>
    from .languagebind import LanguageBindImageTower, LanguageBindVideoTower
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/__init__.py", line 6, in <module>
    from .image.modeling_image import LanguageBindImage
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/image/modeling_image.py", line 11, in <module>
    from transformers.models.clip.modeling_clip import CLIPMLP, CLIPAttention, CLIPTextEmbeddings, CLIPVisionEmbeddings,
ImportError: cannot import name '_expand_mask' from 'transformers.models.clip.modeling_clip' (/root/.conda/envs/video-llava/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py)

I have already installed the required packages:

git clone https://github.com/PKU-YuanGroup/Video-LLaVA
cd Video-LLaVA
conda create -n videollava python=3.10 -y
conda activate videollava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install decord opencv-python git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d

and then I also ran:
pip install -U transformers

@Stevetich

I have encountered the same problem. It seems to be caused by the installed transformers version.

@sunlight146

Did you solve the error? I encountered the same error while debugging the Video-LLaVA code.

@Wuyingwen

Wuyingwen commented Aug 10, 2024

You can copy the following code into the corresponding file of the transformers library (transformers/models/clip/modeling_clip.py, the file named in the ImportError) to solve the problem:

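from typing import Optional
import torch

def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    # Expand a [bsz, src_len] attention mask to [bsz, 1, tgt_len, src_len].
    bsz, src_len = mask.size()
    tgt_len = tgt_len if tgt_len is not None else src_len
    expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    # Invert so that masked positions (mask == 0) get the most negative value of dtype.
    inverted_mask = 1.0 - expanded_mask
    return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)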

@lucienresearch

Do not run pip install -U transformers. Install the pinned version instead: pip install transformers==4.31.0
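
To see which version is actually installed in the active environment, a quick check along these lines may help. It is a minimal sketch using only the standard library; check_pin is a hypothetical helper name, and 4.31.0 is simply the version recommended in this comment.

```python
from importlib import metadata

PINNED_VERSION = "4.31.0"  # version suggested in this thread


def check_pin(package, pinned):
    """Report whether the installed version of `package` equals `pinned`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    if installed == pinned:
        return f"{package}=={installed} matches the pin"
    return f"{package}=={installed} differs from the pinned {pinned}"


if __name__ == "__main__":
    print(check_pin("transformers", PINNED_VERSION))
```

Run it inside the same conda environment that you use to launch videollava.serve.cli, since each environment has its own installed packages.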
