Hi, is there a bug in Video-LLaVA-main/videollava/model/multimodal_encoder/builder.py? #89
Comments
What is your "image_tower"? The assertion enforces the encoder's output dimension to be 1024; 768 appears to be the hidden dimension of a base-sized image encoder.
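For a quick check of the encoder width, something like this works (the checkpoint names here are just examples of large vs. base CLIP towers, not necessarily the ones you are using):

```python
# Compare hidden sizes of a large vs. base CLIP vision encoder.
# Checkpoint names are examples only.
from transformers import CLIPVisionConfig

print(CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14").hidden_size)  # 1024 (ViT-L)
print(CLIPVisionConfig.from_pretrained("openai/clip-vit-base-patch32").hidden_size)   # 768  (ViT-B)
```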
I have the same problem on my local machine, but it works in https://colab.research.google.com/.
Same issue.
Hi everyone, what is your "image_tower"? Is there minimal runtime code that would help me reproduce the error?
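For example, something along these lines would be enough for me to try (the local path and the extra config attributes are placeholders inferred from the builder code quoted later in this thread, so adjust them to your setup):

```python
# Minimal reproduction sketch; path and config attributes are placeholders.
from types import SimpleNamespace
from videollava.model.multimodal_encoder.builder import build_image_tower

cfg = SimpleNamespace(
    mm_image_tower="./cache_dir/LanguageBind_Image",  # local checkpoint path (placeholder)
    mm_vision_select_layer=-2,
    mm_vision_select_feature="patch",
)
tower = build_image_tower(cfg)
print(type(tower).__name__)
```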
From my config file: "intermediate_size": 11008,
If you want to run the model locally, maybe you can refer to this issue.
I solved it! I changed the code like this:

```python
# build_image_tower in videollava/model/multimodal_encoder/builder.py
# (CLIPVisionTower, LanguageBindImageTower and MAEVisionTower come from the
# existing imports at the top of builder.py)
import os

def build_image_tower(image_tower_cfg, **kwargs):
    image_tower = getattr(image_tower_cfg, 'mm_image_tower', getattr(image_tower_cfg, 'image_tower', None))
    is_absolute_path_exists = os.path.exists(image_tower)
    # Original branch, now disabled: a local path would always be routed to CLIPVisionTower.
    # if is_absolute_path_exists or image_tower.startswith("openai") or image_tower.startswith("laion"):
    #     return CLIPVisionTower(image_tower, args=image_tower_cfg, **kwargs)
    if image_tower.startswith("openai") or image_tower.startswith("laion"):
        return CLIPVisionTower(image_tower, args=image_tower_cfg, **kwargs)
    if image_tower.endswith('LanguageBind_Image'):
        return LanguageBindImageTower(image_tower, args=image_tower_cfg, cache_dir='./cache_dir', **kwargs)
    if 'mae' in image_tower:
        print('maemaemaemaemaemaemaemae')  # debug marker
        return MAEVisionTower(image_tower, args=image_tower_cfg, cache_dir='./cache_dir', **kwargs)
    raise ValueError(f'Unknown image tower: {image_tower}')
```

In fact, if you are running locally, you should make sure the second "if" is the one that gets taken.
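To see why the original first branch misroutes a local checkpoint, here is a rough illustration (the path is a placeholder for wherever you downloaded LanguageBind_Image):

```python
# Illustration of the routing difference for a locally downloaded checkpoint.
import os

image_tower = "./cache_dir/LanguageBind_Image"  # placeholder local path

# Original check: a local path exists, so the CLIP branch was taken even for
# a LanguageBind checkpoint.
print(os.path.exists(image_tower))                  # True for a local download
print(image_tower.startswith(("openai", "laion")))  # False
print(image_tower.endswith("LanguageBind_Image"))   # True -> LanguageBindImageTower
```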
Great! Congrats.
I want to fine-tune based on native LLaMA and LanguageBind.
In principle, if the model is downloaded locally, the first "if" is taken (because is_absolute_path_exists is True), but this leads to a misalignment error.
If I manually switch to the second branch instead, it complains that the image tower's and video tower's hidden dims are different.
But my configuration files are all pulled from Hugging Face, so there should be no configuration errors. What causes such a strange phenomenon?
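A rough sanity check would be to compare what the two tower configs actually report; the repo ids below and the exact location of "hidden_size" inside config.json are guesses, so adjust them to your local downloads:

```python
# Compare the hidden sizes reported by the image and video tower configs.
# Repo ids and the nesting of "hidden_size" in config.json are assumptions.
import json
from huggingface_hub import hf_hub_download

def hidden_sizes(repo_id: str) -> list:
    cfg_path = hf_hub_download(repo_id, "config.json")
    with open(cfg_path) as f:
        cfg = json.load(f)
    found = []
    def walk(node, prefix=""):
        # Collect every "hidden_size" entry, wherever it is nested.
        if isinstance(node, dict):
            for k, v in node.items():
                if k == "hidden_size":
                    found.append((prefix + k, v))
                walk(v, prefix + k + ".")
    walk(cfg)
    return found

print(hidden_sizes("LanguageBind/LanguageBind_Image"))        # assumed repo id
print(hidden_sizes("LanguageBind/LanguageBind_Video_merge"))  # assumed repo id
```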