
Idefics2 Model Fails with Versions 4.45.0 and 4.45.1 (Shape Mismatch Error) #33752

Closed
movchan74 opened this issue Sep 27, 2024 · 5 comments

@movchan74

System Info

  • transformers version: 4.45.1
  • Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA RTX A6000

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

DEVICE = "cuda:0"

# Note that passing the image urls (instead of the actual pil images) to the processor is also possible
image1 = load_image(
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
)
image2 = load_image(
    "https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg"
)
image3 = load_image(
    "https://cdn.britannica.com/68/170868-050-8DDE8263/Golden-Gate-Bridge-San-Francisco.jpg"
)

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.bfloat16,
).to(DEVICE)

# Create inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What do we see in this image?"},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "In this image, we can see the city of New York, and more specifically the Statue of Liberty.",
            },
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "And how about this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}


# Generate
generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(generated_texts)

The code is taken from HuggingFace's Idefics2 model page.
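As a quick sanity check (a minimal sketch on my part; the "<image>" placeholder string and the attribute names below are assumptions about the Idefics2 processor, not something confirmed in this report), the processor output can be inspected to see how many image placeholder tokens actually end up in input_ids, which is the quantity the failing merge step indexes on (see the error below):

# Diagnostic sketch: count the image placeholder tokens the processor inserted.
# Assumes the Idefics2 tokenizer registers "<image>" as a special token.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
num_image_tokens = (inputs["input_ids"] == image_token_id).sum().item()
print("pixel_values shape:", inputs["pixel_values"].shape)
print("number of <image> tokens in input_ids:", num_image_tokens)  # 0 here would match the [0, 4096] indexing result below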

Error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 56
     52 inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
     55 # Generate
---> 56 generated_ids = model.generate(**inputs, max_new_tokens=500)
     57 generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
     59 print(generated_texts)

File ~/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py:2048, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2040     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2041         input_ids=input_ids,
   2042         expand_size=generation_config.num_return_sequences,
   2043         is_encoder_decoder=self.config.is_encoder_decoder,
   2044         **model_kwargs,
   2045     )
   2047     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2048     result = self._sample(
   2049         input_ids,
...
   1295 reshaped_image_hidden_states = image_hidden_states.view(-1, vision_hidden_size)
-> 1296 new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states
   1297 return new_inputs_embeds

RuntimeError: shape mismatch: value tensor of shape [640, 4096] cannot be broadcast to indexing result of shape [0, 4096]
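The error itself is ordinary PyTorch behaviour when a boolean mask selects zero positions but a non-empty value tensor is assigned through it. A standalone sketch (shapes chosen only to mirror the traceback, not taken from the model code) reproduces the same message:

import torch

new_inputs_embeds = torch.zeros(8, 4096)
special_image_token_mask = torch.zeros(8, dtype=torch.bool)  # no image-token positions matched
reshaped_image_hidden_states = torch.randn(640, 4096)        # image features waiting to be merged in
# RuntimeError: shape mismatch: value tensor of shape [640, 4096] cannot be
# broadcast to indexing result of shape [0, 4096]
new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states

In other words, the model appears to find no special image token positions in the embedding sequence, even though image features were produced.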

Expected behavior

The Idefics2 model no longer works with the new transformers releases.

It works with transformers==4.44.2, but both 4.45.0 and 4.45.1 raise the shape mismatch error above.

The model should generate output as it does with transformers==4.44.2 rather than failing on the latest releases.
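Until the fix lands in a patch release, a possible workaround (my own suggestion, not an official recommendation from this thread) is to pin the last working release and fail fast if an affected version is installed:

# Workaround sketch: pin transformers==4.44.2, or guard against the affected releases at startup.
import transformers
from packaging import version

_AFFECTED = (version.parse("4.45.0"), version.parse("4.45.1"))
if version.parse(transformers.__version__) in _AFFECTED:
    raise RuntimeError(
        "Idefics2 image-token merging fails on this transformers release; "
        "pin transformers==4.44.2 until the patch release is out."
    )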

@LysandreJik
Member

cc @andimarafioti and @amyeroberts

@amyeroberts
Collaborator

Hi @movchan74 - thanks for reporting!

I've opened #33776 which should fix this. @LysandreJik as this is a regression, should we include the fix in a patch once approved?

@LysandreJik
Member

Yes, we can indeed.

cc @ArthurZucker

@amyeroberts
Collaborator

For the patch: #33766 (commit baa765f) was merged in instead.

@ArthurZucker
Collaborator

Closing as this was fixed, will include it in the patch!
