
Idefics2 Model Fails with Versions 4.45.0 and 4.45.1 (Shape Mismatch Error) #33752

Closed
movchan74 opened this issue Sep 27, 2024 · 5 comments

@movchan74

System Info

  • transformers version: 4.45.1
  • Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA RTX A6000

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

DEVICE = "cuda:0"

# Note that passing the image urls (instead of the actual pil images) to the processor is also possible
image1 = load_image(
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
)
image2 = load_image(
    "https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg"
)
image3 = load_image(
    "https://cdn.britannica.com/68/170868-050-8DDE8263/Golden-Gate-Bridge-San-Francisco.jpg"
)

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.bfloat16,
).to(DEVICE)

# Create inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What do we see in this image?"},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "In this image, we can see the city of New York, and more specifically the Statue of Liberty.",
            },
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "And how about this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}


# Generate
generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(generated_texts)

The code is taken from HuggingFace's Idefics2 model page.
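As a quick sanity check (a minimal sketch on my part; the "<image>" placeholder string and the attribute names below are assumptions about the Idefics2 processor, not something confirmed in this report), the processor output can be inspected to see how many image placeholder tokens actually end up in input_ids, which is the quantity the failing merge step indexes on (see the error below):

# Diagnostic sketch: count the image placeholder tokens the processor inserted.
# Assumes the Idefics2 tokenizer registers "<image>" as a special token.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
num_image_tokens = (inputs["input_ids"] == image_token_id).sum().item()
print("pixel_values shape:", inputs["pixel_values"].shape)
print("number of <image> tokens in input_ids:", num_image_tokens)  # 0 here would match the [0, 4096] indexing result below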

Error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 56
     52 inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
     55 # Generate
---> 56 generated_ids = model.generate(**inputs, max_new_tokens=500)
     57 generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
     59 print(generated_texts)

File ~/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py:2048, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2040     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2041         input_ids=input_ids,
   2042         expand_size=generation_config.num_return_sequences,
   2043         is_encoder_decoder=self.config.is_encoder_decoder,
   2044         **model_kwargs,
   2045     )
   2047     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2048     result = self._sample(
   2049         input_ids,
...
   1295 reshaped_image_hidden_states = image_hidden_states.view(-1, vision_hidden_size)
-> 1296 new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states
   1297 return new_inputs_embeds

RuntimeError: shape mismatch: value tensor of shape [640, 4096] cannot be broadcast to indexing result of shape [0, 4096]
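The error itself is ordinary PyTorch behaviour when a boolean mask selects zero positions but a non-empty value tensor is assigned through it. A standalone sketch (shapes chosen only to mirror the traceback, not taken from the model code) reproduces the same message:

import torch

new_inputs_embeds = torch.zeros(8, 4096)
special_image_token_mask = torch.zeros(8, dtype=torch.bool)  # no image-token positions matched
reshaped_image_hidden_states = torch.randn(640, 4096)        # image features waiting to be merged in
# RuntimeError: shape mismatch: value tensor of shape [640, 4096] cannot be
# broadcast to indexing result of shape [0, 4096]
new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states

In other words, the model appears to find no special image token positions in the embedding sequence, even though image features were produced.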

Expected behavior

The Idefics2 model no longer works with the new transformers releases.

It works with transformers==4.44.2, but both 4.45.0 and 4.45.1 raise the shape mismatch error above.

The model should generate output as it does with transformers==4.44.2 rather than failing on the latest releases.
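Until the fix lands in a patch release, a possible workaround (my own suggestion, not an official recommendation from this thread) is to pin the last working release and fail fast if an affected version is installed:

# Workaround sketch: pin transformers==4.44.2, or guard against the affected releases at startup.
import transformers
from packaging import version

_AFFECTED = (version.parse("4.45.0"), version.parse("4.45.1"))
if version.parse(transformers.__version__) in _AFFECTED:
    raise RuntimeError(
        "Idefics2 image-token merging fails on this transformers release; "
        "pin transformers==4.44.2 until the patch release is out."
    )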

@LysandreJik
Member

cc @andimarafioti and @amyeroberts

@amyeroberts
Collaborator

Hi @movchan74 - thanks for reporting!

I've opened #33776 which should fix this. @LysandreJik as this is a regression, should we include the fix in a patch once approved?

@LysandreJik
Member

Yes, we can indeed.

cc @ArthurZucker

@amyeroberts
Collaborator

For the patch: #33766 (commit baa765f) was merged in instead.

@ArthurZucker
Collaborator

Closing as this was fixed, will include it in the patch!
