IDEFICS: allow interpolation of vision's pos embeddings #26029

leot13 · 2023-09-07T10:27:43Z

What does this PR do?

Allows the vision position embeddings to be interpolated, which lets bigger images be passed to the model.
Fixes issue #26154
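To make the motivation concrete, here is a small sketch of the shape arithmetic involved. The image and patch sizes are illustrative assumptions rather than values taken from this PR, and `num_positions` is a hypothetical helper. It assumes a ViT-style encoder with a class token, which learns one position embedding per patch plus one. A bigger image produces more patches than the embedding table has rows, so the patch embeddings have to be resized.

```python
def num_positions(image_size: int, patch_size: int) -> int:
    """Number of position embeddings for a square image: one per patch plus a class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size in this sketch")
    grid = image_size // patch_size
    return grid * grid + 1


# Illustrative values only (assumed, not read from the IDEFICS config).
pretrained = num_positions(image_size=224, patch_size=14)  # 16 * 16 + 1
larger = num_positions(image_size=448, patch_size=14)      # 32 * 32 + 1

print(pretrained, larger)  # 257 1025
```

With these assumed sizes, the larger image needs a 32 x 32 grid of patch position embeddings where only a 16 x 16 grid was learned.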

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@ArthurZucker @amyeroberts

HuggingFaceDocBuilderDev · 2023-09-07T10:46:38Z

The documentation is not available anymore as the PR was closed or merged.

amyeroberts

Thanks for adding this!

A few small comments, mostly structure and styling suggestions to be in line with other models.

src/transformers/models/idefics/configuration_idefics.py

amyeroberts · 2023-09-07T16:21:18Z

tests/models/idefics/test_modeling_idefics.py

@@ -169,6 +169,33 @@ def prepare_config_and_inputs(self):

        return (config, input_ids, input_mask, pixel_values, image_attention_mask)

+    def prepare_config_and_inputs_for_image_pos_embeddings_interpolation(self):
+        self.seq_length = 42


The original seq_length should be restored after running the test: each test should be independent and not rely on state changed by other tests.

Out of interest, why change this?

I reused the prepare_config_and_inputs() method to make the test for interpolation, but I am not sure why the seq_len should be changed. @stas00 probably has a better idea as to why it is here

Maybe move the setting of it to:

transformers/tests/models/idefics/test_modeling_idefics.py

Line 54 in 857b45c

seq_length=7,

so it's the same everywhere unless it's not

Thanks, yes, this is using this parameter now.
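The point about test independence in this thread can be sketched as follows. `ModelTester` and `InputShapeTest` are hypothetical stand-ins, not the classes in `test_modeling_idefics.py`. The idea is that `seq_length` is set once in the constructor and the input-preparation method only reads it.

```python
import unittest


class ModelTester:
    """Hypothetical stand-in for a model tester that builds inputs for tests."""

    def __init__(self, batch_size=2, seq_length=7):
        self.batch_size = batch_size
        self.seq_length = seq_length

    def prepare_inputs(self):
        # Builds inputs from the tester's own attributes without modifying them.
        return [[0] * self.seq_length for _ in range(self.batch_size)]


class InputShapeTest(unittest.TestCase):
    def setUp(self):
        # A fresh tester per test, so no test sees state left behind by another.
        self.model_tester = ModelTester()

    def test_default_inputs(self):
        inputs = self.model_tester.prepare_inputs()
        self.assertEqual(len(inputs[0]), 7)

    def test_inputs_do_not_change_state(self):
        self.model_tester.prepare_inputs()
        self.assertEqual(self.model_tester.seq_length, 7)
```

Because nothing is mutated while preparing inputs, the tests give the same result in any execution order.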

amyeroberts · 2023-09-07T16:29:23Z

tests/models/idefics/test_modeling_idefics.py

+
+        input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)
+
+        num_images = 2 if self.add_multiple_images else 1


Same here - this means another test can modify the state and a different test is run. There should be two new tests added: one which tests with one image and one which tests with two.

I fixed the tests according to your comments. Mainly, I took off the self.add_multiple_images / self.num_images, took off the change of self.seq_len, and made 3 different tests instead of 2 previously.
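A minimal sketch of the restructuring described above, under stated assumptions: `image_input_shape` is a hypothetical helper, and the `(batch, num_images, channels, height, width)` layout and default sizes are illustrative. The number of images is passed as an explicit argument instead of being read from a flag on the tester, and each case gets its own test.

```python
def image_input_shape(batch_size, num_images, num_channels=3, image_size=30):
    """Shape of a pixel_values-like input; num_images is an argument, not shared state."""
    return (batch_size, num_images, num_channels, image_size, image_size)


def test_single_image():
    assert image_input_shape(batch_size=2, num_images=1) == (2, 1, 3, 30, 30)


def test_multiple_images():
    assert image_input_shape(batch_size=2, num_images=2) == (2, 2, 3, 30, 30)


test_single_image()
test_multiple_images()
```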

amyeroberts · 2023-09-07T16:37:00Z

src/transformers/models/idefics/vision.py

+            mode="bicubic",
+            align_corners=False,
+        )
+        assert int(h0) == patch_pos_embed.shape[-2] and int(w0) == patch_pos_embed.shape[-1]


We don't have asserts added in the modeling files - this should be removed or made into an exception.

(if it was my own code - I'd have it as an assert, but this is the standard in transformers)
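One way the assert could be turned into an exception is sketched below. `check_interpolated_grid` is a hypothetical helper and the error message wording is an assumption, not necessarily what was merged. The shape is passed in as a plain tuple so the sketch runs without a tensor library.

```python
def check_interpolated_grid(h0, w0, patch_pos_embed_shape):
    """Raise instead of asserting when the interpolated grid has an unexpected size."""
    height, width = patch_pos_embed_shape[-2], patch_pos_embed_shape[-1]
    if int(h0) != height or int(w0) != width:
        raise ValueError(
            f"Expected an interpolated grid of size ({int(h0)}, {int(w0)}), "
            f"but got ({height}, {width})"
        )


# Matching sizes pass silently; a mismatch raises ValueError.
check_interpolated_grid(4, 4, (1, 8, 4, 4))
```

Unlike an `assert`, this check still runs when Python is started with `-O`, and callers can catch a specific exception type.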

src/transformers/models/idefics/vision.py

amyeroberts · 2023-09-07T18:15:08Z

src/transformers/models/idefics/vision.py

@@ -380,6 +425,7 @@ class IdeficsVisionTransformer(nn.Module):
    def __init__(self, config: IdeficsVisionConfig):
        super().__init__()
        self.config = config
+        self.interpolate_pos_encoding = config.interpolate_pos_encoding


To align with other models, this should be a kwarg in the forward method, rather than set by the config

Indeed that makes more sense
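The difference between the two designs can be illustrated with a toy class. `TinyVisionEmbeddings` is a hypothetical stand-in, not the IDEFICS module: it only reports which path would be taken, and the class-token offset and error message are assumptions. The flag is an argument of `forward`, so the same model instance can handle both cases on a per-call basis.

```python
class TinyVisionEmbeddings:
    """Hypothetical stand-in: reports which position-embedding path would be used."""

    def __init__(self, num_positions):
        self.num_positions = num_positions

    def forward(self, num_patches, interpolate_pos_encoding=False):
        # +1 accounts for the class token in this sketch.
        if num_patches + 1 == self.num_positions:
            return "use stored position embeddings"
        if interpolate_pos_encoding:
            return "interpolate position embeddings"
        raise ValueError(
            f"Got {num_patches + 1} positions but the model has {self.num_positions}; "
            "pass interpolate_pos_encoding=True to allow other image sizes."
        )


embeddings = TinyVisionEmbeddings(num_positions=257)
print(embeddings.forward(num_patches=256))
print(embeddings.forward(num_patches=1024, interpolate_pos_encoding=True))
```

Keeping the default at `False` means existing callers see no change in behavior unless they opt in.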

Co-authored-by: amyeroberts <[email protected]>

amyeroberts

Thanks for iterating on this!

Just some small tidy-ups: removing param name in config, removing print statements, correct setting for a test.

tests/models/idefics/test_modeling_idefics.py

src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

leot13 · 2023-09-11T13:39:15Z

Thanks for all the comments and suggestions! They should all be answered now

amyeroberts

Thanks for adding!

ArthurZucker

Looks good to me, left a few nits.

src/transformers/models/idefics/vision.py

ArthurZucker · 2023-09-14T19:32:36Z

tests/models/idefics/test_modeling_idefics.py

        self.model_tester.create_and_check_model(*config_and_inputs)

+    def test_model_with_image_pos_embeddings_interpolation(self):


Interpolation is never tested with 2 images. Is this expected to not work?

No, the opposite. I added the multiple image tests for interpolation and generation

Co-authored-by: Arthur <[email protected]>

…26029)

* add pos embed interpolation for vision encoder
* style
* update config with interpolate_pos_encoding arg
* fix imports formatting
* take off copied from on vision embeddings
* add test for image embeddings interpolation
* add credit for interpolation code
* Update src/transformers/models/idefics/configuration_idefics.py
  Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/idefics/vision.py
  Co-authored-by: amyeroberts <[email protected]>
* fix condition to check nbr image patches match shape of pos embeddings
* use kwargs in the forward methods for interpolation
* fix tests
* have interpolate_pos_encoding default to False instead of None
* Update tests/models/idefics/test_modeling_idefics.py
  Co-authored-by: amyeroberts <[email protected]>
* Update tests/models/idefics/test_modeling_idefics.py
  Co-authored-by: amyeroberts <[email protected]>
* Update tests/models/idefics/test_modeling_idefics.py
  Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/idefics/configuration_idefics.py
  Co-authored-by: amyeroberts <[email protected]>
* take off for loop meant to print k,v
* add interpolate_pos_encoding arg in prepare_inputs_for_generation
* add test for interpolated generation
* fix edge case num_patches == num_positions and height == width
* add test for edge case
* fix pos_embed in interpolate
* allow interpolation in bf16 with upcasting
* Update src/transformers/models/idefics/vision.py
  Co-authored-by: Arthur <[email protected]>
* Update src/transformers/models/idefics/vision.py
  Co-authored-by: Arthur <[email protected]>
* add multiple images tests for interpolation and generation

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Arthur <[email protected]>

leot13 added 2 commits September 7, 2023 12:05

add pos embed interpolation for vision encoder

c16631d

style

242ca96

leot13 added 5 commits September 7, 2023 12:48

update config with interpolate_pos_encoding arg

a89b7bf

fix imports formatting

ca94372

take off copied from on vision embeddings

bb67a9f

add test for image embeddings interpolation

498a881

add credit for interpolation code

857b45c

leot13 marked this pull request as ready for review September 7, 2023 13:34

leot13 requested review from ArthurZucker and amyeroberts September 7, 2023 13:34

amyeroberts reviewed Sep 7, 2023

leot13 and others added 5 commits September 8, 2023 09:27

Update src/transformers/models/idefics/configuration_idefics.py

8601455

Co-authored-by: amyeroberts <[email protected]>

Update src/transformers/models/idefics/vision.py

52be895

Co-authored-by: amyeroberts <[email protected]>

fix condition to check nbr image patches match shape of pos embeddings

21d0bfa

use kwargs in the forward methods for interpolation

2840e4a

fix tests

4c9391f

leot13 changed the title ~~[WIP] IDEFICS: allow interpolation of vision's pos embeddings~~ IDEFICS: allow interpolation of vision's pos embeddings Sep 8, 2023

have interpolate_pos_encoding default to False instead of None

689e809

amyeroberts reviewed Sep 8, 2023

leot13 and others added 5 commits September 11, 2023 15:13

Update tests/models/idefics/test_modeling_idefics.py

1a2d2e5

Co-authored-by: amyeroberts <[email protected]>

Update tests/models/idefics/test_modeling_idefics.py

3400654

Co-authored-by: amyeroberts <[email protected]>

Update tests/models/idefics/test_modeling_idefics.py

07b0bff

Co-authored-by: amyeroberts <[email protected]>

Update src/transformers/models/idefics/configuration_idefics.py

79d4c43

Co-authored-by: amyeroberts <[email protected]>

take off for loop meant to print k,v

bd8a67d

leot13 added 2 commits September 12, 2023 17:41

add interpolate_pos_encoding arg in prepare_inputs_for_generation

2f1d449

add test for interpolated generation

ee2ce91

amyeroberts approved these changes Sep 12, 2023

fix edge case num_patches == num_positions and height == width

365c2a2

leot13 added 2 commits September 13, 2023 00:04

add test for edge case

e44649a

fix pos_embed in interpolate

9decae1

leot13 mentioned this pull request Sep 14, 2023

IdeficsProcessor: Changing image_size will result in RuntimeError #26154

Closed

4 tasks

allow interpolation in bf16 with upcasting

776ce7a

leot13 requested review from ArthurZucker and removed request for ArthurZucker September 14, 2023 15:56

ArthurZucker approved these changes Sep 14, 2023

leot13 and others added 3 commits September 15, 2023 00:28

Update src/transformers/models/idefics/vision.py

ee98321

Co-authored-by: Arthur <[email protected]>

Update src/transformers/models/idefics/vision.py

7483d5a

Co-authored-by: Arthur <[email protected]>

add multiple images tests for interpolation and generation

82fda57

ArthurZucker merged commit 869733a into huggingface:main Sep 14, 2023
3 checks passed

amyeroberts left a comment

amyeroberts Sep 7, 2023

leot13 Sep 8, 2023

stas00 Sep 8, 2023

leot13 Sep 11, 2023

amyeroberts Sep 7, 2023

leot13 Sep 8, 2023

amyeroberts Sep 7, 2023

amyeroberts Sep 7, 2023

leot13 Sep 8, 2023

amyeroberts left a comment

leot13 commented Sep 11, 2023

amyeroberts left a comment

ArthurZucker left a comment

ArthurZucker Sep 14, 2023

leot13 Sep 14, 2023

