Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDEFICS: allow interpolation of vision's pos embeddings #26029

Merged

Conversation

leot13
Copy link
Contributor

@leot13 leot13 commented Sep 7, 2023

What does this PR do?

Allows vision position embeddings to be interpolated. Thus allowing bigger images to be passed to the model.
Fixes issue #26154

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @amyeroberts

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 7, 2023

The documentation is not available anymore as the PR was closed or merged.

@leot13 leot13 marked this pull request as ready for review September 7, 2023 13:34
Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!

Few small comments - mostly structure and styling suggestions to be in line with other models

src/transformers/models/idefics/configuration_idefics.py Outdated Show resolved Hide resolved
@@ -169,6 +169,33 @@ def prepare_config_and_inputs(self):

return (config, input_ids, input_mask, pixel_values, image_attention_mask)

def prepare_config_and_inputs_for_image_pos_embeddings_interpolation(self):
self.seq_length = 42
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original seq_length should be reverted to after running the test: each test should be independent and not rely on changing state of other tests.

Out of interest, why change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reused the prepare_config_and_inputs() method to make the test for interpolation, but I am not sure why the seq_len should be changed. @stas00 probably has a better idea as to why it is here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be move the setting of it to:

so it's the same everywhere unless it's not

Copy link
Contributor Author

@leot13 leot13 Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yes, this is using this parameter now.


input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)

num_images = 2 if self.add_multiple_images else 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - this means another test can modify the state and a different test is run. There should be two new tests added: one which tests with on image and one which tests with two

Copy link
Contributor Author

@leot13 leot13 Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the tests according to your comments. Mainly, I took off the self.add_multiple_images / self.num_images, took off the change of self.seq_len, and made 3 different tests instead of 2 previously.

mode="bicubic",
align_corners=False,
)
assert int(h0) == patch_pos_embed.shape[-2] and int(w0) == patch_pos_embed.shape[-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have asserts added in the modeling files - this should be removed or made into an exception.

(if it was my own code - I'd have it as an assert, but this is the standard in transformers)

src/transformers/models/idefics/vision.py Outdated Show resolved Hide resolved
@@ -380,6 +425,7 @@ class IdeficsVisionTransformer(nn.Module):
def __init__(self, config: IdeficsVisionConfig):
super().__init__()
self.config = config
self.interpolate_pos_encoding = config.interpolate_pos_encoding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To align with other models, this should be a kwarg in the forward method, rather than set by the config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that makes more sense

@leot13 leot13 changed the title [WIP] IDEFICS: allow interpolation of vision's pos embeddings IDEFICS: allow interpolation of vision's pos embeddings Sep 8, 2023
Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this!

Just some small tidy-ups: removing param name in config, removing print statements, correct setting for a test.

tests/models/idefics/test_modeling_idefics.py Outdated Show resolved Hide resolved
tests/models/idefics/test_modeling_idefics.py Outdated Show resolved Hide resolved
tests/models/idefics/test_modeling_idefics.py Outdated Show resolved Hide resolved
src/transformers/models/idefics/configuration_idefics.py Outdated Show resolved Hide resolved
@leot13
Copy link
Contributor Author

leot13 commented Sep 11, 2023

Thanks for all the comments and suggestions! They should all be answered now

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding!

@leot13 leot13 requested review from ArthurZucker and removed request for ArthurZucker September 14, 2023 15:56
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me left a few nits

src/transformers/models/idefics/vision.py Outdated Show resolved Hide resolved
src/transformers/models/idefics/vision.py Outdated Show resolved Hide resolved
self.model_tester.create_and_check_model(*config_and_inputs)

def test_model_with_image_pos_embeddings_interpolation(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interpolate is never tested with 2 images is this expected to not work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the opposite. I added the multiple image tests for interpolation and generation

@ArthurZucker ArthurZucker merged commit 869733a into huggingface:main Sep 14, 2023
3 checks passed
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
…26029)

* add pos embed interpolation for vision encoder

* style

* update config with interpolate_pos_encoding arg

* fix imports formatting

* take off copied from on vision embeddings

* add test for image embeddings interpolation

* add credit for interpolation code

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics/vision.py

Co-authored-by: amyeroberts <[email protected]>

* fix condition to check nbr image patches match shape of pos embeddings

* use kwargs in the forward methods for interpolation

* fix tests

* have interpolate_pos_encoding default to False instead of None

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* take off for loop meant to print k,v

* add interpolate_pos_encoding arg in prepare_inputs_for_generation

* add test for interpolated generation

* fix edge case num_patches == num_positions and height == width

* add test for edge case

* fix pos_embed in interpolate

* allow interpolation in bf16 with upcasting

* Update src/transformers/models/idefics/vision.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/idefics/vision.py

Co-authored-by: Arthur <[email protected]>

* add multiple images tests for interpolation and generation

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Arthur <[email protected]>
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
…26029)

* add pos embed interpolation for vision encoder

* style

* update config with interpolate_pos_encoding arg

* fix imports formatting

* take off copied from on vision embeddings

* add test for image embeddings interpolation

* add credit for interpolation code

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics/vision.py

Co-authored-by: amyeroberts <[email protected]>

* fix condition to check nbr image patches match shape of pos embeddings

* use kwargs in the forward methods for interpolation

* fix tests

* have interpolate_pos_encoding default to False instead of None

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* take off for loop meant to print k,v

* add interpolate_pos_encoding arg in prepare_inputs_for_generation

* add test for interpolated generation

* fix edge case num_patches == num_positions and height == width

* add test for edge case

* fix pos_embed in interpolate

* allow interpolation in bf16 with upcasting

* Update src/transformers/models/idefics/vision.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/idefics/vision.py

Co-authored-by: Arthur <[email protected]>

* add multiple images tests for interpolation and generation

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Arthur <[email protected]>
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 18, 2023
…26029)

* add pos embed interpolation for vision encoder

* style

* update config with interpolate_pos_encoding arg

* fix imports formatting

* take off copied from on vision embeddings

* add test for image embeddings interpolation

* add credit for interpolation code

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics/vision.py

Co-authored-by: amyeroberts <[email protected]>

* fix condition to check nbr image patches match shape of pos embeddings

* use kwargs in the forward methods for interpolation

* fix tests

* have interpolate_pos_encoding default to False instead of None

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics/test_modeling_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: amyeroberts <[email protected]>

* take off for loop meant to print k,v

* add interpolate_pos_encoding arg in prepare_inputs_for_generation

* add test for interpolated generation

* fix edge case num_patches == num_positions and height == width

* add test for edge case

* fix pos_embed in interpolate

* allow interpolation in bf16 with upcasting

* Update src/transformers/models/idefics/vision.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/idefics/vision.py

Co-authored-by: Arthur <[email protected]>

* add multiple images tests for interpolation and generation

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Arthur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants