-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IDEFICS: allow interpolation of vision's pos embeddings #26029
IDEFICS: allow interpolation of vision's pos embeddings #26029
Conversation
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this!
Few small comments - mostly structure and styling suggestions to be in line with other models
@@ -169,6 +169,33 @@ def prepare_config_and_inputs(self): | |||
|
|||
return (config, input_ids, input_mask, pixel_values, image_attention_mask) | |||
|
|||
def prepare_config_and_inputs_for_image_pos_embeddings_interpolation(self): | |||
self.seq_length = 42 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original seq_length
should be reverted to after running the test: each test should be independent and not rely on changing state of other tests.
Out of interest, why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reused the prepare_config_and_inputs() method to make the test for interpolation, but I am not sure why the seq_len should be changed. @stas00 probably has a better idea as to why it is here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be move the setting of it to:
seq_length=7, |
so it's the same everywhere unless it's not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yes, this is using this parameter now.
|
||
input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) | ||
|
||
num_images = 2 if self.add_multiple_images else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - this means another test can modify the state and a different test is run. There should be two new tests added: one which tests with on image and one which tests with two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the tests according to your comments. Mainly, I took off the self.add_multiple_images / self.num_images, took off the change of self.seq_len, and made 3 different tests instead of 2 previously.
mode="bicubic", | ||
align_corners=False, | ||
) | ||
assert int(h0) == patch_pos_embed.shape[-2] and int(w0) == patch_pos_embed.shape[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have asserts added in the modeling files - this should be removed or made into an exception.
(if it was my own code - I'd have it as an assert, but this is the standard in transformers)
@@ -380,6 +425,7 @@ class IdeficsVisionTransformer(nn.Module): | |||
def __init__(self, config: IdeficsVisionConfig): | |||
super().__init__() | |||
self.config = config | |||
self.interpolate_pos_encoding = config.interpolate_pos_encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To align with other models, this should be a kwarg in the forward method, rather than set by the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed that makes more sense
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this!
Just some small tidy-ups: removing param name in config, removing print statements, correct setting for a test.
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Thanks for all the comments and suggestions! They should all be answered now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me left a few nits
self.model_tester.create_and_check_model(*config_and_inputs) | ||
|
||
def test_model_with_image_pos_embeddings_interpolation(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate is never tested with 2 images is this expected to not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the opposite. I added the multiple image tests for interpolation and generation
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Arthur <[email protected]>
…26029) * add pos embed interpolation for vision encoder * style * update config with interpolate_pos_encoding arg * fix imports formatting * take off copied from on vision embeddings * add test for image embeddings interpolation * add credit for interpolation code * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics/vision.py Co-authored-by: amyeroberts <[email protected]> * fix condition to check nbr image patches match shape of pos embeddings * use kwargs in the forward methods for interpolation * fix tests * have interpolate_pos_encoding default to False instead of None * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <[email protected]> * take off for loop meant to print k,v * add interpolate_pos_encoding arg in prepare_inputs_for_generation * add test for interpolated generation * fix edge case num_patches == num_positions and height == width * add test for edge case * fix pos_embed in interpolate * allow interpolation in bf16 with upcasting * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <[email protected]> * add multiple images tests for interpolation and generation --------- Co-authored-by: amyeroberts <[email protected]> Co-authored-by: Arthur <[email protected]>
…26029) * add pos embed interpolation for vision encoder * style * update config with interpolate_pos_encoding arg * fix imports formatting * take off copied from on vision embeddings * add test for image embeddings interpolation * add credit for interpolation code * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics/vision.py Co-authored-by: amyeroberts <[email protected]> * fix condition to check nbr image patches match shape of pos embeddings * use kwargs in the forward methods for interpolation * fix tests * have interpolate_pos_encoding default to False instead of None * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <[email protected]> * take off for loop meant to print k,v * add interpolate_pos_encoding arg in prepare_inputs_for_generation * add test for interpolated generation * fix edge case num_patches == num_positions and height == width * add test for edge case * fix pos_embed in interpolate * allow interpolation in bf16 with upcasting * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <[email protected]> * add multiple images tests for interpolation and generation --------- Co-authored-by: amyeroberts <[email protected]> Co-authored-by: Arthur <[email protected]>
…26029) * add pos embed interpolation for vision encoder * style * update config with interpolate_pos_encoding arg * fix imports formatting * take off copied from on vision embeddings * add test for image embeddings interpolation * add credit for interpolation code * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics/vision.py Co-authored-by: amyeroberts <[email protected]> * fix condition to check nbr image patches match shape of pos embeddings * use kwargs in the forward methods for interpolation * fix tests * have interpolate_pos_encoding default to False instead of None * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <[email protected]> * take off for loop meant to print k,v * add interpolate_pos_encoding arg in prepare_inputs_for_generation * add test for interpolated generation * fix edge case num_patches == num_positions and height == width * add test for edge case * fix pos_embed in interpolate * allow interpolation in bf16 with upcasting * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <[email protected]> * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <[email protected]> * add multiple images tests for interpolation and generation --------- Co-authored-by: amyeroberts <[email protected]> Co-authored-by: Arthur <[email protected]>
What does this PR do?
Allows vision position embeddings to be interpolated. Thus allowing bigger images to be passed to the model.
Fixes issue #26154
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@ArthurZucker @amyeroberts