Enable dynamic resolution input for Beit #31053
Hi @amyeroberts This PR is incomplete right now because I am unsure how to proceed. It seems that the BeitEncoder takes the original patch embeddings configuration as an input and hence the original window size
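To make the window-size concern concrete, here is a minimal sketch of how the window size follows from the image and patch sizes, and how the relative position bias table grows with it. It assumes square images and patches, and a table with (2*Wh - 1) * (2*Ww - 1) + 3 rows (one per offset plus three cls-related rows). The helper names are hypothetical and not part of the library.

```python
from typing import Tuple


def window_size(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Patch grid (height, width) for a square image; hypothetical helper."""
    side = image_size // patch_size
    return (side, side)


def num_relative_distance(window: Tuple[int, int]) -> int:
    """Rows in the relative position bias table for a given window.

    Assumes one row per (dy, dx) offset plus 3 extra rows for
    cls-to-token, token-to-cls, and cls-to-cls.
    """
    height, width = window
    return (2 * height - 1) * (2 * width - 1) + 3


pretrained = window_size(224, 16)
larger = window_size(480, 16)
print(pretrained, num_relative_distance(pretrained))  # (14, 14) 732
print(larger, num_relative_distance(larger))          # (30, 30) 3484
```

A table built for the 14x14 window therefore cannot be indexed directly for a 30x30 window, which is why the encoder's fixed window size matters here.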
Thanks for adding! Just a few small comments
with self.assertRaises(ValueError, msg="doesn't match model"):
    model(pixel_values, interpolate_pos_encoding=False)
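One detail about this test pattern: in `unittest`, the `msg` argument to `assertRaises` is only the failure message shown when no exception is raised. It is not matched against the exception text. If the intent is to check the wording of the error as well, `assertRaisesRegex` does that. The sketch below uses a hypothetical stand-in for the model call, and its error message is made up for illustration.

```python
import unittest


def fake_forward(image_size: int, expected_size: int = 224, interpolate_pos_encoding: bool = False) -> str:
    """Hypothetical stand-in for a model forward pass; not the real model."""
    if image_size != expected_size and not interpolate_pos_encoding:
        raise ValueError(f"Input image size ({image_size}) doesn't match model ({expected_size}).")
    return "ok"


class SizeMismatchTest(unittest.TestCase):
    def test_mismatch_raises(self):
        # assertRaisesRegex checks both the exception type and its message.
        with self.assertRaisesRegex(ValueError, "doesn't match model"):
            fake_forward(480, interpolate_pos_encoding=False)

    def test_interpolation_allows_larger_input(self):
        self.assertEqual(fake_forward(480, interpolate_pos_encoding=True), "ok")


# Run the tests programmatically so the snippet works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SizeMismatchTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```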
Just to make sure this still holds if anything happens upstream and to make things explicit, could you add the following above:
self.assertFalse(processor.do_center_crop)
@OmarManzoor Regarding the relative position bias, looking at the modeling code, I think whenever the output of this model is used, then it will also need to be interpolated if
Could you kindly clarify a bit where exactly this should be added? Do we need to add a new interpolation function that works for BeitSelfAttention?
@OmarManzoor You need to make sure that the relative position biases are interpolated wherever that is needed. This might be as an argument to the relative position class, or within the modules that use its output
@amyeroberts How do I calculate the interpolations for the relative position biases similar to how we calculated them for the embeddings?
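One possible way to approach this, sketched under stated assumptions rather than taken from the merged code: treat the offset rows of the bias table as a 2D grid of size (2*Wh - 1) x (2*Ww - 1) per attention head, resize that grid with `torch.nn.functional.interpolate`, and copy the three cls-related rows unchanged. The function name, the row-major (dy, dx) layout, and the choice of bilinear mode are assumptions made for illustration.

```python
from typing import Tuple

import torch
import torch.nn.functional as F


def resize_relative_position_bias_table(
    table: torch.Tensor,
    old_window: Tuple[int, int],
    new_window: Tuple[int, int],
) -> torch.Tensor:
    """Resize a table of shape (num_relative_distance, num_heads) to a new window.

    Assumes the first (2*Wh - 1) * (2*Ww - 1) rows form a row-major grid of
    (dy, dx) offsets and the remaining rows are cls-related entries that are
    copied unchanged. Hypothetical helper, not the library's API.
    """
    old_h, old_w = 2 * old_window[0] - 1, 2 * old_window[1] - 1
    new_h, new_w = 2 * new_window[0] - 1, 2 * new_window[1] - 1

    grid_rows = table[: old_h * old_w]
    extra_rows = table[old_h * old_w :]
    num_heads = table.shape[1]

    # (old_h * old_w, heads) -> (1, heads, old_h, old_w), the layout F.interpolate expects.
    grid = grid_rows.reshape(1, old_h, old_w, num_heads).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_h, new_w), mode="bilinear", align_corners=False)
    # Back to (new_h * new_w, heads).
    grid_rows = grid.permute(0, 2, 3, 1).reshape(new_h * new_w, num_heads)

    return torch.cat([grid_rows, extra_rows], dim=0)


old_window, new_window = (14, 14), (30, 30)
table = torch.randn((2 * 14 - 1) * (2 * 14 - 1) + 3, 12)  # 732 rows, 12 heads
resized = resize_relative_position_bias_table(table, old_window, new_window)
print(resized.shape)  # torch.Size([3484, 12])
```

Note that resizing the table is only half of the work: the index that maps each pair of tokens to a table row would also have to be rebuilt for the new window before the bias can be gathered.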
Looks great - thanks for adding this!
Only a small nit on the docstring for data2vec
@@ -670,6 +753,7 @@ def forward(
         head_mask: Optional[torch.Tensor] = None,
         output_attentions: Optional[bool] = None,
         output_hidden_states: Optional[bool] = None,
+        interpolate_pos_encoding: bool = False,
Should be added to DATA2VEC_VISION_INPUTS_DOCSTRING
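For illustration only, an entry for the new argument might look something like the following. The wording and the constant name are hypothetical; the real entry would sit inside `DATA2VEC_VISION_INPUTS_DOCSTRING` next to the other arguments and follow whatever phrasing the maintainers prefer.

```python
# Hypothetical wording for the new argument; not copied from the library.
INTERPOLATE_POS_ENCODING_DOC = r"""
        interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):
            Whether to interpolate the pre-trained position encodings.
"""

print(INTERPOLATE_POS_ENCODING_DOC)
```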
Thanks again for adding this!
* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
What does this PR do?
Towards #30579
Who can review?
CC: @amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.