fixes clip interpolate #30783
Conversation
Thanks for working on this!
A few general comments:
- `interpolate_pos_encoding` should be a boolean
- It should be possible to call the model with this flag, i.e. `model(**inputs, interpolate_pos_encoding=True)`
- All the docstrings should be updated to include this argument
- Print statements should be removed
- Tests should make sure that the processed image being passed to the model is not the default size
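A minimal sketch of what such a test could look like (a hedged illustration, not code from this PR; the checkpoint, image path, and 480px values are assumptions, and `interpolate_pos_encoding` is the flag proposed in this review):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Override both `size` and `crop_size` so the pixel values are not the default 224x224.
processor = CLIPProcessor.from_pretrained(
    "openai/clip-vit-base-patch32",
    size={"shortest_edge": 480},
    crop_size={"height": 480, "width": 480},
)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt")

# The flag is passed directly to the forward call, as the review suggests.
with torch.no_grad():
    outputs = model(**inputs, interpolate_pos_encoding=True)

# With 480x480 inputs and 32x32 patches, the vision tower sees 15 * 15 + 1 = 226 positions.
assert outputs.vision_model_output.last_hidden_state.shape[1] == 226
```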
@@ -1099,6 +1136,7 @@ def forward(
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        interpolate_pos_encoding: Optional[bool] = False,
The value should be True or False, but not None
Suggested change:
- interpolate_pos_encoding: Optional[bool] = False,
+ interpolate_pos_encoding: bool = False,
image_processor = BridgeTowerProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Same here
image_processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Same here
# to visualize self-attention on higher resolution images.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(torch_device)

image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
Three comments:
- This is returning the processor, not the image_processor
- `size` should be a dictionary here, e.g. `size={"shortest_edge": 480}`
- This won't test the interpolation, because the image processor crops after resizing. `crop_size` also has to be overridden (see the sketch below)
Suggested change:
- image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
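To illustrate the crop point, a hedged sketch (the 480px values are assumptions): with the default `crop_size` left in place, the center crop undoes the resize and the model still receives 224x224 inputs, so the interpolation path is never exercised.

```python
from PIL import Image
from transformers import CLIPImageProcessor

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")

# Only `size` overridden: the default 224x224 center crop still applies.
resize_only = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", size={"shortest_edge": 480}
)
print(resize_only(images=image, return_tensors="pt").pixel_values.shape)  # (1, 3, 224, 224)

# `crop_size` overridden as well: the model now actually sees 480x480 pixel values.
resize_and_crop = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32",
    size={"shortest_edge": 480},
    crop_size={"height": 480, "width": 480},
)
print(resize_and_crop(images=image, return_tensors="pt").pixel_values.shape)  # (1, 3, 480, 480)
```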
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224", padding_side="left")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Same here - parameters affecting output size have to be updated
# to visualize self-attention on higher resolution images.
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32").to(torch_device)

image_processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32", size=480)
Same here:
- returns a processor
- needs to override `crop_size`
- `size` and `crop_size` should be dicts
@nileshkokane01 Any update on this PR?
Will do this weekend; I'll let you know if I can't get to it this week or next.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@nileshkokane01 Any update on this?
@amyeroberts I can continue working on it.
Hi @amyeroberts. As I am not the owner of the nileshkokane01:interpolate_clip repo, I've made the changes in my forked repo: see PR #31900.
Hi @amyeroberts, have you been able to have a look? I can ping someone else if needed. Thanks!
I have included the interpolation of positional embeddings in all the following models (and their respective tests) in #32600:
Waiting for review. Thanks!
@nileshkokane01 Shall we close this with the opening of #32600?
@amyeroberts sure!
What does this PR do?
Solves: #30579

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.