Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes clip interpolate #30783

Closed

Conversation

nileshkokane01
Copy link
Contributor

@nileshkokane01 nileshkokane01 commented May 13, 2024

What does this PR do?

solves : #30579

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

A few general comments:

  • interpolate_pos_encoding should be a boolean
  • It should be possible to call the model with this flag i.e. model(**inputs, interpolate_pos_encoding)
  • All the docstrings should be updated to include this argument
  • Print statements should be removed
  • Tests should make sure that the process image being passed to the model is not the default size

tests/models/x_clip/test_modeling_x_clip.py Show resolved Hide resolved
tests/models/altclip/test_modeling_altclip.py Show resolved Hide resolved
@@ -1099,6 +1136,7 @@ def forward(
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
interpolate_pos_encoding: Optional[bool] = False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value should be True or False, but not None

Suggested change
interpolate_pos_encoding: Optional[bool] = False,
interpolate_pos_encoding: bool = False,

image_processor = BridgeTowerProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

image_processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ssme here

# to visualize self-attention on higher resolution images.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(torch_device)

image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three comments:

  • This is returning the processor, not the image_processor
  • size should be a dictionary e.g. size={"shortest_edge": 480} here
  • This won't test the interpolation, because the image processor crops after resizing. crop_size also has to be overriden
Suggested change
image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224", padding_side="left")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - parameters affecting output size have to be updated

# to visualize self-attention on higher resolution images.
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32").to(torch_device)

image_processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32", size=480)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here:

  • returns a processor
  • needs to override crop size
  • size and crop_size should be dicts

@amyeroberts
Copy link
Collaborator

@nileshkokane01 Any update on this PR?

@nileshkokane01
Copy link
Contributor Author

Will do this weekend, I'll let you know if I can't this week or the other.

Copy link

github-actions bot commented Jul 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@amyeroberts
Copy link
Collaborator

@nileshkokane01 Any update on this?

@manuelsh
Copy link
Contributor

manuelsh commented Jul 8, 2024

@amyeroberts I can continue working on it.

@manuelsh manuelsh mentioned this pull request Jul 10, 2024
@manuelsh
Copy link
Contributor

Hi @amyeroberts. As I am not the owner of nileshkokane01:interpolate_clip repo, I've done the changes in my forked repo: see PR: #31900

@manuelsh
Copy link
Contributor

Hi @amyeroberts, have you been able to have a look? I can ping someone else if needed. Thanks!

@manuelsh
Copy link
Contributor

I have included the interpolation of positional embeddings in all the following models, (and their respective tests) in #32600 :

  • altclip
  • bridgetower
  • chineseclip
  • clip
  • clipseg
  • kosmos_2
  • x_clip
  • git

Waiting for review.

Thanks!

@amyeroberts
Copy link
Collaborator

@nileshkokane01 Shall we close this with the opening of #32600?

@nileshkokane01
Copy link
Contributor Author

@amyeroberts sure!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants