Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interpolation of positional embedding to swin2sr #31024

Closed
wants to merge 2 commits into from

Conversation

M-Ali-ML
Copy link

@M-Ali-ML M-Ali-ML commented May 25, 2024

What does this PR do?

This PR add the interpolate_pos_encoding function to the Swin2SR transformer model, the purpose of it is to allow input images with resolution different than the pretrained resolution.

This is part of the community contribution thread.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Usefulness

Swin2SR transformer by nature supports different input resolution that is due to the case of log-spaced continuous relative position bias which allows it to generalize to higher input resolution at inference time as stated in the paper.
I'd say having interpolate_pos_encoding for Swin2SR is good for code consistency between it and other Visual transformers of the same nature, however its effectiveness is still to be reviewed.
Discussion about Swin family ability to support different resolution was made in this PR #30656.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts

model = Swin2SRForImageSuperResolution.from_pretrained("caidas/swin2SR-classical-sr-x2-64").to(torch_device)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
image = image.resize([680, 680]) # size is an unrecognized kwargs in processor
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swin2SRImageProcessor doesn’t have size as kwargs, tbh it is not really needed since the model can take any input resolution.
However, there might be a need to change input size for some reason like having limited hardware resource and want to train on lower resolution images, also to keep consistent code with other image processors.
If that's the case I'm willing to add it as a PR if approved.

@amyeroberts
Copy link
Collaborator

Hi @MightyStud, thanks for opening this PR and apologies for the delay in reviewing. As Swin2SR doesn't require this, then I don't think it makes sense to add to the model and we can close this PR. My bad for having it on the list of models to add on the issue - I'll update this now.

@M-Ali-ML
Copy link
Author

M-Ali-ML commented Jun 8, 2024

@amyeroberts
Understable, I'll keep an eye on the next contribution thread, since I'd like to be part of huggingface contributors one day 😅

@M-Ali-ML M-Ali-ML closed this Jun 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants