Add interpolation of positional embedding to swin2sr #31024

M-Ali-ML · 2024-05-25T10:49:59Z

What does this PR do?

This PR adds the interpolate_pos_encoding function to the Swin2SR transformer model. Its purpose is to allow input images whose resolution differs from the pretrained resolution.
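For readers unfamiliar with the idea, here is a minimal sketch of what interpolating a grid of positional embeddings means in general. It is not the code from this PR: the function name `resize_position_grid` is hypothetical, the grid is a plain nested list rather than a tensor, and bilinear interpolation with aligned corners is used only to keep the example short.

```python
def resize_position_grid(grid, new_h, new_w):
    """Bilinearly resize an (H, W, C) nested-list grid of position embeddings.

    Hypothetical helper for illustration only; not part of any library.
    """
    old_h, old_w = len(grid), len(grid[0])
    channels = len(grid[0][0])

    def source_coord(dst_index, dst_size, src_size):
        # Map an output index onto the input grid so the corners line up.
        if dst_size == 1:
            return 0.0
        return dst_index * (src_size - 1) / (dst_size - 1)

    resized = []
    for i in range(new_h):
        y = source_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0  # vertical blend weight
        row = []
        for j in range(new_w):
            x = source_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0  # horizontal blend weight
            vector = []
            for c in range(channels):
                top = grid[y0][x0][c] * (1 - wx) + grid[y0][x1][c] * wx
                bottom = grid[y1][x0][c] * (1 - wx) + grid[y1][x1][c] * wx
                vector.append(top * (1 - wy) + bottom * wy)
            row.append(vector)
        resized.append(row)
    return resized


# A 2x2 grid with one channel, stretched to 3x3.
pretrained = [[[0.0], [1.0]], [[2.0], [3.0]]]
stretched = resize_position_grid(pretrained, 3, 3)
```

The corner embeddings are preserved and the new centre entry is the average of the four originals, which is the behaviour one would expect from resampling a learned grid to a new input resolution.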

This is part of the community contribution thread.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Usefulness

The Swin2SR transformer supports different input resolutions by nature. This is due to its log-spaced continuous relative position bias, which allows it to generalize to higher input resolutions at inference time, as stated in the paper.
I'd say having interpolate_pos_encoding for Swin2SR is good for code consistency with other vision transformers of the same nature; however, its effectiveness is still to be reviewed.
The Swin family's ability to support different resolutions was discussed in PR #30656.
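To make the log-spaced argument concrete, here is a small sketch of the coordinate transform described in the Swin Transformer V2 paper, sign(Δ) · log(1 + |Δ|). The helper names `log_spaced` and `extrapolation_ratio` are made up for this illustration, and the sketch leaves out any normalization or scaling a real implementation may apply on top of the transform.

```python
import math


def log_spaced(delta):
    """Map a relative offset to log space: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def extrapolation_ratio(pretrain_window, target_window, transform=float):
    """How far the largest offset grows, relative to the pretrained range.

    Hypothetical helper: the largest relative offset in a window of size w
    is w - 1, and `transform` maps it to the coordinate space being compared.
    """
    old_max = transform(pretrain_window - 1)
    new_max = transform(target_window - 1)
    return (new_max - old_max) / old_max


# Going from an 8x8 window to a 16x16 window:
linear_ratio = extrapolation_ratio(8, 16)           # 8 / 7, about 1.14
log_ratio = extrapolation_ratio(8, 16, log_spaced)  # log(16)/log(8) - 1, about 0.33
```

With linear coordinates the position-bias network has to extrapolate to more than double its pretrained range, whereas in log space the range grows by only about a third. That is the intuition behind the claim that the model tolerates larger inputs without interpolation.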

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts

M-Ali-ML · 2024-05-25T10:55:16Z

tests/models/swin2sr/test_modeling_swin2sr.py

+        model = Swin2SRForImageSuperResolution.from_pretrained("caidas/swin2SR-classical-sr-x2-64").to(torch_device)
+
+        image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
+        image = image.resize([680, 680])  # size is an unrecognized kwargs in processor


Swin2SRImageProcessor doesn't accept size as a kwarg. To be honest, it is not really needed, since the model can take any input resolution.
However, there might be reasons to change the input size, such as training on lower-resolution images when hardware resources are limited, or keeping the code consistent with other image processors.
If that's the case, I'm willing to add it in a separate PR if approved.

amyeroberts · 2024-06-07T17:40:07Z

Hi @MightyStud, thanks for opening this PR and apologies for the delay in reviewing. As Swin2SR doesn't require this, I don't think it makes sense to add it to the model, so we can close this PR. My bad for having it on the list of models to add on the issue - I'll update this now.

M-Ali-ML · 2024-06-08T16:24:22Z

@amyeroberts
Understandable. I'll keep an eye on the next contribution thread, since I'd like to be one of the Hugging Face contributors one day 😅

M-Ali-ML and others added 2 commits May 25, 2024 12:29

add interpolation of positional embeddings for swin2sr 📈

e4bff74

Merge branch 'huggingface:main' into add-dynamic-input-swin2sr

70a22b1

M-Ali-ML closed this Jun 8, 2024
