Add propainter #33217
Conversation
Hi @molbap, I have addressed all the suggested changes, uniformized the kwargs, and everything else you mentioned. I have also put my questions and views about specific points in your review. Please check out the changes, iterate on the review, and let me know of any further changes that need to be made. Thank you 😄 cc @amyeroberts
All tests are green
soft ping @molbap
On my radar - I'll review it as soon as I can! |
Hey @RUFFY-369 thanks for all the work here 🚀 I did a first pass to cover a few things that were not transformers-compatible, similar to the previous PR :) I have yet to cover processing and tests, but have a couple questions and some of my comments re: naming are applicable to the complete modeling file even though I haven't commented all of it. Ping me back when you've had time to include these changes!
        The size of the sliding window for attention operations.
    pool_size (`List[int]`, *optional*, defaults to `[4, 4]`):
        The size of the pooling layers in the model.
    no_dis (`bool`, *optional*, defaults to `False`):
nit on variable naming: avoid abbreviations, `no_dis` --> `no_discriminator`. It would also be better to make it a positive action, `use_discriminator`, defaulting to `True`.
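For illustration, a minimal sketch of how the rename could look in the config, assuming the rest of the kwargs stay unchanged (the attribute name is only a suggestion):

from transformers import PretrainedConfig

class ProPainterConfig(PretrainedConfig):
    model_type = "propainter"

    def __init__(self, use_discriminator: bool = True, **kwargs):
        super().__init__(**kwargs)
        # Positive flag replacing `no_dis=False`: the discriminator is enabled by default.
        self.use_discriminator = use_discriminator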
done in the recent commits
stride=[3, 3],
stride_3d=[1, 1, 1],
num_hidden_layers=8,
num_attention_heads=4,
window_size=[5, 9],
pool_size=[4, 4],
no_dis=False,
in_channels=[64, 64, 96],
channels=[64, 96, 128],
strides=[1, 2, 2],
`stride`, `stride_3d` and `strides` are very similar - I'd advise going with different kwargs to remove some ambiguity; it's fine to use longer kwargs.
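Purely as an illustration of longer, less ambiguous kwargs - the names and the meanings attached to them below are guesses to be checked against the modeling code, not the final API:

def example_stride_kwargs(
    patch_stride=(3, 3),        # hypothetical replacement for `stride`
    patch_stride_3d=(1, 1, 1),  # hypothetical replacement for `stride_3d`
    encoder_strides=(1, 2, 2),  # hypothetical replacement for `strides`
):
    # Each kwarg name now hints at which component it configures instead of
    # relying on three near-identical names.
    return {
        "patch_stride": patch_stride,
        "patch_stride_3d": patch_stride_3d,
        "encoder_strides": encoder_strides,
    }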
done in the recent commits
if args.verify_logits:
    video, masks = prepare_input()
    image_processor = ProPainterVideoProcessor()
    inputs = image_processor(video, masks=masks, return_tensors="pt").to(device)
    outputs = model(**inputs)
    outputs_reconstruction = outputs.reconstruction

    assert torch.allclose(
        torch.tensor(outputs_reconstruction[0][0][-3:]),
        expected_output_reconstruction,
        atol=1e-4,
    )
    print("Looks good!")
Let's move this part to testing, no need to have it around the conversion file
Removed from the weight conversion file; as for the test_modeling file, this assertion is already there 👍
def rename_flow_completion(old_key, network_mapping):
    new_key = ""
    for old_prefix, new_prefix in network_mapping.items():
        if old_prefix in old_key:
            new_key = old_key.replace(f"{old_prefix}", f"{new_prefix}")
    # Handle specific layer/block transformations
    if "mid_dilation" in new_key:
        new_key = new_key.replace("mid_dilation", "intermediate_dilation")
    if "feat_prop_module" in new_key:
        new_key = new_key.replace("feat_prop_module", "feature_propagation_module")
    if "edgeDetector.mid_layer" in new_key:
        new_key = new_key.replace("edgeDetector.mid_layer", "edgeDetector.intermediate_layer")

    return new_key
Across the file, key renames can be handled by regexes to avoid some slightly harder-to-read if/else logic, and also to see at a glance in one place which key in the original model corresponds to which key in transformers.
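As a rough sketch of the regex approach (the mapping entries below are illustrative examples taken from this thread, not the complete set used in the conversion script):

import re

# One table of pattern -> replacement pairs; every rename lives in one place.
KEY_RENAMES = {
    r"^(downsample|encoder1|encoder2|decoder1|decoder2|upsample)\.": r"flow_completion_net.\1.",
    r"\bmid_dilation\b": "intermediate_dilation",
    r"\bfeat_prop_module\b": "feature_propagation_module",
    r"edgeDetector\.mid_layer": "edgeDetector.intermediate_layer",
}

def rename_key(old_key: str) -> str:
    new_key = old_key
    for pattern, replacement in KEY_RENAMES.items():
        new_key = re.sub(pattern, replacement, new_key)
    return new_key

print(rename_key("encoder1.0.weight"))              # flow_completion_net.encoder1.0.weight
print(rename_key("edgeDetector.mid_layer.2.bias"))  # edgeDetector.intermediate_layer.2.bias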
done in the recent commits
"encoder1": "flow_completion_net.encoder1", | ||
"encoder2": "flow_completion_net.encoder2", | ||
"decoder1": "flow_completion_net.decoder1", | ||
"decoder2": "flow_completion_net.decoder2", |
Related to the regex comment, this could be for instance:

network_mapping_completion = {
    r"(downsample|encoder1|encoder2|decoder1|decoder2|upsample)": r"flow_completion_net.\1",
    ...
}
downsample_inputs = self.downsample(inputs)

features_enc1 = self.encoder1(downsample_inputs)
nits: naming - separate numbers from names + avoid abbreviations + remove the comments in all the forward methods
done
        return flow, edge

    def forward_bidirect_flow(self, masked_flows_bi, masks):
naming, no abbreviations
done
super().__init__()
self.config = config
self.group = [1, 2, 4, 8, 1]
negative_slope = 0.2
that's good! let's add it to the config too
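For example, a minimal sketch of reading the slope from the config instead of hardcoding it (the module and attribute names here are placeholders, for illustration only):

import torch.nn as nn

class ProPainterExampleBlock(nn.Module):  # placeholder name
    def __init__(self, config):
        super().__init__()
        self.config = config
        # `config.negative_slope` replaces the local `negative_slope = 0.2`.
        self.activation = nn.LeakyReLU(negative_slope=config.negative_slope, inplace=True)

    def forward(self, hidden_states):
        return self.activation(hidden_states)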
done
if i == 8:
    x0 = features
    _, _, height, width = x0.size()
if i > 8 and i % 2 == 0:
maybe add a small comment here 😅 might not be intuitive to understand why there is this treatment of odd/even layers except at layer 8
done
Co-authored-by: Pablo Montalvo <[email protected]>
…ow completion net
Hi @molbap, I have addressed all your comments and left queries as well. Please review the rest of the remaining files and iterate on the ones that are addressed when you get the time. Thank you 😄
@molbap Soft ping
Hey @RUFFY-369! I did a pass on naming and configs 🧹 because I identified persistent issues there. Considering the very, very hefty size of the modeling code - 4600 LOC is indeed hefty - I think clarity is paramount to help code inspectors understand the logic flow. Let me know when you think you've addressed the suggestions on all the existing code, and I'll continue promptly 🤗
Also, I'd suggest taking a look at the currently failing tests - it's mostly docstring mismatches in the configs, I believe.
self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1, stride=stride)
self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
self.relu = nn.ReLU(inplace=True)
well fair enough, maybe just add a comment there saying so since it's a rarely seen pattern
# using itertools makes flattening a little faster :)
self.resblocks = nn.ModuleList(list(itertools.chain.from_iterable(self.resblocks)))
Interesting - do you know what difference it makes in numbers?
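If it helps put numbers on it, here is a quick, purely illustrative micro-benchmark comparing the two common ways of flattening a nested list of modules (results will vary by machine and list size):

import itertools
import timeit

nested = [[object() for _ in range(8)] for _ in range(1000)]

def flatten_with_itertools():
    return list(itertools.chain.from_iterable(nested))

def flatten_with_comprehension():
    return [block for stage in nested for block in stage]

print("itertools.chain:", timeit.timeit(flatten_with_itertools, number=1000))
print("nested list comp:", timeit.timeit(flatten_with_comprehension, number=1000))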
        self.conv_corr1 = nn.Conv2d(correlation_planes, config.num_channels * 2, 1, padding=0)
        self.conv_corr2 = nn.Conv2d(config.num_channels * 2, 192, config.patch_size, padding=config.padding)
        self.conv_flow1 = nn.Conv2d(2, config.num_channels, config.kernel_size[0], padding=3)
        self.conv_flow2 = nn.Conv2d(
            config.num_channels,
            config.in_channels[0],
            config.patch_size,
            padding=config.padding,
        )
        self.conv = nn.Conv2d(
            config.in_channels[0] + 192,
            config.num_channels - 2,
            config.patch_size,
            padding=config.padding,
        )

    def forward(self, optical_flow, correlation):
        hidden_states_correlation = F.relu(self.conv_corr1(correlation))
        hidden_states_correlation = F.relu(self.conv_corr2(hidden_states_correlation))
        hidden_states_flow = F.relu(self.conv_flow1(optical_flow))
        hidden_states_flow = F.relu(self.conv_flow2(hidden_states_flow))

        hidden_states = torch.cat([hidden_states_correlation, hidden_states_flow], dim=1)
        hidden_states = F.relu(self.conv(hidden_states))
        hidden_states = torch.cat([hidden_states, optical_flow], dim=1)

        return hidden_states
nits on naming, the comeback: when possible, avoid abbreviations and try to space numbers and letters for legibility. `conv_corr_1` is better than `conv_corr1`, and so on.
        self.convz1 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (1, 5), padding=(0, 2))
        self.convr1 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (1, 5), padding=(0, 2))
        self.convq1 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (1, 5), padding=(0, 2))

        self.convz2 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (5, 1), padding=(2, 0))
        self.convr2 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (5, 1), padding=(2, 0))
        self.convq2 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (5, 1), padding=(2, 0))

    def forward(self, hidden_states, motion_features):
        hidden_states_motion_features = torch.cat([hidden_states, motion_features], dim=1)
        z = torch.sigmoid(self.convz1(hidden_states_motion_features))
        r = torch.sigmoid(self.convr1(hidden_states_motion_features))
        q = torch.tanh(self.convq1(torch.cat([r * hidden_states, motion_features], dim=1)))
        hidden_states = (1 - z) * hidden_states + z * q
        hidden_states_motion_features = torch.cat([hidden_states, motion_features], dim=1)
        z = torch.sigmoid(self.convz2(hidden_states_motion_features))
        r = torch.sigmoid(self.convr2(hidden_states_motion_features))
        q = torch.tanh(self.convq2(torch.cat([r * hidden_states, motion_features], dim=1)))
        hidden_states = (1 - z) * hidden_states + z * q
nits on naming, book 3, the return of the nit:
- avoid single-letter variables
- space letters and numbers
- avoid abbreviations, we have enough space to afford some more letters
I understand this adds chores to the PR, but it's necessary for the legibility/harmony of the 200 models in the codebase! 🙇
hidden_dim: int = 128,
input_dim: int = 192 + 128,
two things here:
- why is the default `input_dim` defined like this?
- ideally modules should only be initialized with a configuration and a layer index - this avoids hardcoding too many things.
In addition, `ProPainterBasicUpdateBlock` is initialized with `hidden_dim` only, meaning that the other argument could simply be contained in the configuration.
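A hedged sketch of what config-driven initialization could look like; the config attribute names (`update_block_hidden_dim`, `motion_feature_dim`) and the simplified internals are hypothetical, not the actual ProPainter layers:

import torch
import torch.nn as nn

class ProPainterBasicUpdateBlock(nn.Module):
    def __init__(self, config, layer_idx: int = 0):
        super().__init__()
        # Hypothetical config attributes replacing the hardcoded defaults
        # `hidden_dim=128` and `input_dim=192 + 128`.
        hidden_dim = config.update_block_hidden_dim
        input_dim = config.motion_feature_dim + config.update_block_hidden_dim
        self.gru = nn.Conv2d(hidden_dim + input_dim, hidden_dim, kernel_size=3, padding=1)

    def forward(self, hidden_states, motion_features):
        return self.gru(torch.cat([hidden_states, motion_features], dim=1))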
        config: ProPainterConfig,
        in_channel: int = 2,
        out_channel: int = 1,
        intermediate_channel: int = 16,
    ):
same remark, we can move these values to the config and simply initialize the module with the configuration passed, with explicitly named keys
feat = (
    [feat_current]
    + [features[k][frame_id] for k in features if k not in ["spatial", module_name]]
    + [feature_propagation]
)

feat = torch.cat(feat, dim=1)
feature_propagation = feature_propagation + self.backbone[module_name](feat)
Suggested change:

aggregated_features = (
    [feat_current]
    + [features[k][frame_id] for k in features if k not in ["spatial", module_name]]
    + [feature_propagation]
)
aggregated_features = torch.cat(aggregated_features, dim=1)
feature_propagation = feature_propagation + self.backbone[module_name](aggregated_features)
        pooling_token: bool = True,
    ):
        super().__init__()
        assert hidden_size % num_attention_heads == 0
use raise rather than assert
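For instance, a drop-in replacement for the `assert` in `__init__`, in line with how other transformers models validate this:

if hidden_size % num_attention_heads != 0:
    raise ValueError(
        f"hidden_size ({hidden_size}) must be divisible by num_attention_heads ({num_attention_heads})"
    )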
mask_tl = torch.ones(self.window_size[0], self.window_size[1])
mask_tl[: -self.expand_size[0], : -self.expand_size[1]] = 0
mask_tr = torch.ones(self.window_size[0], self.window_size[1])
mask_tr[: -self.expand_size[0], self.expand_size[1] :] = 0
mask_bl = torch.ones(self.window_size[0], self.window_size[1])
mask_bl[self.expand_size[0] :, : -self.expand_size[1]] = 0
mask_br = torch.ones(self.window_size[0], self.window_size[1])
mask_br[self.expand_size[0] :, self.expand_size[1] :] = 0
masked_rolled_key = torch.stack((mask_tl, mask_tr, mask_bl, mask_br), 0).flatten(0)
self.register_buffer("valid_ind_rolled", masked_rolled_key.nonzero(as_tuple=False).view(-1))
improve naming here too
        self.max_pool = nn.MaxPool2d(window_size, window_size, (0, 0))

    def forward(
considering the mighty size of this forward, it's acceptable to cut it down into at least 2 sub-methods, one for each branching
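A rough sketch of the kind of split suggested here; the class and method names and the simplified bodies are placeholders, not the actual ProPainter attention logic:

import torch
import torch.nn as nn

class ExampleSparseWindowAttention(nn.Module):  # placeholder class, for illustration only
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.projection = nn.Linear(hidden_size, hidden_size)

    def _attend_within_windows(self, hidden_states):
        # first branch: local window attention (stand-in body)
        return self.projection(hidden_states)

    def _attend_with_pooled_tokens(self, hidden_states):
        # second branch: attention over pooled global tokens (stand-in body)
        pooled = self.projection(hidden_states.mean(dim=1, keepdim=True))
        return pooled.expand_as(hidden_states)

    def forward(self, hidden_states, pooling_token: bool = True):
        # the long forward becomes a thin dispatcher over the two branches
        attended = self._attend_within_windows(hidden_states)
        if pooling_token:
            attended = attended + self._attend_with_pooled_tokens(hidden_states)
        return attended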
What does this PR do?
This PR adds ProPainter, a video inpainting model whose repo has 5.4k stars and 635 forks. It fixes #26360 and resolves the stale PR #26391 for that issue, rebuilt from scratch to follow the transformers standard.
Who can review?
@amyeroberts @ArthurZucker @NielsRogge (?)
@rafaelpadilla (as he was the initial reviewer on the stale PR)
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
The PR is more than ready for a first pass of review!
TODO (will be done in a jiffy :)):
Results:
Here, I am attaching the GIFs for the original video, the original model's output for object removal through video inpainting, and the current PR's HF model's output for the same:
Original video:
Original model output:
HF ported model output:
Example usage is provided in the doc file here