[core] Allegro T2V #9736
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-Authored-By: Huan Yang <[email protected]>
Co-Authored-By: YiYi Xu <[email protected]>
It looks like something broke when doing the VAE refactor - looking into it at the moment. Will fix the broken tests afterwards.
frames = frames.permute(0, 2, 1, 3, 4)  # [batch_size, channels, num_frames, height, width]
return frames
def _prepare_rotary_positional_embeddings(
Not a blocker to merge.
We currently have a mix of pipelines that create rotary embeddings like this (Cog, Lumina, Hunyuan).
Was there a specific reason, which I may have missed, for going this route as opposed to creating a dedicated layer in the transformer (as in Flux)? Is it because we need access to height, width, etc. to create the embedding?
Yes, this is on my mind. Will take up a dedicated RoPE layer refactor for existing models that do it in the pipeline in a future PR.
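For reference, the pipeline-level pattern under discussion computes rotary angles from a latent grid whose size is only known at call time. A minimal sketch of that idea in plain PyTorch (not the actual Allegro helper; all names and sizes below are illustrative):

import torch

def rope_frequencies(dim: int, positions: torch.Tensor, theta: float = 10000.0):
    # Standard rotary-embedding angles: one frequency per channel pair,
    # one rotation angle per (position, frequency).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(positions.float(), freqs)  # [num_positions, dim // 2]
    return angles.cos(), angles.sin()

# The grid depends on the requested height/width/num_frames, which is why the
# helper lives in the pipeline rather than in the transformer itself.
num_frames, height, width, head_dim = 22, 90, 160, 96  # illustrative sizes
t_cos, t_sin = rope_frequencies(head_dim // 3, torch.arange(num_frames))
h_cos, h_sin = rope_frequencies(head_dim // 3, torch.arange(height))
w_cos, w_sin = rope_frequencies(head_dim // 3, torch.arange(width))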
).frames

video = videos[0]
expected_video = torch.randn(1, 88, 720, 1280, 3).numpy()
I assume this will be updated to a real video?
LGTM. There's a failing test that looks related to saving/loading the transformer.
Very nice, just a few typos :)
Co-authored-by: Steven Liu <[email protected]>
if self.use_tiling:
    return self.tiled_decode(z)

raise NotImplementedError("Decoding without tiling has not been implemented yet.")
@yiyixuxu Is this okay for now? There are some follow-ups we could look into later, such as rewriting the tiling implementation to be similar to our other VAEs.
I don't think the model works well with a lower number of frames (in which case not using tiling would be faster when decoding), so we should probably always use tiling, since 88 frames is the default (and recommended) length.
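At usage time this means tiling should simply stay on. A hedged sketch, assuming the standard diffusers VAE tiling toggle and the AllegroPipeline entry point added by this PR:

import torch
from diffusers import AllegroPipeline

pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Decoding is tiled-only here, so keep tiling enabled on the VAE.
pipe.vae.enable_tiling()

# 88 frames is the default (and recommended) length per the discussion above.
frames = pipe(prompt="A squirrel eating a nut", num_frames=88).frames[0]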
@@ -266,6 +263,7 @@ def forward(
    hidden_dtype: Optional[torch.dtype] = None,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    # No modulation happening here.
    added_cond_kwargs = added_cond_kwargs or {"resolution": None, "aspect_ratio": None}
oh what is this? is this fixing a current bug?
Yep. We accept `None` in `added_cond_kwargs` here, but we actually need to pass values for `resolution` and `aspect_ratio` to the following PixArt embedding layer (which requires them as non-defaulted arguments).
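A minimal sketch of the pattern being described (the embedder below is a stand-in for the PixArt-style layer, not the actual diffusers class):

from typing import Any, Dict, Optional

import torch
from torch import nn

class SizeEmbedderStub(nn.Module):
    # Stand-in: note that resolution and aspect_ratio are required
    # (non-defaulted) arguments, so the keys must always be present.
    def forward(self, emb: torch.Tensor, resolution, aspect_ratio) -> torch.Tensor:
        if resolution is not None and aspect_ratio is not None:
            emb = emb + 0.0  # the real layer would add size conditioning here
        return emb

def forward(emb: torch.Tensor, added_cond_kwargs: Optional[Dict[str, Any]] = None) -> torch.Tensor:
    # The fix: callers may pass added_cond_kwargs=None, but the downstream layer
    # still needs both keys, so default them to explicit None values.
    added_cond_kwargs = added_cond_kwargs or {"resolution": None, "aspect_ratio": None}
    return SizeEmbedderStub()(emb, **added_cond_kwargs)

print(forward(torch.zeros(1, 8)).shape)  # works even though no kwargs were passed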
thanks!
we can merge once the tests are fixed
I can look into refactoring the 3D RoPE in a follow-up PR (or if you want to leave this open a little bit longer, that's ok too, up to you!)
I have something really weird happening here. Feeding in a list of prompts (even if the list contains only one prompt) results in a really bad video. It might be related to #9769 (comment), but I can't be sure. It looks cfg-baked, so it would make sense if it were a cfg error.

prompt = ["Orbital shot of a squirrel nibbles on a nut while sitting in a tree"]: listprompt.mp4

prompt = "Orbital shot of a squirrel nibbles on a nut while sitting in a tree": stringprompt.mp4
@Ednaordinary Can you share a code snippet? Could you include how you're loading the pipeline? I'm unable to reproduce the issue.
I checked further, and it happens when the prompt and negative prompt are length-one lists (even when the list is [None], I think), but not when the prompt is a list and the negative prompt is unspecified (my mistake). I have yet to test further; I know that's kinda incoherent, so here's a snippet. I'm using UniPC since it's way faster. Without negative_prompt specified at all, it works fine.
@Ednaordinary So the problem is that you are using an empty negative prompt?
@foreverpiano I don't believe so, as passing None in a list to negative_prompt also seems to trigger it. It also looks suspiciously like cfg baking; I can't be certain, but I feel as if negative prompting with nothing wouldn't cause that. The way I'm passing in arguments is the same interface I use for other pipelines, which I've never had issues with. I use a batching mechanism that passes in multiple prompts as a list, even if there's only one prompt; negative_prompt is converted to None if it's blank. Changing this to pass in a string instead of a list and only batching one image (multi-prompt doesn't currently seem to work on this pipeline regardless) fixed things, for whatever reason.

[None] as negative_prompt: allegro.mp4
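For the record, a minimal sketch of the shape of this report (the original snippet wasn't preserved here; the loading path and prompt below are assumptions, not the user's actual code):

import torch
from diffusers import AllegroPipeline

pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.vae.enable_tiling()

prompt = "Orbital shot of a squirrel nibbles on a nut while sitting in a tree"

# Reported fine: plain string prompt, negative_prompt left unset.
good = pipe(prompt=prompt).frames[0]

# Reported broken before the fix: length-one lists, including negative_prompt=[None].
bad = pipe(prompt=[prompt], negative_prompt=[None]).frames[0]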
@Ednaordinary I think this should be fixed with the latest commit. LMK if it still persists; if so, I'll fix it in a follow-up PR.
* update
* refactor transformer part 1
* refactor part 2
* refactor part 3
* make style
* refactor part 4; modeling tests
* make style
* refactor part 5
* refactor part 6
* gradient checkpointing
* pipeline tests (broken atm)
* update
* add coauthor Co-Authored-By: Huan Yang <[email protected]>
* refactor part 7
* add docs
* make style
* add coauthor Co-Authored-By: YiYi Xu <[email protected]>
* make fix-copies
* undo unrelated change
* revert changes to embeddings, normalization, transformer
* refactor part 8
* make style
* refactor part 9
* make style
* fix
* apply suggestions from review
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* update example
* remove attention mask for self-attention
* update
* copied from
* update
* update

---------

Co-authored-by: Huan Yang <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
What does this PR do?
Model: https://huggingface.co/rhymes-ai/Allegro
Github: https://github.com/rhymes-ai/Allegro
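A short end-to-end sketch of the pipeline this PR adds (hedged: the guidance scale, step count, and fps below are assumptions based on the model card, not verified defaults):

import torch
from diffusers import AllegroPipeline
from diffusers.utils import export_to_video

pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.vae.enable_tiling()  # decoding is tiled-only, per the review discussion above

prompt = "A seaside town at golden hour, waves rolling onto the shore"
video = pipe(prompt=prompt, guidance_scale=7.5, num_inference_steps=100).frames[0]
export_to_video(video, "allegro_sample.mp4", fps=15)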