CogVideoX-5B Model adapter change #9203

Merged: 24 commits merged into huggingface:main from the cogvideox-5b branch on Aug 23, 2024

Conversation

zRzRzRzRzRzRzR (Contributor):

What does this PR do?

This is the draft PR for CogVideoX-5B (CogVideoX-Pro), including:

  1. using get_3d_rotary_pos_embed
  2. convert_transformer needs to remove some parameters
  3. using CogVideoXAttnProcessor2_0 with image_rotary_emb

This is still a draft and needs more adaptation before it can run.
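For context, a minimal sketch of how a pipeline might call the new helper; the argument names follow get_3d_rotary_pos_embed as referenced in this PR, but the concrete values are illustrative assumptions (a 480x720, 49-frame generation with patch size 2), not code from the diff:

from diffusers.models.embeddings import get_3d_rotary_pos_embed

# 3D RoPE over the video latent grid: returns (cos, sin) frequency tables
# covering the temporal, height, and width axes when use_real=True
freqs_cos, freqs_sin = get_3d_rotary_pos_embed(
    embed_dim=64,                     # attention head dim (assumed value)
    crops_coords=((0, 0), (30, 45)),  # crop region in patch coordinates (assumed)
    grid_size=(30, 45),               # latent height x width in patches (assumed)
    temporal_size=13,                 # number of latent frames (assumed)
    use_real=True,
)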

@a-r-r-o-w

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w requested a review from yiyixuxu August 22, 2024 17:07
@yiyixuxu (Collaborator) left a comment:

thanks!
I left some comments, but the code changes look good to me to merge if we are in a hurry (we can do some follow-up refactoring)

dim_w = embed_dim // 8 * 3

# Temporal frequencies
freqs_t = 1.0 / (theta ** (torch.arange(0, dim_t, 2).float() / dim_t))
Collaborator:

let's refactor with get_1d_rotary_pos_embed (def get_1d_rotary_pos_embed) in a follow-up PR
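To make the suggestion concrete, a self-contained sketch of what a get_1d_rotary_pos_embed-style helper computes; a simplified illustration with assumed names, not diffusers' actual implementation:

import torch

def rotary_1d(dim: int, positions: torch.Tensor, theta: float = 10000.0):
    # inverse frequencies: the same computation as the inlined freqs_t line above
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    freqs = torch.outer(positions.float(), inv_freq)  # (num_positions, dim // 2)
    return freqs.cos(), freqs.sin()

# temporal axis, e.g. dim_t = 16 with 13 latent frames (illustrative values)
freqs_t_cos, freqs_t_sin = rotary_1d(dim=16, positions=torch.arange(13))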

@@ -532,7 +617,10 @@ def apply_rotary_emb(
         else:
             raise ValueError(f"`use_real_unbind_dim={use_real_unbind_dim}` but should be -1 or -2.")

-        out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
+        if upcast:
Collaborator:

ok, but I'm wondering why upcasting would make CogVideoX unable to generate videos
maybe we can test not downcasting the rotary embedding instead (all our other pipelines keep it in float32)

Member:

I mainly added this to numerically match the output of apply_rotary_emb with the original implementation. It seems that even if we compute in float32, the results are great and the small diff does not impact quality, now that other bugs have been resolved. We can undo this change.
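For readers following along, a minimal sketch of the rotation being discussed, assuming the interleaved real/imag layout (the use_real_unbind_dim=-1 branch); a simplified stand-in, not the exact diffusers function:

import torch

def apply_rotary_emb_sketch(x, cos, sin, upcast=True):
    # x: (batch, heads, seq, dim); cos/sin: (seq, dim)
    x_real, x_imag = x.reshape(*x.shape[:-1], -1, 2).unbind(-1)
    x_rotated = torch.stack([-x_imag, x_real], dim=-1).flatten(3)
    if upcast:
        # rotate in float32, then cast back to the input dtype (the flag under debate)
        return (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
    return x * cos + x_rotated * sin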

@a-r-r-o-w (Member):

@yiyixuxu @zRzRzRzRzRzRzR Would it be okay to remove the following limit in the follow-up PR?

if num_frames > 49:
    raise ValueError(
        "The number of frames must be less than 49 for now due to static positional embeddings. This will be updated in the future to remove this limitation."
    )

I tested the 5B model with generations of 57, 65, and 73 frames and they all turned out well - maybe the RoPE embeddings help the model generalize better. For the 2B model, the outputs are bad at those frame counts, probably due to the limitations of normal positional embeddings. We could add a recommendation in the docs mentioning that 49 frames and below is the good setting for 2B.
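If the limit is removed, usage would look roughly like this hypothetical example (CogVideoXPipeline and the THUDM/CogVideoX-5b checkpoint are the public names; the prompt is made up, and num_frames=73 comes from the tests above):

import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# 73 frames: above the current 49-frame cap, reported to work well for 5B
video = pipe(prompt="a panda playing a guitar in a bamboo forest", num_frames=73).frames[0]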

In the refactor, I'd also like to create the normal positional embeddings in the pipeline instead of the transformer, similar to the RoPE embeds, because it does not make sense to create them for the 5B model (currently they are created and saved to the module with a call to register_buffer regardless of whether the model is 2B or 5B).
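As a toy illustration of the pattern being criticized (names and shapes are made up, not the actual transformer module):

import torch
import torch.nn as nn

class ToyTransformer(nn.Module):
    def __init__(self, num_patches: int, inner_dim: int, use_rotary: bool):
        super().__init__()
        self.use_rotary = use_rotary
        # registered unconditionally, even though the RoPE (5B) path never
        # reads it; creating it in the pipeline instead, like the rotary
        # embeds, would avoid carrying this unused buffer around
        pos_embedding = torch.zeros(1, num_patches, inner_dim)
        self.register_buffer("pos_embedding", pos_embedding, persistent=False)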

use_real=True,
)

freqs_cos = freqs_cos.to(device=device, dtype=dtype)
@yiyixuxu (Collaborator), Aug 23, 2024:

@a-r-r-o-w

if we're going to undo the upcast change in apply_rotary_emb, I think it is better not to downcast freqs_cos and freqs_sin here either, i.e. change this line to freqs_cos = freqs_cos.to(device=device)

otherwise the rotary embedding is created in float32, downcast to bfloat16 here, and then upcast back to float32 when applied - this cannot be good, even if it does not have a noticeable impact

can you test it out?
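The round trip described above is easy to check in isolation; a standalone snippet (values are arbitrary):

import torch

freqs = torch.linspace(0, 1, 64, dtype=torch.float32)
round_trip = freqs.to(torch.bfloat16).float()  # downcast here, upcast when applying
print((freqs - round_trip).abs().max())  # nonzero: the bfloat16 hop loses precision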

Member:

Ah sorry, I overlooked this. Testing

Member:

Also just noticed that apply_rotary_emb moves the embeddings to the correct device, so we can just remove these 2 lines entirely

@yiyixuxu merged commit 960c149 into huggingface:main Aug 23, 2024
14 of 15 checks passed
yiyixuxu pushed a commit that referenced this pull request Aug 24, 2024
* draft of embedding

---------

Co-authored-by: Aryan <[email protected]>
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* draft of embedding

---------

Co-authored-by: Aryan <[email protected]>
@zRzRzRzRzRzRzR deleted the cogvideox-5b branch January 14, 2025 06:47