
[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on. #9982

Merged
177 commits merged into huggingface:main on Dec 15, 2024

Conversation

lawrence-cj
Contributor

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first to make text-to-image generation work on a 32x compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also includes several popular efficiency-related techniques, such as a DiT with a linear attention processor, and uses a decoder-only LLM (Gemma-2B-IT) as the text encoder for low GPU requirements and fast inference.

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana
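
For a sense of the intended API, here is a minimal usage sketch; the checkpoint name and generation parameters are illustrative assumptions on my part, not something this PR's text specifies:

```python
import torch
from diffusers import SanaPipeline

# Illustrative Hub repo id; see the project page above for the released checkpoints.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon",
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
).images[0]
image.save("sana.png")
```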

Core contributor of DC-AE:
work with @[email protected]

Core library:

We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu


Images generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler:

[image]
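
A rough sketch of equivalent PAG usage. Note that later in this PR the standalone FlowDPMSolverMultistepScheduler was dropped in favor of DPMSolverMultistepScheduler configured for flow matching; the checkpoint name, PAG layer choice, and the use_flow_sigmas flag below are assumptions, not confirmed by this comment:

```python
import torch
from diffusers import SanaPAGPipeline, DPMSolverMultistepScheduler

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    pag_applied_layers=["transformer_blocks.8"],          # illustrative layer choice
    torch_dtype=torch.bfloat16,
).to("cuda")

# Assumed flag name for switching DPM-Solver++ to flow-matching sigmas.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_flow_sigmas=True
)

image = pipe(
    prompt="a cyberpunk cat with a neon sign that says 'Sana'",
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
).images[0]
```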

@lawrence-cj
Contributor Author

I think the bf16 repository also contains the fp32 weights; are those an fp32 copy of the bf16-compatible weights? If so that makes sense, but otherwise it may confuse users who don't know to pass variant="bf16".

Yes. It's just an FP32 copy of the BF16 weights, and I ran it successfully.
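
For reference, a hedged sketch of what selecting the BF16 variant looks like; the repo id is illustrative:

```python
import torch
from diffusers import SanaPipeline

# Load the BF16 variant explicitly; without variant="bf16" the FP32 copy stored in
# the same repo would be downloaded instead.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
```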

@bghira
Contributor

bghira commented Dec 13, 2024

Without complex human instruction:

[image]

With complex human instruction:

[image]

Is it possible there is something wrong with the CHI implementation here? It makes all images worse.

For example, with CHI enabled it's putting 508 tokens of input through the model instead of just 300 (206 from CHI plus the 300 padded prompt tokens), and I don't know why we need this many tokens. Is it supposed to be 300 total?
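
For an A/B comparison, a hedged sketch assuming the merged SanaPipeline exposes a complex_human_instruction argument and that passing None skips the CHI prefix (checkpoint name illustrative):

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "a cyberpunk cat with a neon sign that says 'Sana'"

# Default behaviour: the CHI template is prepended to the prompt before tokenization,
# so more tokens than the bare (padded) prompt go through Gemma.
generator = torch.Generator("cuda").manual_seed(0)
with_chi = pipe(prompt, generator=generator).images[0]

# Assumed way to disable the prefix and send only the prompt tokens.
generator = torch.Generator("cuda").manual_seed(0)
without_chi = pipe(prompt, complex_human_instruction=None, generator=generator).images[0]
```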

@lawrence-cj
Contributor Author

What’s your inference code? @bghira

@bghira
Contributor

bghira commented Dec 14, 2024

We use encode_prompt via the pipeline to save the embeds and then pass them back in at inference time so the text encoder can be unloaded first. Other than this, we're just using the BF16 weights.
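
A hedged sketch of that workflow, assuming encode_prompt returns (prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask) as in similar diffusers pipelines and that the call accepts precomputed embeds (repo id illustrative):

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

with torch.no_grad():
    prompt_embeds, prompt_mask, neg_embeds, neg_mask = pipe.encode_prompt(
        "a tiny astronaut hatching from an egg on the moon"
    )

# Drop the text encoder before denoising to free VRAM.
pipe.text_encoder = None
torch.cuda.empty_cache()

image = pipe(
    prompt=None,
    prompt_embeds=prompt_embeds,
    prompt_attention_mask=prompt_mask,
    negative_prompt_embeds=neg_embeds,
    negative_prompt_attention_mask=neg_mask,
).images[0]
```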

@lawrence-cj
Contributor Author

What's your prompt? @bghira

@a-r-r-o-w
Member

@hlky Would you like to give the changes to schedulers here a review? I'm preparing to merge it shortly after I add the integration tests in the next hour since YiYi has approved and confirmed on Slack. I've tested all the normal models (not the multilingual ones) and they seem to work well (I did the conversions myself when testing, but for the integration tests, I will be using the remote checkpoints and match slices). I have not exhaustively tested all scheduler changes though - only DPMSolverMultistep and FlowMatchEulerDiscrete, but I think that should be okay since it is copied logic (from make fix-copies).
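
For readers unfamiliar with the setup, slice-matching integration tests in diffusers typically look roughly like the sketch below; the checkpoint name is assumed and the expected values are placeholders, not the real slices added in this PR:

```python
import numpy as np
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed remote checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a photo of a cat",
    generator=torch.Generator("cuda").manual_seed(0),
    output_type="np",
).images[0]

# Placeholder reference slice; real tests hard-code values captured from a trusted run.
expected_slice = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
assert np.abs(image[-3:, -3:, -1].flatten() - expected_slice).max() < 1e-2
```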

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky
Collaborator

@a-r-r-o-w Scheduler changes look good, thanks

@a-r-r-o-w
Member

Thank you @lawrence-cj and team! The paper was very insightful and it was very cool to come across the ideas developed.

Thanks for bearing with our reviews too! Will merge the PR once the CI passes

@a-r-r-o-w added the roadmap (Add to current release roadmap) label on Dec 15, 2024
@a-r-r-o-w merged commit 5a196e3 into huggingface:main on Dec 15, 2024
12 checks passed
@vladmandic mentioned this pull request on Dec 16, 2024
@lawrence-cj
Copy link
Contributor Author

Thank you so much for your effort! Love you guys. I was stuck on other things, sorry for the late reply!
@sayakpaul @a-r-r-o-w @bghira @yiyixuxu @hlky

sayakpaul added a commit that referenced this pull request Dec 23, 2024
…AttentionProcessor`, `Flow-based DPM-solver` and so on. (#9982)

* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove unused code;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffusers activation

* remove dc_ae attention in attention_processor.py

* inherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d

Co-authored-by: YiYi Xu <[email protected]>

* remove device and dtype in RMSNorm2d __init__

Co-authored-by: YiYi Xu <[email protected]>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* add abstract

* 1. add DCAE into diffusers;
2. make style and make quality;

* add DCAE_HF into diffusers;

* bug fixed;

* add SanaPipeline, SanaTransformer2D into diffusers;

* add sanaLinearAttnProcessor2_0;

* first update for SanaTransformer;

* first update for SanaPipeline;

* first success run SanaPipeline;

* model output finally matches the original model with the same input;

* code update;

* code update;

* add a flow dpm-solver scripts

* 🎉[important update]
1. Integrate flow-dpm-solver into diffusers;
2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;

* 🎉🔧[important update & fix huge bugs!!]
1. add SanaPAGPipeline & several related Sana linear attention operators;
2. `SanaTransformer2DModel` now supports multi-resolution input;
3. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
4. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;

* remove prints;

* add script to convert official Sana checkpoints to diffusers-format safetensors.

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/pipelines/pag/pipeline_pag_sana.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/pipelines/sana/pipeline_sana.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/pipelines/sana/pipeline_sana.py

Co-authored-by: Steven Liu <[email protected]>

* update Sana for DC-AE's recent commit;

* make style && make quality

* Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)

* fix progress bar updates in SD 1.5 PAG Img2Img pipeline

---------

Co-authored-by: Vinh H. Pham <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>

* allow the vae to be None in `__init__` of `SanaPipeline`

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: hlky <[email protected]>

* change the ae related code due to the latest update of DCAE branch;

* change the ae related code due to the latest update of DCAE branch;

* 1. change code based on AutoencoderDC;
2. fix the bug of new GLUMBConv;
3. run success;

* update for solving conversation.

* 1. fix bugs and run convert script success;
2. Downloading ckpt from hub automatically;

* make style && make quality;

* 1. remove unused parameters in init;
2. code update;

* remove test file

* refactor; add docs; add tests; update conversion script

* make style

* make fix-copies

* refactor

* update pipelines

* pag tests and refactor

* remove sana pag conversion script

* handle weight casting in conversion script

* update conversion script

* add a processor

* 1. add bf16 pth file path;
2. add complex human instruct in pipeline;

* fix fast tests

* change gemma-2-2b-it ckpt to a non-gated repo;

* fix the pth path bug in conversion script;

* change grad ckpt to original; make style

* fix the complex_human_instruct bug and typo;

* remove dpmsolver flow scheduler

* apply review suggestions

* change the default scheduler from `FlowMatchEulerDiscreteScheduler` to `DPMSolverMultistepScheduler` configured for flow matching.

* fix the tokenizer.padding_side='right' bug;

* update docs

* make fix-copies

* fix imports

* fix docs

* add integration test

* update docs

* update examples

* fix convert_model_output in schedulers

* fix failing tests

---------

Co-authored-by: Junyu Chen <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: chenjy2003 <[email protected]>
Co-authored-by: Aryan <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: hlky <[email protected]>
Labels: close-to-merge, roadmap (Add to current release roadmap)