
[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on. #9982

Merged
177 commits merged into huggingface:main on Dec 15, 2024

Conversation

lawrence-cj
Contributor

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first to make text-to-image generation work on a 32x compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also includes several popular efficiency-related techniques, such as a DiT with a linear attention processor, and uses a decoder-only LLM (Gemma-2B-IT) as the text encoder for low GPU requirements and fast inference.

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana
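
For a sense of the intended API, here is a minimal usage sketch; the checkpoint name and generation parameters are illustrative assumptions on my part, not something this PR's text specifies:

```python
import torch
from diffusers import SanaPipeline

# Illustrative Hub repo id; see the project page above for the released checkpoints.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon",
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
).images[0]
image.save("sana.png")
```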

Core contributor of DC-AE:
work with @[email protected]

Core library:

We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu


Images generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler:

[image]
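
A rough sketch of equivalent PAG usage. Note that later in this PR the standalone FlowDPMSolverMultistepScheduler was dropped in favor of DPMSolverMultistepScheduler configured for flow matching; the checkpoint name, PAG layer choice, and the use_flow_sigmas flag below are assumptions, not confirmed by this comment:

```python
import torch
from diffusers import SanaPAGPipeline, DPMSolverMultistepScheduler

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    pag_applied_layers=["transformer_blocks.8"],          # illustrative layer choice
    torch_dtype=torch.bfloat16,
).to("cuda")

# Assumed flag name for switching DPM-Solver++ to flow-matching sigmas.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_flow_sigmas=True
)

image = pipe(
    prompt="a cyberpunk cat with a neon sign that says 'Sana'",
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
).images[0]
```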

@lawrence-cj
Contributor Author

I think the bf16 repository also contains the fp32 weights; are those an fp32 copy of the bf16-compatible weights? If so that makes sense, but otherwise it may confuse users who don't know to pass variant="bf16".

Yes. It's just an FP32 copy of the BF16 weights, and I ran it successfully.
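
For reference, a hedged sketch of what selecting the BF16 variant looks like; the repo id is illustrative:

```python
import torch
from diffusers import SanaPipeline

# Load the BF16 variant explicitly; without variant="bf16" the FP32 copy stored in
# the same repo would be downloaded instead.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
```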

@bghira
Contributor

bghira commented Dec 13, 2024

Without complex human instruction:

[image]

With complex human instruction:

[image]

Is it possible there is something wrong with the CHI implementation here? It makes all images worse.

For example, with CHI enabled it's putting 508 tokens of input through the model instead of just 300 (206 from CHI plus the 300 padded prompt tokens), and I don't know why we need this many tokens. Is it supposed to be 300 total?
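
For an A/B comparison, a hedged sketch assuming the merged SanaPipeline exposes a complex_human_instruction argument and that passing None skips the CHI prefix (checkpoint name illustrative):

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "a cyberpunk cat with a neon sign that says 'Sana'"

# Default behaviour: the CHI template is prepended to the prompt before tokenization,
# so more tokens than the bare (padded) prompt go through Gemma.
generator = torch.Generator("cuda").manual_seed(0)
with_chi = pipe(prompt, generator=generator).images[0]

# Assumed way to disable the prefix and send only the prompt tokens.
generator = torch.Generator("cuda").manual_seed(0)
without_chi = pipe(prompt, complex_human_instruction=None, generator=generator).images[0]
```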

@lawrence-cj
Contributor Author

What’s your inference code? @bghira

@bghira
Contributor

bghira commented Dec 14, 2024

We use encode_prompt via the pipeline to save the embeds and then pass them back in at inference time so the text encoder can be unloaded first. Other than this, we're just using the BF16 weights.
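
A hedged sketch of that workflow, assuming encode_prompt returns (prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask) as in similar diffusers pipelines and that the call accepts precomputed embeds (repo id illustrative):

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

with torch.no_grad():
    prompt_embeds, prompt_mask, neg_embeds, neg_mask = pipe.encode_prompt(
        "a tiny astronaut hatching from an egg on the moon"
    )

# Drop the text encoder before denoising to free VRAM.
pipe.text_encoder = None
torch.cuda.empty_cache()

image = pipe(
    prompt=None,
    prompt_embeds=prompt_embeds,
    prompt_attention_mask=prompt_mask,
    negative_prompt_embeds=neg_embeds,
    negative_prompt_attention_mask=neg_mask,
).images[0]
```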

@lawrence-cj
Contributor Author

What's your prompt? @bghira

@a-r-r-o-w
Member

@hlky Would you like to give the changes to schedulers here a review? I'm preparing to merge it shortly after I add the integration tests in the next hour since YiYi has approved and confirmed on Slack. I've tested all the normal models (not the multilingual ones) and they seem to work well (I did the conversions myself when testing, but for the integration tests, I will be using the remote checkpoints and match slices). I have not exhaustively tested all scheduler changes though - only DPMSolverMultistep and FlowMatchEulerDiscrete, but I think that should be okay since it is copied logic (from make fix-copies).
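
For readers unfamiliar with the setup, slice-matching integration tests in diffusers typically look roughly like the sketch below; the checkpoint name is assumed and the expected values are placeholders, not the real slices added in this PR:

```python
import numpy as np
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed remote checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a photo of a cat",
    generator=torch.Generator("cuda").manual_seed(0),
    output_type="np",
).images[0]

# Placeholder reference slice; real tests hard-code values captured from a trusted run.
expected_slice = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
assert np.abs(image[-3:, -3:, -1].flatten() - expected_slice).max() < 1e-2
```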

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky
Collaborator

@a-r-r-o-w Scheduler changes look good, thanks

@a-r-r-o-w
Member

Thank you @lawrence-cj and team! The paper was very insightful and it was very cool to come across the ideas developed.

Thanks for bearing with our reviews too! Will merge the PR once the CI passes

@a-r-r-o-w added the roadmap (Add to current release roadmap) label on Dec 15, 2024
@a-r-r-o-w merged commit 5a196e3 into huggingface:main on Dec 15, 2024
12 checks passed
@vladmandic mentioned this pull request on Dec 16, 2024
@lawrence-cj
Copy link
Contributor Author

Thank you so much for your effort! Love you guys. I was stuck on other things, sorry for the late reply!
@sayakpaul @a-r-r-o-w @bghira @yiyixuxu @hlky

sayakpaul added a commit that referenced this pull request Dec 23, 2024
…AttentionProcessor`, `Flow-based DPM-solver` and so on. (#9982)

* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove unused code;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffusers activation

* remove dc_ae attention in attention_processor.py

* inherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d

Co-authored-by: YiYi Xu <[email protected]>

* remove device and dtype in RMSNorm2d __init__

Co-authored-by: YiYi Xu <[email protected]>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <[email protected]>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* add abstract

* 1. add DCAE into diffusers;
2. make style and make quality;

* add DCAE_HF into diffusers;

* bug fixed;

* add SanaPipeline, SanaTransformer2D into diffusers;

* add sanaLinearAttnProcessor2_0;

* first update for SanaTransformer;

* first update for SanaPipeline;

* first success run SanaPipeline;

* model output finally matches the original model with the same input;

* code update;

* code update;

* add a flow dpm-solver scripts

* 🎉[important update]
1. Integrate flow-dpm-solver into diffusers;
2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;

* 🎉🔧[important update & fix huge bugs!!]
1. add SanaPAGPipeline & several related Sana linear attention operators;
2. `SanaTransformer2DModel` now supports multi-resolution input;
3. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
4. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;

* remove prints;

* add script to convert official Sana checkpoints to diffusers-format safetensors.

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/pipelines/pag/pipeline_pag_sana.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/pipelines/sana/pipeline_sana.py

Co-authored-by: Steven Liu <[email protected]>

* Update src/diffusers/pipelines/sana/pipeline_sana.py

Co-authored-by: Steven Liu <[email protected]>

* update Sana for DC-AE's recent commit;

* make style && make quality

* Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)

* fix progress bar updates in SD 1.5 PAG Img2Img pipeline

---------

Co-authored-by: Vinh H. Pham <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>

* allow the vae to be None in `__init__` of `SanaPipeline`

* Update src/diffusers/models/transformers/sana_transformer_2d.py

Co-authored-by: hlky <[email protected]>

* change the ae related code due to the latest update of DCAE branch;

* change the ae related code due to the latest update of DCAE branch;

* 1. change code based on AutoencoderDC;
2. fix the bug of new GLUMBConv;
3. run success;

* update for solving conversation.

* 1. fix bugs and run convert script success;
2. Downloading ckpt from hub automatically;

* make style && make quality;

* 1. remove unused parameters in init;
2. code update;

* remove test file

* refactor; add docs; add tests; update conversion script

* make style

* make fix-copies

* refactor

* update pipelines

* pag tests and refactor

* remove sana pag conversion script

* handle weight casting in conversion script

* update conversion script

* add a processor

* 1. add bf16 pth file path;
2. add complex human instruct in pipeline;

* fix fast tests

* change gemma-2-2b-it ckpt to a non-gated repo;

* fix the pth path bug in conversion script;

* change grad ckpt to original; make style

* fix the complex_human_instruct bug and typo;

* remove dpmsolver flow scheduler

* apply review suggestions

* change the default scheduler from `FlowMatchEulerDiscreteScheduler` to `DPMSolverMultistepScheduler` configured for flow matching.

* fix the tokenizer.padding_side='right' bug;

* update docs

* make fix-copies

* fix imports

* fix docs

* add integration test

* update docs

* update examples

* fix convert_model_output in schedulers

* fix failing tests

---------

Co-authored-by: Junyu Chen <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: chenjy2003 <[email protected]>
Co-authored-by: Aryan <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: hlky <[email protected]>
Labels: close-to-merge, roadmap (Add to current release roadmap)