
Add Flux inpainting and Flux Img2Img #9135

Merged: 27 commits, Sep 4, 2024
Conversation

Gothos
Contributor

@Gothos Gothos commented Aug 9, 2024

What does this PR do?

PR to add:

  1. Flux inpainting
  2. Flux img2img

Before submitting

Adds basic Flux inpainting. This still has some way to go, especially since a Flux equivalent of 9-channel inpainting is not supported yet. I'd also like comments on noising.
Image, mask, and inpainting of a cactus at strengths from 0.65 to 0.9:

[images]

@a-r-r-o-w
Member

a-r-r-o-w commented Aug 12, 2024

@Gothos This is looking great! Since this PR is not yet marked for review, I assume it is incomplete in some ways. Let us know if you're facing any problems and we'd be happy to help. There are a couple of issues and messages from folks asking to have this implemented and usable from diffusers, so really nice of you to take this up :)

cc @asomoza here for more testing and implementation/noising improvements 🤩

@Gothos
Contributor Author

Gothos commented Aug 12, 2024

It works out of the box. I probably should have marked it as ready for review, really, since it's only missing support for inpainting-only checkpoints (i.e. models similar to stable-diffusion-xl-inpainting-0.1, which we don't have for Flux) and docs.

@SkalskiP

Hi @Gothos 👋🏻 Can you provide a usage example showing how to run the inpainting pipeline?

@Gothos
Contributor Author

Gothos commented Aug 12, 2024

Sure!

First:

pip3 install git+https://github.com/Gothos/diffusers.git@flux-inpaint

then:

from diffusers import FluxInpaintPipeline
from PIL import Image
import torch

pipe = FluxInpaintPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

prompt = "your prompt here"
image = pipe(
    prompt,
    image=Image.open("path/to/image"),
    mask_image=Image.open("path/to/mask"),
    strength=0.85,  # below 0.85 doesn't seem to cause a lot of change
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image

Just replace the paths to the image and mask, and the prompt, and it should work.
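
(Note: as written, the pipeline runs on CPU. In practice you would likely add pipe.to("cuda") after from_pretrained, or pipe.enable_model_cpu_offload() if VRAM is tight; both are standard diffusers calls, left out of the original snippet.)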

@Gothos Gothos mentioned this pull request Aug 12, 2024
@Gothos
Contributor Author

Gothos commented Aug 13, 2024

@asomoza if I'm not wrong, the inpainting-trained Flux checkpoint should check for 132 channels? If that's the case, I'll probably finish the PR today.
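
(A rough sketch of where 132 could come from, assuming the SDXL-style 9-channel inpainting layout mapped onto Flux's 2x2-packed 16-channel latents; the variable names below are illustrative, not from the pipeline:)

vae_latent_channels = 16                            # Flux VAE latent channels
patch = 2 * 2                                       # Flux packs 2x2 latent patches into tokens

noisy_latents = vae_latent_channels * patch         # 64
masked_image_latents = vae_latent_channels * patch  # 64
mask = 1 * patch                                    # 4: single-channel mask, packed the same way

assert noisy_latents + masked_image_latents + mask == 132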

@Gothos
Contributor Author

Gothos commented Aug 13, 2024

Also correct me if I'm wrong, but isn't img2img equivalent to having an all-white mask in inpainting, i.e. not selectively blending latents in the denoise step? I can add an img2img pipeline as well if that's the case, since it'll involve minimal changes from inpainting. @a-r-r-o-w @asomoza
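
(A minimal sketch of that equivalence, assuming the usual inpainting latent blend where mask == 1 marks the region to repaint; blend_latents is an illustrative helper, not a pipeline method:)

import torch

def blend_latents(latents, noised_image_latents, mask):
    # Keep the denoised latents where mask == 1, the re-noised image latents elsewhere.
    return mask * latents + (1 - mask) * noised_image_latents

latents = torch.randn(1, 16, 64, 64)
noised_image_latents = torch.randn(1, 16, 64, 64)
all_white = torch.ones(1, 1, 64, 64)

# With an all-white mask the blend is a no-op, so the denoising loop
# reduces to plain img2img (start from noised image latents, never blend).
assert torch.allclose(blend_latents(latents, noised_image_latents, all_white), latents)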

@Gothos
Contributor Author

Gothos commented Aug 13, 2024

I've added img2img as well now.
Image and img2img into a night scene, at strengths 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95:

[images]

@Gothos Gothos marked this pull request as ready for review August 13, 2024 12:06
@a-r-r-o-w a-r-r-o-w requested a review from asomoza August 13, 2024 12:07
@SkalskiP

@Gothos awesome work! I built a FLUX.1 inpainting HF Space using code from this PR: https://huggingface.co/spaces/SkalskiP/FLUX.1-inpaint

@Gothos
Contributor Author

Gothos commented Aug 13, 2024

Yeah, saw the Space and the LinkedIn post! Thanks for the mention!

@DN6
Collaborator

DN6 commented Aug 14, 2024

Nice work @Gothos! A few things before we merge. Can we:

  1. Resolve the merge conflicts
  2. Update the PR description/title to also include Img2Img
  3. Remove the check for 132 channels in the inpainting pipeline for now. The assumption is reasonable, but since there isn't an actual checkpoint to test with, we don't need to add it preemptively
  4. Add fast/slow tests for the pipelines

@Gothos
Contributor Author

Gothos commented Aug 14, 2024

Will do today.

@Gothos
Contributor Author

Gothos commented Aug 14, 2024

Do you also suggest I put in #9153 for these two pipelines, @DN6?

@DN6
Collaborator

DN6 commented Aug 14, 2024

@Gothos Yeah you can do that as well 👍🏽

@Gothos
Contributor Author

Gothos commented Aug 14, 2024

Cool, will do all these and request a review.

@fursund

fursund commented Aug 14, 2024

Looks like this fails on Mac with MPS. There have been some recent fixes to FLUX in diffusers that might need to be added here as well?

@fursund

fursund commented Aug 14, 2024

This is the error I get: TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

@Gothos
Contributor Author

Gothos commented Aug 14, 2024

Hmm, I don't have a Mac to test this on. Could you point out the PR?

@fursund

fursund commented Aug 14, 2024

Hmm maybe it's still broken: #9047 ... potentially this fix: #9097

@Gothos
Contributor Author

Gothos commented Aug 14, 2024

Hmm maybe it's still broken: #9047 ... potentially this fix: #9097

It still is, for most torch distributions I guess. Try torch 2.4 or above; it might fix this.
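
(For reference, a sketch of the usual shape of this kind of fix: schedules computed in numpy come out as float64 and need a float32 cast before landing on MPS. Illustrative only, not the exact diffusers patch:)

import numpy as np
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"

# np.linspace returns float64, which MPS tensors cannot hold,
# so cast to float32 while moving to the device.
sigmas = np.linspace(1.0, 1 / 50, 50)
sigmas = torch.from_numpy(sigmas).to(dtype=torch.float32, device=device)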

shape = (batch_size, num_channels_latents, height, width)
latent_image_ids = self._prepare_latent_image_ids(batch_size, height, width, device, dtype)

if latents is not None:

Collaborator

to be consistent with the definition of the latents input in our other img2img pipelines (they are image latents)


if latents is None:
    noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
    latents = self.scheduler.scale_noise(image_latents, timestep, noise)

Collaborator

note that we do not need is_strength_max for flow-match-based models: it is pure noise when strength == 1

sample = sigma * noise + (1.0 - sigma) * sample

Will remove that for SD3 inpaint too.
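
(A quick check of that claim, as a sketch assuming the flow-matching interpolation above, where strength == 1 makes the first sigma equal to 1:)

import torch

def scale_noise(sample, noise, sigma):
    # Flow-matching forward process: linear interpolation between data and noise.
    return sigma * noise + (1.0 - sigma) * sample

sample = torch.randn(4)
noise = torch.randn(4)

# At strength == 1 the first sigma is 1, so the scaled sample is already
# pure noise; no is_strength_max special case is needed.
assert torch.allclose(scale_noise(sample, noise, sigma=1.0), noise)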

@yiyixuxu
Collaborator

yiyixuxu commented Sep 3, 2024

@Gothos
thanks for your PR!
I made some final changes. We will merge this very soon.

If you can make some final checks, that would be great! (no worries if not)

And sorry we're a bit slow on this.

@Gothos
Contributor Author

Gothos commented Sep 3, 2024 via email

@yiyixuxu yiyixuxu merged commit 249a9e4 into huggingface:main Sep 4, 2024
14 of 15 checks passed
sayakpaul pushed a commit that referenced this pull request Sep 6, 2024
sayakpaul added a commit that referenced this pull request Sep 6, 2024
@yiyixuxu yiyixuxu mentioned this pull request Sep 12, 2024
@ukaprch

ukaprch commented Sep 23, 2024

What I can tell you is that, as good as Flux is for modest inpainting (filling in a masked region), it is very poor at outpainting (replacing everything but the masked object). Flux needs an inpainting version.

@ssxxx1a

ssxxx1a commented Sep 23, 2024

@ssxxx1a try higher denoising strength. Larger than 0.85 works fine; start from 1.0 to understand if that is the issue.

Then it loses the ability to inpaint; it behaves like a text2img task in the designated masked area.

@ukaprch

ukaprch commented Sep 26, 2024 via email

@pandayummy

We need a Flux inpainting model like this one:
https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1

But it requires a lot of GPU resources/money to train.

@ukaprch

ukaprch commented Sep 30, 2024 via email

sayakpaul added a commit that referenced this pull request Oct 21, 2024
@Nomination-NRB

[quotes the usage example above]

Thanks for your code. How much GPU VRAM does it consume?

sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
sayakpaul added a commit that referenced this pull request Dec 23, 2024