Add a ControlNet model & pipeline #2407

Merged Mar 2, 2023 · 135 commits (changes shown from 122 commits)

Commits
6123837
add scaffold
takuma104 Feb 13, 2023
d382f93
Add support to load ControlNet (WIP)
takuma104 Feb 13, 2023
04a514a
Update to convert ControlNet without error msg
takuma104 Feb 13, 2023
1f4b706
cleanup of commented out
takuma104 Feb 13, 2023
25eb4e7
split create_controlnet_diffusers_config()
takuma104 Feb 14, 2023
a7cb5a2
Add input_hint_block, input_zero_conv and
takuma104 Feb 14, 2023
148b46d
add unet_2d_blocks_controlnet.py
takuma104 Feb 14, 2023
584edfd
Add loading for input_hint_block, zero_convs
takuma104 Feb 15, 2023
0f9781c
Copy from UNet2DConditionalModel except __init__
takuma104 Feb 15, 2023
0327e73
Add ultra primitive test for ControlNetModel
takuma104 Feb 15, 2023
e5cabdf
Support ControlNetModel inference
takuma104 Feb 15, 2023
1fc01a3
copy forward() from UNet2DConditionModel
takuma104 Feb 15, 2023
cb7bb9a
Impl ControlledUNet2DConditionModel inference
takuma104 Feb 15, 2023
87ed105
Frozen weight & biases for training
takuma104 Feb 15, 2023
efccecc
Minimized version of ControlNet/ControlledUnet
takuma104 Feb 16, 2023
a838366
make style
takuma104 Feb 16, 2023
a296de9
Add support model loading for minimized ver
takuma104 Feb 16, 2023
bd51c6d
Remove all previous version files
takuma104 Feb 16, 2023
7656925
from_pretrained and inference test passed
takuma104 Feb 16, 2023
839e009
copied from pipeline_stable_diffusion.py
takuma104 Feb 16, 2023
cf16a43
Impl pipeline, pixel match test (almost) passed.
takuma104 Feb 17, 2023
ce0e571
Merge branch 'main' into controlnet
takuma104 Feb 17, 2023
9cc8b99
make style
takuma104 Feb 17, 2023
7dbbe22
make fix-copies
takuma104 Feb 17, 2023
a316d86
Fix to add import ControlNet blocks
takuma104 Feb 17, 2023
b17fd20
Remove einops dependency
takuma104 Feb 18, 2023
894bd84
Support np.ndarray, PIL.Image for controlnet_hint
takuma104 Feb 18, 2023
3d3a02f
set default config file as lllyasviel's
takuma104 Feb 18, 2023
38bf48d
Add support grayscale (hw) numpy array
takuma104 Feb 18, 2023
cc597a1
Add and update docstrings
takuma104 Feb 18, 2023
4bcc159
add control_net.mdx
takuma104 Feb 18, 2023
33841b6
add control_net.mdx to toctree
takuma104 Feb 18, 2023
9a37409
Update copyright year
takuma104 Feb 19, 2023
0a1bb45
Fix to add PIL.Image RGB->BGR conversion
takuma104 Feb 19, 2023
90d05e9
make fix-copies
takuma104 Feb 19, 2023
189f46f
Merge branch 'huggingface:main' into controlnet
takuma104 Feb 20, 2023
3ade8c0
add basic fast test for controlnet
takuma104 Feb 20, 2023
d5965c7
add slow test for controlnet/unet
takuma104 Feb 20, 2023
79c0ecb
Ignore down/up_block len check on ControlNet
takuma104 Feb 20, 2023
04f9b8a
add a copy from test_stable_diffusion.py
takuma104 Feb 21, 2023
fe82f10
Accept controlnet_hint is None
takuma104 Feb 21, 2023
1c7d311
merge pipeline_stable_diffusion.py diff
takuma104 Feb 21, 2023
e492e9d
Update class name to SDControlNetPipeline
takuma104 Feb 21, 2023
2eab486
make style
takuma104 Feb 21, 2023
faf1cfb
Baseline fast test almost passed (w long desc)
takuma104 Feb 21, 2023
f656952
Add note comment related vae_scale_factor
takuma104 Feb 22, 2023
6300a52
add test_stable_diffusion_controlnet_ddim
takuma104 Feb 23, 2023
bac69f1
add assertion for vae_scale_factor != 8
takuma104 Feb 23, 2023
4f394a8
slow test of pipeline almost passed
takuma104 Feb 23, 2023
2b0f04b
test_stable_diffusion_long_prompt passed
takuma104 Feb 23, 2023
c6c7312
test_stable_diffusion_no_safety_checker passed
takuma104 Feb 23, 2023
bd5d7b7
remove PoC test files
takuma104 Feb 23, 2023
2c0d4d4
Merge branch 'main' into will/controlnet
williamberman Feb 23, 2023
808376c
fix num_of_image, prompt length issue add add test
takuma104 Feb 24, 2023
cd85086
add support List[PIL.Image] for controlnet_hint
takuma104 Feb 24, 2023
e376edb
wip
williamberman Feb 23, 2023
19be7e6
Merge remote-tracking branch 'will_diffusers/will/controlnet' into co…
takuma104 Feb 24, 2023
b74ef10
all slow test passed
takuma104 Feb 24, 2023
b8e689e
make style
takuma104 Feb 24, 2023
2d8cca1
update for slow test
takuma104 Feb 24, 2023
0f70cf5
RGB(PIL)->BGR(ctrlnet) conversion
takuma104 Feb 24, 2023
91623a9
fixes
williamberman Feb 24, 2023
855580d
remove manual num_images_per_prompt test
williamberman Feb 25, 2023
e758682
Merge branch 'main' into controlnet
williamberman Feb 25, 2023
788e03d
add document
takuma104 Feb 25, 2023
42ebc45
add `image` argument docstring
takuma104 Feb 25, 2023
8bb964d
make style
takuma104 Feb 25, 2023
49024f6
Add line to correct conversion
takuma104 Feb 25, 2023
30f7570
add controlnet_conditioning_scale (aka control_scales
williamberman Feb 25, 2023
e169f32
rgb channel ordering by default
williamberman Feb 25, 2023
e1b8b49
image batching logic
williamberman Feb 25, 2023
2953f9f
Merge branch 'main' into controlnet
williamberman Feb 26, 2023
f5cd24a
Add control image descriptions for each checkpoint
takuma104 Feb 27, 2023
1a02798
Merge branch 'main' into controlnet
williamberman Feb 28, 2023
8f01ca1
Only save controlnet model in conversion script
williamberman Feb 28, 2023
ca4378e
Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffu…
takuma104 Feb 28, 2023
1799d83
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
d7b95cf
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
2e86e1f
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
bb03069
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
9a14567
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
71d0a96
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
16efb00
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
1b0af7d
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
53f4523
Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
takuma104 Feb 28, 2023
161aac2
add gerated image example
takuma104 Feb 28, 2023
349f3bf
a depth mask -> a depth map
takuma104 Feb 28, 2023
3f6e8f7
rename control_net.mdx to controlnet.mdx
takuma104 Feb 28, 2023
ebabcbe
fix toc title
takuma104 Feb 28, 2023
30e2bde
add ControlNet abstruct and link
takuma104 Feb 28, 2023
a00c9ca
Merge branch 'main' into controlnet
takuma104 Feb 28, 2023
f099be5
Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffu…
takuma104 Mar 1, 2023
2b553d9
remove controlnet constructor arguments re: @patrickvonplaten
williamberman Mar 1, 2023
238e26f
Merge branch 'main' into controlnet
williamberman Mar 1, 2023
d49296c
[integration tests] test canny
williamberman Mar 1, 2023
d1cd65a
test_canny fixes
williamberman Mar 1, 2023
7eb43f1
[integration tests] test_depth
williamberman Mar 1, 2023
032d5e0
[integration tests] test_hed
williamberman Mar 1, 2023
5c7dbb3
[integration tests] test_mlsd
williamberman Mar 1, 2023
cdbc7c4
add channel order config to controlnet
williamberman Mar 1, 2023
86c1684
[integration tests] test normal
williamberman Mar 1, 2023
a18fc70
[integration tests] test_openpose test_scribble
williamberman Mar 1, 2023
9ec6ad4
change height and width to default to conditioning image
williamberman Mar 1, 2023
8fd8e42
[integration tests] test seg
williamberman Mar 1, 2023
7c35fc7
style
williamberman Mar 1, 2023
e200797
test_depth fix
williamberman Mar 1, 2023
e6973eb
[integration tests] size fixes
williamberman Mar 1, 2023
0ba19da
[integration tests] cpu offloading
williamberman Mar 1, 2023
1a803d1
style
williamberman Mar 1, 2023
8dea9c7
generalize controlnet embedding
williamberman Mar 1, 2023
60e3635
fix conversion script
williamberman Mar 1, 2023
b15fca9
Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
takuma104 Mar 1, 2023
d7ed0b1
Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
takuma104 Mar 1, 2023
0ed0581
Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
takuma104 Mar 1, 2023
5e16d13
Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
takuma104 Mar 1, 2023
06bb1db
Style adapted to the documentation of pix2pix
takuma104 Mar 1, 2023
acf8d26
Merge branch 'huggingface:main' into controlnet
takuma104 Mar 1, 2023
3981459
merge main by hand
takuma104 Mar 1, 2023
b799512
Merge branch 'main' into controlnet
williamberman Mar 2, 2023
0810e4c
style
williamberman Mar 2, 2023
10dedd9
[docs] controlling generation doc nits
williamberman Mar 2, 2023
042c75e
correct some things
patrickvonplaten Mar 2, 2023
ff2e691
add: controlnetmodel to autodoc.
sayakpaul Mar 2, 2023
9cb8816
finish docs
patrickvonplaten Mar 2, 2023
8f62631
Merge branch 'controlnet' of https://github.com/takuma104/diffusers i…
patrickvonplaten Mar 2, 2023
3bbc356
finish
patrickvonplaten Mar 2, 2023
b8d1908
finish 2
patrickvonplaten Mar 2, 2023
1f36d9e
correct images
patrickvonplaten Mar 2, 2023
8052dde
Merge branch 'controlnet' of https://github.com/takuma104/diffusers i…
patrickvonplaten Mar 2, 2023
a610e47
finish controlnet
patrickvonplaten Mar 2, 2023
9947fcb
Apply suggestions from code review
patrickvonplaten Mar 2, 2023
592f389
uP
patrickvonplaten Mar 2, 2023
ec4fc3a
upload model
patrickvonplaten Mar 2, 2023
b010e3c
up
patrickvonplaten Mar 2, 2023
547ba02
up
patrickvonplaten Mar 2, 2023
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -165,6 +165,8 @@
title: Self-Attention Guidance
- local: api/pipelines/stable_diffusion/panorama
title: MultiDiffusion Panorama
- local: api/pipelines/stable_diffusion/controlnet
title: Text-to-Image Generation with ControlNet Conditioning
title: Stable Diffusion
- local: api/pipelines/stable_diffusion_2
title: Stable Diffusion 2
189 changes: 189 additions & 0 deletions docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
@@ -0,0 +1,189 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-to-Image Generation with ControlNet Conditioning

## Overview

[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) by Lvmin Zhang and Maneesh Agrawala.

Using the pretrained models we can provide control images (for example, a depth map) to control Stable Diffusion text-to-image generation so that it follows the structure of the depth image and fills in the details.

The abstract of the paper is the following:

*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*

This model was contributed by the amazing community contributor [takuma104](https://huggingface.co/takuma104) ❤️ .
[Review comment by patrickvonplaten (Contributor)]: @takuma104 - left a comment here for future readers for credit

[Reply by takuma104 (Contributor Author)]: @patrickvonplaten Thank you so much! It is a great honor!


Resources:

* [Paper](https://arxiv.org/abs/2302.05543)
* [Original Code](https://github.com/lllyasviel/ControlNet)

## Available Pipelines:

| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py) | *Text-to-Image Generation with ControlNet Conditioning* | [Colab Example](https://colab.research.google.com/drive/1AiR7Q-sBqO88NCyswpfiuwXZc7DfMyKA?usp=sharing) |

## Usage example

In the following, we give a simple example of how to use a *ControlNet* checkpoint with Diffusers for inference.
The inference workflow is the same for every ControlNet checkpoint:

1. Take an image and run it through a pre-conditioning processor.
2. Run the pre-processed image through the [`StableDiffusionControlNetPipeline`].

Let's have a look at a simple example using the [Canny Edge ControlNet](https://huggingface.co/fusing/sd-controlnet-canny).

```python
from diffusers import StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Let's load the popular vermeer image
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
```

![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png)

Next, we process the image to get the canny image. This is step *1.* - running the pre-conditioning processor. The pre-conditioning processor is different for every ControlNet. Please see the model cards of the [official checkpoints](#controlnet-with-stable-diffusion-1.5) for more information about other models.

First, we need to install opencv:

```
pip install opencv-contrib-python
```

Then we can retrieve the canny edges of the image.

```python
import cv2
from PIL import Image
import numpy as np

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
```

Let's take a look at the processed image.

![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png)

Now, we load the official [Stable Diffusion 1.5 Model](https://huggingface.co/runwayml/stable-diffusion-v1-5) as well as the ControlNet for canny edges.

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

controlnet = ControlNetModel.from_pretrained("fusing/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```

To speed things up and reduce memory usage, let's enable model offloading and use the fast [`UniPCMultistepScheduler`].

```py
from diffusers import UniPCMultistepScheduler

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# this command loads the individual model components on GPU on-demand.
pipe.enable_model_cpu_offload()
```

Finally, we can run the pipeline:

```py
generator = torch.manual_seed(0)

out_image = pipe("colorful painting of woman", num_inference_steps=20, generator=generator).images[0]
```

This should take only around 3-4 seconds on a GPU (depending on the hardware). The output image then looks as follows:


The conditioning image is an outline of the image edges, as detected by a Canny filter. This is the example we'll use to control the generation.

![White on black edges detected on Vermeer's Girl with a Pearl Earring portrait](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png)

In the following example, note that the text prompt does not make any reference to the structure or contents of the image we are generating. Stable Diffusion interprets the control image as an additional input that controls what to generate.

```python
from diffusers import StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny edged image for control
canny_edged_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained("takuma104/control_sd15_canny").to("cuda")
image = pipe(prompt="best quality, extremely detailed", image=canny_edged_image).images[0]
image.save("generated.png")
```

- Controlling custom Stable Diffusion 1.5 models
[Review comment by a Member]: Maybe you meant it as a heading?

[Reply by takuma104 (Contributor Author)]: Fixed in 06bb1db


In the following example we use PromptHero's [Openjourney model](https://huggingface.co/prompthero/openjourney), which was fine-tuned from the base Stable Diffusion v1.5 model on images from Midjourney. This model has the same structure as Stable Diffusion 1.5 but is capable of producing outputs in a different style.

```py
from diffusers import StableDiffusionControlNetPipeline, AutoencoderKL, UNet2DConditionModel
from diffusers.utils import load_image

# Canny edged image for control
canny_edged_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png"
)

base_model_id = "prompthero/openjourney" # an example: openjourney model
vae = AutoencoderKL.from_pretrained(base_model_id, subfolder="vae").to("cuda")
unet = UNet2DConditionModel.from_pretrained(base_model_id, subfolder="unet").to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained("takuma104/control_sd15_canny", unet=unet, vae=vae).to("cuda")
image = pipe(prompt="best quality, extremely detailed", image=canny_edged_image, width=512, height=512).images[0]
image.save("generated.png")
```

<!-- TODO: add space -->
## Available checkpoints

ControlNet requires a *control image* in addition to the text-to-image *prompt*.
Each pretrained model is trained using a different conditioning method that requires different images for conditioning the generated outputs. For example, Canny edge conditioning requires the control image to be the output of a Canny filter, while depth conditioning requires the control image to be a depth map. See the overview and image examples below to learn more.

All checkpoints can be found under the authors' namespace [lllyasviel](https://huggingface.co/lllyasviel).
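Because every checkpoint expects a different kind of control image, only the pre-processing step changes between them. Below is a minimal sketch of a depth-conditioned generation, assuming the `transformers` depth-estimation pipeline as the pre-processor and the `fusing/sd-controlnet-depth` checkpoint from the table below:

```py
# Minimal sketch: prepare a depth control image and run it through the ControlNet pipeline.
# Assumes the `transformers` depth-estimation pipeline and the `fusing/sd-controlnet-depth` checkpoint.
import numpy as np
import torch
from PIL import Image
from transformers import pipeline

from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)

# Step 1: run the pre-conditioning processor (monocular depth estimation instead of a Canny filter)
depth_estimator = pipeline("depth-estimation")
depth = depth_estimator(image)["depth"]            # PIL grayscale depth map
depth = np.array(depth)[:, :, None]                # (H, W, 1)
depth = np.concatenate([depth, depth, depth], axis=2)
control_image = Image.fromarray(depth)

# Step 2: run the control image through the ControlNet pipeline
controlnet = ControlNetModel.from_pretrained("fusing/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(0)
out = pipe("colorful painting of woman", image=control_image, num_inference_steps=20, generator=generator).images[0]
out.save("depth_controlled.png")
```

The rest of the workflow is identical to the Canny example above; only the pre-processing step and the ControlNet checkpoint change.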

### ControlNet with Stable Diffusion 1.5

| Model Name | Control Image Overview| Control Image Example | Generated Image Example |
|---|---|---|---|
|[fusing/sd-controlnet-canny](https://huggingface.co/fusing/sd-controlnet-canny)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_bird_canny.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_bird_canny.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_canny_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_canny_1.png"/></a>|
|[fusing/sd-controlnet-depth](https://huggingface.co/fusing/sd-controlnet-depth)<br/> *Trained with Midas depth estimation* |A grayscale image with black representing deep areas and white representing shallow areas.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_depth.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_depth.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_depth_2.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_depth_2.png"/></a>|
|[fusing/sd-controlnet-hed](https://huggingface.co/fusing/sd-controlnet-hed)<br/> *Trained with HED edge detection (soft edge)* |A monochrome image with white soft edges on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_bird_hed.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_bird_hed.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_hed_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_hed_1.png"/></a> |
|[fusing/sd-controlnet-mlsd](https://huggingface.co/fusing/sd-controlnet-mlsd)<br/> *Trained with M-LSD line detection* |A monochrome image composed only of white straight lines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_mlsd.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_mlsd.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_mlsd_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_mlsd_0.png"/></a>|
|[fusing/sd-controlnet-normal](https://huggingface.co/fusing/sd-controlnet-normal)<br/> *Trained with normal map* |A [normal mapped](https://en.wikipedia.org/wiki/Normal_mapping) image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_human_normal.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_human_normal.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_normal_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_normal_1.png"/></a>|
|[fusing/sd-controlnet_openpose](https://huggingface.co/fusing/sd-controlnet_openpose)<br/> *Trained with OpenPose bone image* |An [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_human_openpose.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_human_openpose.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_openpose_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_openpose_0.png"/></a>|
|[fusing/sd-controlnet_scribble](https://huggingface.co/fusing/sd-controlnet_scribble)<br/> *Trained with human scribbles* |A hand-drawn monochrome image with white outlines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_scribble.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_scribble.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"/></a> |
|[fusing/sd-controlnet_seg](https://huggingface.co/fusing/sd-controlnet_seg)<br/>*Trained with semantic segmentation* |An image following [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |

[[autodoc]] StableDiffusionControlNetPipeline
- all
- __call__
12 changes: 12 additions & 0 deletions docs/source/en/using-diffusers/controlling_generation.mdx
@@ -35,6 +35,7 @@ Unless otherwise mentioned, these are techniques that work with existing models
7. [MultiDiffusion Panorama](#multidiffusion-panorama)
8. [DreamBooth](#dreambooth)
9. [Textual Inversion](#textual-inversion)
10. [ControlNet](#controlnet)

## Instruct Pix2Pix

@@ -146,3 +147,14 @@ See [here](../training/dreambooth) for more information on how to use it.
[Textual Inversion](../training/text_inversion) fine-tunes a model to teach it about a new concept. For example, a few pictures of a style of artwork can be used to generate images in that style.

See [here](../training/text_inversion) for more information on how to use it.

## ControlNet

[Paper](https://arxiv.org/abs/2302.05543)

[ControlNet](../api/pipelines/stable_diffusion/controlnet) is an auxiliary network which adds an extra condition.
There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles,
depth maps, and semantic segmentations.

See [here](../api/pipelines/stable_diffusion/controlnet) for more information on how to use it.
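As a minimal sketch of the interface (assuming the Canny checkpoint and example image from the ControlNet pipeline documentation above), the extra condition is simply passed alongside the text prompt:

```py
# Minimal sketch: the control image is passed to the pipeline alongside the text prompt.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

canny_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png"
)

controlnet = ControlNetModel.from_pretrained("fusing/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("best quality, extremely detailed", image=canny_image).images[0]
```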

11 changes: 10 additions & 1 deletion scripts/convert_original_stable_diffusion_to_diffusers.py
@@ -120,6 +120,9 @@
help="Path to the clip stats file. Only required if the stable unclip model's config specifies `model.params.noise_aug_config.params.clip_stats_path`.",
required=False,
)
parser.add_argument(
    "--controlnet", action="store_true", default=None, help="Set flag if this is a controlnet checkpoint."
)
args = parser.parse_args()

pipe = load_pipeline_from_original_stable_diffusion_ckpt(
@@ -137,5 +140,11 @@
stable_unclip=args.stable_unclip,
stable_unclip_prior=args.stable_unclip_prior,
clip_stats_path=args.clip_stats_path,
controlnet=args.controlnet,
)
pipe.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors)

if args.controlnet:
    # only save the controlnet model
    pipe.controlnet.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors)
else:
    pipe.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors)
2 changes: 2 additions & 0 deletions src/diffusers/__init__.py
@@ -34,6 +34,7 @@
else:
from .models import (
AutoencoderKL,
ControlNetModel,
ModelMixin,
PriorTransformer,
Transformer2DModel,
@@ -113,6 +114,7 @@
PaintByExamplePipeline,
SemanticStableDiffusionPipeline,
StableDiffusionAttendAndExcitePipeline,
StableDiffusionControlNetPipeline,
StableDiffusionDepth2ImgPipeline,
StableDiffusionImageVariationPipeline,
StableDiffusionImg2ImgPipeline,
1 change: 1 addition & 0 deletions src/diffusers/models/__init__.py
@@ -17,6 +17,7 @@

if is_torch_available():
from .autoencoder_kl import AutoencoderKL
from .controlnet import ControlNetModel
from .dual_transformer_2d import DualTransformer2DModel
from .modeling_utils import ModelMixin
from .prior_transformer import PriorTransformer