
DDIM sampler on Stable Diffusion does not work well with CFG guidance scale larger than 6~7 #1602

Closed
Randolph-zeng opened this issue Dec 8, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@Randolph-zeng
Contributor

Randolph-zeng commented Dec 8, 2022

Describe the bug

I have noticed that SD v1.4 and v1.5 work poorly if I swap the scheduler to DDIM and use a guidance scale larger than 7.
This behavior does not seem to occur with other samplers such as the default PNDM scheduler. At first I suspected it was related to the "train-test mismatch" mentioned in the Imagen paper, Sec. 2.3. However, I found that the same DDIM sampler in WebUI does not suffer from the same performance degradation at the same guidance scale. I have manually traced the scales of the predicted epsilon values under the same prompt/guidance scale in both the WebUI DDIM sampler and the diffusers DDIM sampler; they are all within a similar range of [-4, 4]. I have also printed out the beta schedules of both and they are very close (of course I tried replacing the diffusers DDIM betas with WebUI's DDIM betas, but it does not help).
[Screenshot 2022-12-08 at 15:08:04]
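
For context, the guidance scale enters through the standard classifier-free guidance combination of the unconditional and text-conditional noise predictions; a minimal sketch (variable names are illustrative, not the pipeline's internals):

import torch

# Classifier-free guidance: start from the unconditional prediction and push it
# toward the conditional one, scaled by guidance_scale. At scale 10 the difference
# is amplified tenfold, so the combined epsilon can leave the range that either
# individual prediction occupies.
def cfg_combine(eps_uncond: torch.Tensor, eps_cond: torch.Tensor, guidance_scale: float) -> torch.Tensor:
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)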

Reproduction

Reproduction is easy, and the behavior of the following code snippet is consistent across diffusers versions 0.3 to 0.9:

from diffusers import StableDiffusionPipeline, DDIMScheduler
# swap any SD model here
pipe = StableDiffusionPipeline.from_pretrained('/xxxx/stable-diffusion-v1-5').to(0)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."
# change the guidance scale from 1 - 15 and observe the performance degradation
image = pipe(prompt=prompt, num_inference_steps=100, guidance_scale=10.).images[0]

guidance_scale = 2.5
[Screenshot 2022-12-08 at 15:19:25]

guidance_scale = 5.0
[Screenshot 2022-12-08 at 15:20:10]

guidance_scale = 7.5
[Screenshot 2022-12-08 at 15:22:16]

guidance_scale = 10.
[Screenshot 2022-12-08 at 15:21:10]

Logs

No response

System Info

Both diffusers == 0.3.0 and diffusers == 0.9.0 suffer from this issue.
Both SD v1-4 and v1-5 suffer from it as well.

@anton-l
Member

anton-l commented Dec 8, 2022

cc @patrickvonplaten @patil-suraj this could be related to the missing dynamic thresholding
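
(For reference, dynamic thresholding as described in the Imagen paper, Sec. 2.3, replaces the fixed [-1, 1] clipping of the predicted x0 with a per-sample percentile threshold; a rough sketch with illustrative names, not diffusers code:)

import torch

# Clip the predicted x0 to a per-sample percentile s of its absolute values
# (never below 1.0), then rescale by s so the result stays in [-1, 1] without
# destroying the relative structure that guidance added.
def dynamic_threshold(x0: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    s = torch.quantile(x0.abs().reshape(x0.shape[0], -1), percentile, dim=1)
    s = torch.clamp(s, min=1.0).view(-1, *([1] * (x0.ndim - 1)))
    return torch.clamp(x0, -s, s) / s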

@Randolph-zeng
Contributor Author

Randolph-zeng commented Dec 9, 2022

@anton-l @patrickvonplaten @patil-suraj Hi, thanks for the response! However, I don't think this is related to the missing dynamic thresholding (other samplers work just fine at the same CFG scale). I did the same thing using the same model, same prompt, and same sampler in WebUI (where DDIM functions normally). I used a debugger to trace through the code and I believe there is no dynamic thresholding there. In fact, they are using the sampler implemented in the runwayml/stable-diffusion repo. I suspected it was the formula, but I don't see any inconsistency, and the beta calculation is very close.

@patrickvonplaten
Contributor

Hey @Randolph-zeng,

I could not reproduce the bug you've shown in your code example above.
The following code snippet works well for me and gives subjectively good results:

from diffusers import StableDiffusionPipeline, DDIMScheduler
import torch

# swap any SD model here
pipe = StableDiffusionPipeline.from_pretrained('runwayml/stable-diffusion-v1-5').to(0)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."

generator = torch.Generator(device="cuda").manual_seed(33)

# change the guidance scale from 1 - 15 and observe the performance degradation
image = pipe(prompt=prompt, num_inference_steps=100, guidance_scale=7.5, generator=generator).images[0]
image.save("/home/patrick_huggingface_co/images/aa.png")

E.g.:
[Generated image: aa (15)]

A reason why your scheduler might not have been working correctly could be that you were using the fp16 branch, which previously had clip_sample set incorrectly. This has now been corrected: https://huggingface.co/runwayml/stable-diffusion-v1-5/commit/ded79e214aa69e42c24d3f5ac14b76d568679cc2
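
One way to verify this locally (a sketch, not an official check) is to inspect the flag after building the DDIM scheduler and, if an older converted checkpoint still carries the wrong value, override it explicitly:

from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained('runwayml/stable-diffusion-v1-5')
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
print(pipe.scheduler.config.clip_sample)  # should be False for Stable Diffusion latents

# If a local/converted checkpoint still has clip_sample=True, override it when
# constructing the scheduler:
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, clip_sample=False)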

We will also make sure to update the conversion script accordingly!

@patrickvonplaten
Contributor

Here is the PR that updates the conversion script: #1667

@Randolph-zeng
Contributor Author

@patrickvonplaten OMG!! Yes, this is exactly why it was failing!!! This bug really tortured me for a week! I checked everything but just did not check the clipping; that's why DDIM was failing: the normal value range is [-4, 4], and clipping makes all the guidance signal go away!
Thanks a lot for finding it, I will close this issue now.
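
A toy illustration of the effect (numbers are made up to make the point, not traced from the model):

import torch

# Stand-in for a CFG-amplified predicted x0 whose values span roughly [-4, 4].
pred_original_sample = 4 * (torch.rand(1, 4, 64, 64) * 2 - 1)

# clip_sample=True hard-clips this to [-1, 1] inside the DDIM step.
clipped = pred_original_sample.clamp(-1.0, 1.0)

# Most of the dynamic range that the guidance scale added is simply saturated away.
print(f"{(pred_original_sample.abs() > 1).float().mean().item():.0%} of values clipped")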
