Attempt to resolve NaN issue with unstable VAEs while utilizing full precision (--no-half-vae)
#12624
Update: I believe #12630 fixes this properly -- I will close this PR when that one or another is merged to resolve this.
Description
Attempts to solve a regression in cc53db6 (the previous commit a64fbe8 does not have this issue). I think this is also related to #12611. PR #12599 also still has this issue.
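For anyone skimming: the workaround explored here amounts to decoding the latent, checking the result for NaNs, and decoding again if the check fails, which is why it can cost a second decode. Below is a minimal sketch of that pattern, not the actual diff. The names decode_with_retry, has_nan, and make_flaky_decoder are hypothetical, and plain Python lists of floats stand in for tensors so the example runs without any VAE or PyTorch installed.

```python
import math
from typing import Callable, List, Tuple

Latent = List[float]


def has_nan(values: List[float]) -> bool:
    """Return True if any element is NaN."""
    return any(math.isnan(v) for v in values)


def decode_with_retry(
    decode: Callable[[Latent], List[float]],
    latent: Latent,
    max_attempts: int = 2,
) -> Tuple[List[float], int]:
    """Call decode until the output is NaN-free or attempts run out.

    Returns the decoded values and the number of attempts used.
    Every extra attempt is a full extra decode, which is the cost
    this kind of workaround pays.
    """
    for attempt in range(1, max_attempts + 1):
        decoded = decode(latent)
        if not has_nan(decoded):
            return decoded, attempt
    raise ValueError(f"decode produced NaNs on all {max_attempts} attempts")


def make_flaky_decoder() -> Callable[[Latent], List[float]]:
    """Hypothetical stand-in decoder: NaNs on the first call, valid output after."""
    calls = {"n": 0}

    def decode(latent: Latent) -> List[float]:
        calls["n"] += 1
        if calls["n"] == 1:
            return [float("nan")] * len(latent)
        return [2.0 * v for v in latent]

    return decode


decoded, attempts = decode_with_retry(make_flaky_decoder(), [0.5, -1.0])
print(decoded, attempts)
```

With the stand-in decoder the second attempt succeeds, so the latent is decoded twice. A decoder that always returns NaNs would raise after max_attempts, which is the situation a proper fix needs to avoid rather than retry around.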
To preface: this only ever seems to happen with animevae.pt, and only for certain prompts. As such, it's difficult to find an easily reproducible scenario. The one below is consistent for me, and I've verified that it also reproduces on somebody else's system. Also, this is absolutely not the correct way to fix this, because it now wastes time potentially decoding the latent twice, but I'm trying to wrap my head around what's going wrong here, and hopefully opening this PR brings that to discussion.
How to reproduce
Check out cc53db6 or later and launch with --no-half-vae
Get this VAE, model, and these LoRAs
VAE: https://huggingface.co/a1079602570/animefull-final-pruned/blob/main/animevae.pt
Model: https://huggingface.co/AnonymousM/Based-mixes/blob/main/Based64mix-V3.safetensors
LoRAs:
(removed)
Download and use the metadata from this image to set up the params
(optionally) verify the image can be generated without hires fix.
Attempt to generate the image with hires fix enabled. These are the settings I usually use, but I tested this several other times and the only factor that seems to matter is that the Upscale by value must be 1.15 or more.
(optionally) take the above image, and use the exact same parameters to upscale in the img2img tab. This will not produce the NaNs exception.
Side note: I got into the weeds and did some debugging, and this is why I'm also suspicious if this is related to the issue I linked above. This is the part of the code that produces the NaNs: https://github.com/Stability-AI/stablediffusion/blob/cf1d67a6fd5ea1aa600c4df58e5b47da45f6bdbf/ldm/modules/diffusionmodules/model.py#L634-L641
If I attempt to upscale the image in img2img, and use the initial value of z (the upscaled latent, before this line is executed), store that latent, and then attempt to use hires fix in txt2img, but with that value of z I stored earlier, it still produces NaNs in that function. However, if I do it the other way around, storing the upscaled latent from txt2img, and using that in img2img, I instead get the below:
Just noting my specs here as well, in case this is somehow some PyTorch bug.
Checklist: