
[Feature Request]: Stable Diffusion x2 latent upscaler #7680

Open
1 task done
briansemrau opened this issue Feb 9, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@briansemrau

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Implement https://huggingface.co/stabilityai/sd-x2-latent-upscaler

Allows 2x upscaling in latent space.

Proposed workflow

It should be exposed as an upscaling option alongside the other methods provided.

Additional information

No response

@briansemrau briansemrau added the enhancement New feature or request label Feb 9, 2023
@ProGamerGov
Contributor

To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Alternatively, you can take any image, encode it into latent space, run the upscaler, and decode the result.

I don't think it'll work exactly like the existing upscalers. It's almost like an img2img model that takes the latent tensor instead of an image.
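The latent handoff described above can be sketched with Hugging Face's diffusers library, loosely following the sd-x2-latent-upscaler model card (this is not web UI code); the base-model ID, the prompt, and the `RUN_PIPELINE` guard are illustrative assumptions:

```python
# Sketch of passing a Stable Diffusion latent into the x2 latent upscaler
# before VAE decoding, roughly per the model card for the diffusers library.
# The pipeline run is gated behind a flag because it downloads several GB of
# model weights and needs a GPU.

def upscaled_latent_hw(height: int, width: int) -> tuple:
    """SD latents are image_size // 8 per side; the x2 upscaler doubles them."""
    return (2 * (height // 8), 2 * (width // 8))

RUN_PIPELINE = False  # set True to actually download the models and run

if RUN_PIPELINE:
    import torch
    from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a photo of an astronaut riding a horse"  # placeholder prompt
    generator = torch.manual_seed(33)

    # Generate, but stop before VAE decoding so we keep the latent tensor.
    low_res_latents = pipe(prompt, generator=generator, output_type="latent").images

    # Pass the latent straight into the upscaler; it decodes to a 2x image.
    image = upscaler(
        prompt=prompt,
        image=low_res_latents,
        num_inference_steps=20,
        guidance_scale=0,
        generator=generator,
    ).images[0]
    image.save("upscaled.png")
```

A 512x512 generation has a 64x64 latent; after the x2 upscaler the 128x128 latent decodes to a 1024x1024 image.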

@Cyberbeing
Contributor

Cyberbeing commented Feb 9, 2023

Rather than that, it sounds like it's designed to upscale txt2img/img2img output latent prior to VAE decoding. So rather than a post-processing upscaling step, it's being inserted into the middle of a normal SD output workflow.

> To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE.

[Image: workflow]

@catboxanon
Copy link
Collaborator

catboxanon commented Apr 17, 2023

Don't the included latent upscalers work in a similar vein, upscaling the latent and feeding that into the upscale process? In that case, if this were implemented, denoising for that second step wouldn't necessarily be needed.

Edit: Actually, the way the pipeline works, it gives you the upscaled image directly. So you could denoise it further, but as I mentioned, that may not be needed.
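For contrast, the web UI's existing "Latent" hires-fix modes amount to a plain tensor resize of the latent followed by a second denoising pass; a minimal sketch in numpy of the nearest-neighbor variant (the function name is illustrative, the web UI itself uses torch interpolation, and the learned x2 upscaler would replace this resize step):

```python
# Minimal sketch of a naive latent-space upscale, the kind the web UI's
# built-in "Latent" modes perform before the second (hires) denoising pass.
# The sd-x2-latent-upscaler would replace this with a learned diffusion model.
import numpy as np

def upscale_latent_nearest(latent: np.ndarray, scale: int = 2) -> np.ndarray:
    """latent: (batch, 4, h, w) SD latent; nearest-neighbor upscale by `scale`."""
    return latent.repeat(scale, axis=2).repeat(scale, axis=3)

latent = np.random.randn(1, 4, 64, 64)  # latent of a 512x512 image
up = upscale_latent_nearest(latent)
assert up.shape == (1, 4, 128, 128)     # decodes to 1024x1024 after the VAE
```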

@catboxanon
Collaborator

I've implemented this now, but the included VAE seems particularly awful for some reason. Maybe I can replace it with the one currently in use by the web UI. I'll post some comparisons later.
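A hedged sketch of that VAE swap via the diffusers pipeline (the web UI's own implementation would wire this up differently); `stabilityai/sd-vae-ft-mse` here is only a stand-in assumption for "the one currently in use by the web UI", and the guard flag keeps the snippet from downloading weights:

```python
# Hypothetical sketch: replace the bundled VAE of the x2 latent upscaler
# pipeline with another SD-compatible VAE before decoding. Assumes the
# diffusers pipeline rather than web UI internals.
RUN_PIPELINE = False  # set True to actually download weights and run

if RUN_PIPELINE:
    import torch
    from diffusers import AutoencoderKL, StableDiffusionLatentUpscalePipeline

    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
    )
    # Stand-in for whichever VAE the web UI currently has loaded.
    upscaler.vae = AutoencoderKL.from_pretrained(
        "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
    )
    upscaler.to("cuda")
```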

@catboxanon
Collaborator

> but the included VAE seems particularly awful for some reason

I was judging this based on the fact that faces turn out badly with it, but it turns out that's listed as a limitation on the model card:

> Faces and people in general may not be generated properly.


After experimenting a bit more, it doesn't seem that great compared to the other upscalers we have now, imo. GAN upscalers still seem superior, and even LDSR, which is also diffusion-based, looks a lot better. The comparison below uses #4446 for the Latent Diffusion upscaler. I didn't replace the VAE for the SD x2 upscaler in this comparison, but when I did replace it, that didn't fix fundamental issues like the face and such.

[Image: xyz_grid-0001-2870305590, upscaler comparison grid]

Frankly, I don't have interest in making a PR for this with these results.
