
[Feature Request]: Stable Diffusion x2 latent upscaler #7680

Open
1 task done
briansemrau opened this issue Feb 9, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@briansemrau

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Implement https://huggingface.co/stabilityai/sd-x2-latent-upscaler

Allows 2x upscaling in latent space.

Proposed workflow

It should be exposed as an upscaling option alongside the other methods provided.

Additional information

No response

@briansemrau briansemrau added the enhancement New feature or request label Feb 9, 2023
@ProGamerGov
Contributor

To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Alternatively, you can take any image, encode it into latent space, run the upscaler, and decode the result.

I don't think it'll work exactly like the existing upscalers. It's almost like an img2img model that takes the latent tensor instead of an image.
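The latent handoff described above can be sketched with Hugging Face's diffusers library, loosely following the sd-x2-latent-upscaler model card (this is not web UI code); the base-model ID, the prompt, and the `RUN_PIPELINE` guard are illustrative assumptions:

```python
# Sketch of passing a Stable Diffusion latent into the x2 latent upscaler
# before VAE decoding, roughly per the model card for the diffusers library.
# The pipeline run is gated behind a flag because it downloads several GB of
# model weights and needs a GPU.

def upscaled_latent_hw(height: int, width: int) -> tuple:
    """SD latents are image_size // 8 per side; the x2 upscaler doubles them."""
    return (2 * (height // 8), 2 * (width // 8))

RUN_PIPELINE = False  # set True to actually download the models and run

if RUN_PIPELINE:
    import torch
    from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a photo of an astronaut riding a horse"  # placeholder prompt
    generator = torch.manual_seed(33)

    # Generate, but stop before VAE decoding so we keep the latent tensor.
    low_res_latents = pipe(prompt, generator=generator, output_type="latent").images

    # Pass the latent straight into the upscaler; it decodes to a 2x image.
    image = upscaler(
        prompt=prompt,
        image=low_res_latents,
        num_inference_steps=20,
        guidance_scale=0,
        generator=generator,
    ).images[0]
    image.save("upscaled.png")
```

A 512x512 generation has a 64x64 latent; after the x2 upscaler the 128x128 latent decodes to a 1024x1024 image.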

@Cyberbeing
Contributor

Cyberbeing commented Feb 9, 2023

Rather than that, it sounds like it's designed to upscale txt2img/img2img output latent prior to VAE decoding. So rather than a post-processing upscaling step, it's being inserted into the middle of a normal SD output workflow.

> To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE.

[Image: workflow]

@catboxanon
Copy link
Collaborator

catboxanon commented Apr 17, 2023

Don't the included latent upscalers work in a similar vein, upscaling the latent and feeding that into the upscale process? In that case, if this were implemented, denoising for that second step wouldn't necessarily be needed.

Edit: Actually, the way the pipeline works, it gives you the upscaled image directly. So you could denoise it further, but as I mentioned, that may not be needed.
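For contrast, the web UI's existing "Latent" hires-fix modes amount to a plain tensor resize of the latent followed by a second denoising pass; a minimal sketch in numpy of the nearest-neighbor variant (the function name is illustrative, the web UI itself uses torch interpolation, and the learned x2 upscaler would replace this resize step):

```python
# Minimal sketch of a naive latent-space upscale, the kind the web UI's
# built-in "Latent" modes perform before the second (hires) denoising pass.
# The sd-x2-latent-upscaler would replace this with a learned diffusion model.
import numpy as np

def upscale_latent_nearest(latent: np.ndarray, scale: int = 2) -> np.ndarray:
    """latent: (batch, 4, h, w) SD latent; nearest-neighbor upscale by `scale`."""
    return latent.repeat(scale, axis=2).repeat(scale, axis=3)

latent = np.random.randn(1, 4, 64, 64)  # latent of a 512x512 image
up = upscale_latent_nearest(latent)
assert up.shape == (1, 4, 128, 128)     # decodes to 1024x1024 after the VAE
```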

@catboxanon
Collaborator

I've implemented this now, but the included VAE seems particularly awful for some reason. Maybe I can replace it with the one currently in use by the web UI. I'll post some comparisons later.
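A hedged sketch of that VAE swap via the diffusers pipeline (the web UI's own implementation would wire this up differently); `stabilityai/sd-vae-ft-mse` here is only a stand-in assumption for "the one currently in use by the web UI", and the guard flag keeps the snippet from downloading weights:

```python
# Hypothetical sketch: replace the bundled VAE of the x2 latent upscaler
# pipeline with another SD-compatible VAE before decoding. Assumes the
# diffusers pipeline rather than web UI internals.
RUN_PIPELINE = False  # set True to actually download weights and run

if RUN_PIPELINE:
    import torch
    from diffusers import AutoencoderKL, StableDiffusionLatentUpscalePipeline

    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
    )
    # Stand-in for whichever VAE the web UI currently has loaded.
    upscaler.vae = AutoencoderKL.from_pretrained(
        "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
    )
    upscaler.to("cuda")
```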

@catboxanon
Collaborator

> but the included VAE seems particularly awful for some reason

I was judging this based on the fact that faces turn out badly with it, but it turns out that's listed as a limitation on the model card:

> Faces and people in general may not be generated properly.


After experimenting a bit more, it doesn't seem that great compared to the other upscalers we have now, imo. GAN upscalers still seem superior, and even LDSR, which is also diffusion-based, looks a lot better. The comparison below uses #4446 for the Latent Diffusion upscaler. I didn't replace the VAE for the SD x2 upscaler in this comparison, but when I did replace it, that didn't fix fundamental issues like the face and such.

[Image: xyz_grid-0001-2870305590, upscaler comparison grid]

Frankly, I don't have interest in making a PR for this with these results.
