From sketch to art with iterative img2img (also question: why fixed seed is bad for this?) #2473
aleksusklim started this conversation in Show and tell
I used repeated img2img, manually cherry-picking each image for the next iteration to maximize visual appeal. I ran several experiments to find which values work best at which stage.
In short, on the first iterations you don't need many Steps, the CFG can be very low (for more diverse results), and Denoising strength should be around 0.3–0.4 if you don't want to lose your sketch completely.
On the last iterations you would increase Steps and Denoising strength (but not above 0.8, or the composition will be ruined, especially at sizes larger than 512×512; see also #2213 (comment)), while raising CFG and dimensions if you want.
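As a rough illustration of this schedule (the helper and the linear ramp are my own sketch, not anything from the webui), the progression of settings per iteration could look like:

```python
def iteration_params(i, total):
    """Illustrative ramp of img2img settings across iterations.

    Early iterations: few steps, low CFG, gentle denoising (~0.3-0.4)
    to keep the sketch's composition. Late iterations: more steps,
    higher CFG, stronger denoising, capped at 0.8 so the composition
    survives. The endpoints here are just the values I happened to use.
    """
    t = i / max(total - 1, 1)          # progress from 0.0 to 1.0
    steps = round(32 + t * (64 - 32))  # 32 -> 64 sampling steps
    cfg = round(3 + t * (15 - 3))      # CFG scale 3 -> 15
    denoise = min(0.3 + t * 0.5, 0.8)  # 0.3 -> 0.8, hard cap at 0.8
    return steps, cfg, round(denoise, 2)

for i in range(5):
    print(i, iteration_params(i, 5))
```

In practice I don't follow a strict formula; I nudge the sliders by eye, but this is the general direction.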
At any moment you can adjust the prompt (adding or removing emerging details) and try a different sampler. But here's what I don't understand: why should you not use one fixed, constant seed?
Because if you provide a fixed seed (instead of a random -1), your images quickly become oversaturated, over-sharpened, and pixelated!
Here I'll demo the process of enhancing a low-quality sketch into high-quality artwork with Stable Diffusion.
First, we need a sketch to start from. We have txt2img, so I'll abuse it to create a sketch for me; but I won't reuse its prompt or seed, only the image itself.
(Euler, 32 steps, CFG 7, 20 random images).
Since I'm going to turn it into a photo, I choose the image at Y=1, X=5:
My text for img2img will be:
Sending it to img2img with these parameters: Euler, 48 steps (it will actually run fewer, since the option to ignore denoising strength for the step count is unchecked), CFG = 3, Denoising strength = 0.4:
I need to pick from these… I choose the one with the best rocket texture, at Y=4, X=1:
Now I raise CFG to 7 and Denoising strength to 0.5:
It's getting better… I'll pick the one that looks least like a drawing while still keeping a good shape, at Y=2, X=1:
Raising further: CFG to 12, Denoising strength to 0.6:
This is the point where I'd like to pick more than one! The overall composition is best at Y=4, X=1, but the rocket shape is best at Y=3, X=2. What if I grab the Seed from the latter, but use the first picture for the next step?
↑ Seed | Image ↓
Since I've fixed the seed, I can no longer generate "random" variations. Instead, I'll use the X/Y plot script to play with CFG and Denoising strength.
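For reference, the X/Y plot script conceptually just takes the cross product of the two value lists. A toy sketch (the axis values are my guess at a plausible range, not the script's actual code):

```python
from itertools import product

cfg_values = [7, 9, 12, 15]            # X axis: CFG scale
denoise_values = [0.4, 0.5, 0.6, 0.7]  # Y axis: Denoising strength

# With a fixed seed, every cell of the grid starts from identical
# noise and differs only in these two settings.
grid = list(product(denoise_values, cfg_values))
print(len(grid))  # 4 x 4 = 16 images per grid
```

So with a fixed seed the grid isolates exactly what each slider does, which is why I used it here.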
I raise only the Steps count, to 64:
Hmm, even denoising 0.7 looks good! I'll pick the one at CFG 15:
And this is where things get worse because of the fixed seed. Without changing anything, I just re-run with the chosen image:
The composition became too contrasty and over-sharpened! The effect is strong and visible at any CFG or denoising level, and it doesn't get any better if I keep using the same seed on the next iterations.
So, changing the seed to -1 and checking "Keep -1 for seeds" (recently fixed to always randomize all seeds in the grid) gives much better results:
(Except the rocket lost parts of its wings on the previous step, and I didn't notice in time.)
My question is: why shouldn't we use a constant seed when doing img2img? Shouldn't the network improve the quality rather than over-improve it?
The contrast looks a lot like an over-raised CFG scale (up to 30), but then why do the images become softer with a random seed? Why does the seed matter at all?
Aren't we supposed to keep the same seed when upscaling a txt2img result with img2img and the same prompt? Or is that totally different from img2img'ing arbitrary images that didn't come "from" the provided prompt?
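For what it's worth, my mental model of the seed handling (a pure-Python analogy, not the actual webui code) is that a fixed seed re-adds the exact same noise pattern on every iteration, while -1 draws a fresh one each time:

```python
import random

def make_noise(seed):
    """Stand-in for the initial noise img2img adds to the image."""
    if seed == -1:                # -1 means "pick a random seed"
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]  # tiny "noise" vector

# Fixed seed: every iteration re-adds the exact same noise pattern,
# which may be why its artifacts accumulate instead of averaging out.
assert make_noise(1234) == make_noise(1234)

# Random seed (-1): each iteration gets independent noise.
a, b = make_noise(-1), make_noise(-1)
print(a == b)  # almost certainly False
```

If that picture is right, iterating on one seed keeps reinforcing the same noise structure, which would explain the oversharpening; but I'd like someone to confirm or correct this.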
P.S. #2474