I cannot believe this doesn't have more stars and attention !!! FREAKIN AWESOME #3
Glad you like it. It was just a quick one-off idea I had, and there are definitely improvements that could be made if I had the time. As for using 2.1, last time I checked, 1.5 and 2.X latents were compatible, so technically "v1" in the dropdown is "v1/v2". That means it should work fine with 2.1 models without any changes.
Nah, I had tried before. SD 2.X and 1.5 are incompatible in the latent space :/ EDIT: NVM, you're right. It actually works. I don't know if it's optimal as is, but it's very surprising to see that the latent space is compatible with 1.x. Damn.
Just double checked. Stable Diffusion v2.1 and v2.0 both come with the same VAE, which is "ft-mse-840000" - the same one people usually use with SDv1.5. This means it is not only compatible, it's 100% the same latent format as far as I can tell. Sure, the model might be more sensitive to the noise the interposer adds, but an improved xl->v1 interposer would also mean improvements to xl->v2. And to reiterate, there's still plenty of room for improvement. I could probably even take the slightly less scuffed architecture from my latent upscaler and apply it here, though I'd like to design something better once I figure out more about how all this neural network stuff works :P
(Image: 840K VAE commonly used with 1.5)
@city96 Thanks for your response! If I may ask, how could this be improved by the way ? (The interposer), also, would there be a way to do this with a single .safetensors that could keep some kind of "merge" of a 1.5 model with an SDXL one ? Or would that be absolutely impossible. Thanks in advance for your time |
Well, the neural network part would have to be changed. Currently it's just a bunch of random conv2D layers that look like a spaceship. I think I have an idea on how to make a better one but yeah, time... The other thing that needs changing is the dataset, but I think I got a decent one I can re-use from the upscaler. Which means the only other thing I'd need is, again, time to work on this :P
You mean combining the v1->xl and xl->v1 models into a single file? I mean, that's easy enough to do I guess... You can store multiple models in the same safetensor file just fine. |
How would you do that ? Store multi-models in one single safetensor file ? :o |
Same way SD does it. safetensors files just store key:value pairs (in this case the values are the network weights). You can just add a prefix to all the keys so you can grab the ones you need while loading. For example, all of the Stable Diffusion checkpoint files will have a bunch of keys starting with "first_stage_model" - that's the VAE. Similarly, CLIP and the actual UNET are also stored in the same file, just with different prefixes. I'd probably do something like this if I had to put both interposer models in the same file:

import torch
from safetensors.torch import load_file, save_file

# Load the two separate interposer state dicts
v1_to_xl = load_file("v1-to-xl_interposer-v1.1.safetensors")
xl_to_v1 = load_file("xl-to-v1_interposer-v1.1.safetensors")

# Prefix every key with the name of the model it belongs to
out_dict = {}
for k, v in v1_to_xl.items():
    out_dict[f"v1_to_xl.{k}"] = v
for k, v in xl_to_v1.items():
    out_dict[f"xl_to_v1.{k}"] = v

# Write both models into a single file
save_file(out_dict, "interposer-v1.1.safetensors")

List of keys before/after
xl->v1 keys:
v1->xl keys:
combined output keys:
Then you just split off the ones you actually need while loading with a |
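For reference, the splitting step mentioned above could be sketched roughly like this. It's only an illustration under assumptions: `split_by_prefix` is a hypothetical helper (not something from the repo), and the key names in the toy dict are made up to stand in for whatever `load_file` would return for the combined file.

```python
def split_by_prefix(state_dict, prefix):
    """Return only the entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for the combined file; the key names here are hypothetical
# and the values would be tensors in a real state dict.
combined = {
    "v1_to_xl.conv.weight": "tensor A",
    "v1_to_xl.conv.bias": "tensor B",
    "xl_to_v1.conv.weight": "tensor C",
}

# Grab only the xl->v1 weights, with their original key names restored
xl_to_v1 = split_by_prefix(combined, "xl_to_v1.")
print(sorted(xl_to_v1))
```

With the real file you would presumably get `combined` from `load_file("interposer-v1.1.safetensors")` and hand the filtered dict to the matching model's `load_state_dict`.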
@PurpleBlueAloeVera Figured I'd ping you, I re-trained the whole thing with a new architecture. It should work a lot better now for both xl->v1 and v1->xl. It still has some hue/saturation issues but overall it's an improvement. I'd appreciate it if you could re-test using it with SDv2.x models as well, since that was one of the things you said worked sub-par. |
It looks indeed a LOT better here! Well done. And for sure, I'll try this a.s.a.p and get back to you. Btw, no problem, don't hesitate to ping me if you'd like me to test/feedback! I'm loving this thing you brought. :) |
Q: how could this allow people to port SD1.5 LoRAs into SDXL? or is it strictly a Checkpoints thing? |
I guess issue #1 kind of explains how you can do that. That's the only real way you can use v1 LoRAs with xl, but obviously it won't work for concept LoRAs, only character/style ones. |
I'm just getting past the beginner stage of ComfyUI/Stable Diffusion in general and this process is exactly what I'm looking for. I've tried many ways of installing this, files all seem to be in order in directories, I just can't find a workflow. Dropping the .png into ComfyUI doesn't work. I must be missing something very obvious. Any help from anyone to get this to work would be greatly appreciated. |
@Benzene82 You mean the one in the image above? It's just a demo workflow but if you want it then here's the JSON metadata for it. Good thing I never delete anything lol. Feel free to reply if you got any questions. |
Thanks so much for the fast response!
I wasn't looking for *this specific* workflow, just couldn't figure out
any. I can confirm the node works as expected. I'll test out other Models
and LoRAs to learn more about how it works. I'm trying to get away from
LoRAs that basically stamp what they are trained on into a subject, making
copycat images. I thought blending 1.5 LoRAs with SDXL ones might add some
'variety' and possibly more realism. Simply blending the latent images
makes a weird hybrid and I hope this node and process delivers better
results. If you could send a link to the latest workflow, V3, I'd greatly
appreciate it. I'm not familiar with the Github or Hugging Face cloning of
repos, just enough to be dangerous. LOL
Yeah, that can be annoying. I mostly use my own LoRAs now but a lot of the civitai ones are overtrained like that and need a really low weight to even work, if they work at all and aren't completely incompatible lol. I guess you could look into controlnet or do what I tend to do, which is generate your base image with one model (in this case SDXL) and then img2img it (directly pass the latent via this node) on 1.5 at a high enough denoise. 0.5+ is recommended for this IMO. Some more esoteric stuff might work like canny edge/openpose detect the output from SDXL and using that for an input for 1.5.
This node is just meant to replace the need to do a VAE decode/encode between SDXL/SDv1, though people have used it for some more crazy stuff, like returning the leftover noise from XL and denoising it on v1 with the advanced KSampler. I guess you could convert the SDXL latent with this node and then pipe it into a latent composite/blend node together with the v1 one.
There's no "official" workflow for this repo. I don't really use SDXL anymore (switched to PixArt alpha for the initial image for my new stuff) but here's one of my old workflows for SDXL. It isn't very good but maybe it'll work as a starting point for you? |
@city96 what about object LoRAs? How would you get around with SD1.5 to SDXL?
That is also a concern, if that is the case what would be the strategy of LoRA cleaning with human-in-the-loop? RLHF/PPO or some other alternative that reduces the amount of human judgement on quality? |
Masking/inpainting I guess? Maybe using a similar enough placeholder object for XL?
Not sure what you mean. How well a LoRA works will heavily depend on what model you use it with, so there's no universal "best" weight for a given LoRA. It could work perfectly with the model it was trained on while failing miserably if the model it's applied to is different enough (as an extreme example, run a regular 1.5 LoRA on DPO or TokenCompose and see how well that turns out lol). You also won't immediately see a pattern if it's overtrained, so it might take a bit to realize it's just spitting out variations of the training images. Detecting this would be pretty hard, as you'd need some sort of similarity score over a large batch of samples. If you mean LoRA dataset cleaning, that's out of scope for this repo.
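To make the "similarity score over a large batch of samples" idea a bit more concrete, here is a minimal sketch. It assumes you already have one embedding vector per generated image from some image encoder (which one is not specified in this thread), and it uses mean pairwise cosine similarity as the score. Both the metric and the toy vectors are assumptions for illustration, not a tested method for detecting overtrained LoRAs.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over every pair of embeddings in the batch."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy 2-D "embeddings" (hypothetical values, real ones would be much larger)
varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
collapsed = [[1.0, 0.1], [1.0, 0.12], [0.98, 0.1]]

print(mean_pairwise_similarity(varied))
print(mean_pairwise_similarity(collapsed))
```

A batch that scores much closer to 1.0 than batches from other LoRAs on the same prompts might hint that the outputs are near-copies of each other, though any threshold would have to be picked empirically.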
More like generating and filtering data from a LoRA and further refining them to be more "accurate" by human feedback (random image X is more accurate as "synthetic data" than random image Y). For "smelling" overtraining I am not quite sure if there are ways to make things better (rediscover an optimal weight, human feedback etc.) |
A bit of a side note but X-Adapter might be just as useful https://github.com/showlab/X-Adapter |
Thank you so much for this. Do you plan on adding SD2.1 to the features ?? We have some solid 2.1 models that we'd love to use as refiners for 1.5 or SDXL in our workflows.
Awesome, seriously thank you for sharing this awesome tool !