Learn TI Embedding and LoRA both at the same time #635
Comments
Related: https://huggingface.co/blog/dreambooth#epilogue-textual-inversion--dreambooth (last chapter)
At the very least, it would be nice to add TI loading in train_network.py, so that a TI could be trained first and a UNet LoRA trained afterwards.
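A minimal sketch of what "TI loading" could mean at the tensor level (this is an illustration, not the train_network.py API; the function name and toy sizes are hypothetical): a learned Textual Inversion vector is appended as a new row of the text encoder's token-embedding table, and the returned token id can then be referenced in captions while the LoRA trains.

```python
import torch
import torch.nn as nn

def load_ti_embedding(embedding_layer: nn.Embedding, ti_vector: torch.Tensor) -> int:
    """Append one learned TI vector as a new token row; return its token id."""
    old_weight = embedding_layer.weight.data
    new_weight = torch.cat([old_weight, ti_vector.unsqueeze(0)], dim=0)
    embedding_layer.weight = nn.Parameter(new_weight)
    embedding_layer.num_embeddings += 1
    return new_weight.shape[0] - 1

# Toy usage: a 10-token vocabulary with 4-dim embeddings.
emb = nn.Embedding(10, 4)
vec = torch.randn(4)
token_id = load_ti_embedding(emb, vec)
print(token_id)  # -> 10
```

In a real trainer the new token would also be registered with the tokenizer, and the embedding row frozen (or not) depending on whether the TI should keep training alongside the LoRA.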
HCP Diffusion supports this, but I have not yet been able to actually get it to work. I have seen others using it, however.

I have been thinking about this approach a lot as well, because I don't think the current method is that good. If you just train the text encoder, you can get decent results. If you train both the text encoder and the UNet, the results are better, but if you try to disable the UNet part of it, the results are really poor. This indicates that the text encoder is not being fully taken advantage of.

I have two big motivations for looking for a better approach. First, I think better exploiting the existing capabilities of the base model will lead to better flexibility of the resulting LoRA (certain prompts, like a specific pose, that work fine without the LoRA can become unreliable or completely break with the LoRA). What I would really like to see, however, is better composability with other LoRAs and base models. With normal LoRA training, the entire text encoder is affected instead of just the trigger tag we are trying to add. When I tested how much other tags in the text encoder were affected, I saw numbers around 20-40% compared to the main trigger tag. I haven't messed with dropout or anything like that, but for completely unrelated tokens to be so affected was quite surprising to me.

In order to actually have "trigger words", I do think training the TI and UNet together will be necessary, to create a link between the tag and the UNet doing something different. But pretraining the TI could still be useful, and it would be a nice first step. I train using anime screenshots as a base, and I wonder if you could train a base style TI to reduce the influence of the common style of the training images.
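One way the "20-40% compared to the main trigger tag" measurement above could be computed (a sketch under my own assumptions, not the commenter's actual script): compare each token's embedding before and after training and express its movement relative to the trigger token's movement.

```python
import torch

def relative_drift(before: torch.Tensor, after: torch.Tensor, trigger_id: int) -> torch.Tensor:
    """Per-token embedding movement, normalized by the trigger token's movement."""
    delta = (after - before).norm(dim=1)              # L2 movement of each token
    return delta / delta[trigger_id].clamp_min(1e-8)  # ratio vs. the trigger token

# Toy example: 5 tokens, 3-dim embeddings.
before = torch.zeros(5, 3)
after = before.clone()
after[0] += torch.tensor([1.0, 0.0, 0.0])  # trigger token moves by 1.0
after[3] += torch.tensor([0.0, 0.3, 0.0])  # an "unrelated" token moves by 0.3
ratios = relative_drift(before, after, trigger_id=0)
print(ratios)  # token 3 shows 30% of the trigger's movement
```

A ratio of 0.2-0.4 on tokens that never appear in the captions would match the surprising drift described above.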
This paper – https://omriavrahami.com/the-chosen-one/ – features training two textual inversion embeddings for SDXL along with a LoRA simultaneously:
I'm reading up on how these models work and I still have only a superficial understanding, but I noticed this section in the original LoRA paper:
https://arxiv.org/abs/2106.09685
Isn't this "prefix-embedding tuning" the same as textual inversion?
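The two ideas are close but not identical, as this toy contrast sketches (my reading of the LoRA paper's terminology, not anything from this repo): prefix-embedding tuning prepends trainable vectors to every prompt's embedding sequence, while textual inversion optimizes the embedding row of one specific (new) token inside the sequence.

```python
import torch

seq = torch.randn(7, 4)  # token embeddings for a 7-token prompt, dim 4

# Prefix-embedding tuning: prepend trainable vectors, lengthening the input.
prefix = torch.zeros(2, 4, requires_grad=True)
prefix_tuned = torch.cat([prefix, seq], dim=0)
print(prefix_tuned.shape)  # -> torch.Size([9, 4])

# Textual inversion: optimize the embedding of one token in place of token 0.
ti_token = torch.zeros(4, requires_grad=True)
seq_ti = torch.cat([ti_token.unsqueeze(0), seq[1:]], dim=0)
print(seq_ti.shape)  # -> torch.Size([7, 4])
```

In both cases only the small new vectors receive gradients; the difference is whether they occupy extra positions or replace a real token that can be typed in a prompt.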
I'll clean up my code and PR it. It doesn't train both at once, but it loads a TI into the LoRA trainer and works quite well.
I've been messing with Poiuytrezay1's PR, and in my experience the TI overfits on style quite quickly, so you probably want to train them separately anyway.
I have an idea that I didn't have time to try.
A learning rate of 1.0 should be high. We don't care if the embedding breaks as-is.
I just used their other PR, which ports cloneofsimo's code to normalize during training (#993). A norm of 1 is probably already too high. IIRC the PTI authors found the embedding works best if it stays at least somewhat close to real token embeddings. In this case that means initializing with an existing token (init_word) and keeping the norm close to 0.4.
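The norm constraint described above can be sketched as a single in-place step applied after each optimizer update (a minimal illustration, assuming the 0.4 target cited above; the function name and 768-dim toy vector are hypothetical, not the PR's actual code):

```python
import torch

@torch.no_grad()
def renormalize_(embedding: torch.Tensor, target_norm: float = 0.4) -> None:
    """Rescale a trained embedding vector in place to a fixed L2 norm."""
    embedding.mul_(target_norm / embedding.norm().clamp_min(1e-8))

vec = torch.randn(768) * 5.0  # embedding that drifted to a large norm
renormalize_(vec)
print(round(vec.norm().item(), 3))  # -> 0.4
```

Because only the magnitude is rescaled, the embedding's direction (what it "means") is preserved while it is kept in the range of real token embeddings.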
I thought the normalization during training would compromise its speed.
Normalizing after training is not going to suddenly un-overfit it.
It will "disable" the embedding, as if it hadn't been trained at all. Which is what I want to try for LoRA instead of training CLIP or using a trigger word.
Is it possible to train a LoRA together with an embedding? Here are some thoughts that led to this, when training a LoRA for an object: we want it to learn `sks`, but not to learn `photo` and `forest` along with it. What do you think?

Otherwise, I'm not quite sure how to train a LoRA on something that is neither a character nor a style. For example, to train a LoRA for the "scar" concept: what captions should we choose? Should we say "sks over eye, 1boy, …"? If so, isn't it more logical to say "scar over eye, 1boy, …" directly? But then, how can we be sure that only the concept of "scar" is changed, and not the concept of "1boy"?