Propose a Contrastive Prompt Tuning method (DreamArtist) that can dramatically improve image quality and diversity #2945
Conversation
Is this for textual inversion?
Yes, textual inversion is essentially what NLP calls prompt tuning.
This adds prompt-embedding learning for negative words to dramatically improve the quality of generated images. The concept of "high quality" can be learned from a single image. A reconstruction loss is added to improve the detail quality and richness of the generated images. Adding a discriminator trained on human annotations (implemented using ConvNeXt) lets the embedding learn with feedback from that model.
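A minimal sketch of that contrastive idea, assuming diffusers-style objects (`unet`, `scheduler`) and that the two noise predictions are combined classifier-free-guidance style; the mixing scale and loss below are my assumptions, not necessarily the exact formulation in this PR:

```python
import torch.nn.functional as F

def contrastive_embedding_step(unet, scheduler, latents, noise, t,
                               pos_cond, neg_cond, guidance_scale=5.0):
    """One hypothetical training step for a positive/negative embedding pair.

    pos_cond / neg_cond are text-encoder outputs that contain the learnable
    positive and negative token embeddings; only those embeddings should
    receive gradients. The CFG-style combination below is an assumption.
    """
    noisy = scheduler.add_noise(latents, noise, t)  # diffusers-style scheduler call

    eps_pos = unet(noisy, t, encoder_hidden_states=pos_cond).sample
    eps_neg = unet(noisy, t, encoder_hidden_states=neg_cond).sample

    # pull the combined prediction toward the positive embedding and away
    # from the negative one, then match it against the true noise
    eps = eps_neg + guidance_scale * (eps_pos - eps_neg)
    return F.mse_loss(eps, noise)
```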
Thanks for the explanation.
Thanks for the PR! Looks interesting. Could you explain in more detail?
Is this specific to each embedding/tuning, or a general improvement?
Do you have a source link for the meta files? Is it from this?
It works for all prompts: once you add {name} to the prompt and {name}-uc to the negative prompt, it will improve quality.
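For example, if the embedding pair was trained under a hypothetical name `mystyle`, the txt2img fields would look roughly like this (the surrounding prompt words are just illustration):

```
Prompt:          a portrait of a girl in a garden, mystyle
Negative prompt: lowres, bad anatomy, mystyle-uc
```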
Yes, I copied the ConvNeXt code from that and added an interface.
Great work! The final word is Automatic's, but as far as I can see, there are some changes needed before merge:
The reconstruction loss actually decodes the latent features back to an image and computes a pixel-level loss between that image and the original. This kind of reconstruction loss is a popular approach in some deep learning algorithms.
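A rough sketch of that term, assuming a diffusers-style VAE and a simple L1 distance in pixel space; the actual distance and weighting used in this PR may differ:

```python
import torch.nn.functional as F

def reconstruction_loss(vae, pred_latents, target_image, weight=0.1):
    """Decode the predicted (denoised) latents back to pixels and compare
    with the training image. 0.18215 is the usual SD latent scaling factor."""
    decoded = vae.decode(pred_latents / 0.18215).sample  # diffusers AutoencoderKL-style call
    decoded = decoded.clamp(-1, 1)
    return weight * F.l1_loss(decoded, target_image)
```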
…xt requirements to requirements.txt
Thanks for your suggestion. I have made conv_next one of the repositories cloned in launch.py and put its requirements into requirements.txt.
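Registering an extra repository in launch.py presumably looks something like the lines below; `git_clone` and `repo_dir` mirror how the script clones its other dependencies, and the environment-variable name and URL here are placeholders rather than what this PR actually uses:

```python
# sketch only: intended as lines inside launch.py, which already imports os
# and defines git_clone(url, dir, name) and repo_dir(name)
convnext_repo = os.environ.get('CONVNEXT_REPO', "https://github.com/facebookresearch/ConvNeXt.git")
git_clone(convnext_repo, repo_dir('ConvNeXt'), "ConvNeXt")
```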
With the latest commit, after executing a second time I get:
I only have the models_cnx folder; that's why it is failing. Add a check to avoid the failure. I tried running it but I don't have enough VRAM. With 8GB it seems impossible to run, even using --medvram and convnext_tiny_1k_224_ema.pth as the classifier model.
The rename bug is fixed.
I've been testing it for a little while on the latest pull and it does seem pretty promising so far. I'm up to about 3000+ steps with 1 image on an RTX 3080 12GB and it seems to be working well from what I can tell. I'll do some more testing and report my findings.
Actually, I got the above image with just 1000 steps of training. This tag with commonly used negative words can generate amazing images. Mixing embeddings also works well.
@flesnuk The speed can probably be increased by fusing ConvNeXt with https://github.com/NVIDIA/TransformerEngine (if and when it's supported on Windows) and TorchDynamo. I'll take a look if Automatic merges.
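For the TorchDynamo half of that suggestion, the gain would presumably come from graph-compiling the ConvNeXt discriminator; a minimal sketch assuming PyTorch 2.x (torch.compile was not yet released when this thread was written) and a hypothetical loader function:

```python
import torch

classifier = load_convnext_classifier()       # hypothetical: however the discriminator is loaded
classifier = classifier.half().cuda().eval()  # typical half-precision GPU inference setup
classifier = torch.compile(classifier)        # TorchDynamo capture + default Inductor backend
```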
I could have probably been overtraining a bit. I'm also struggling to train with a 12GB card with even 3 images; I keep running out of memory. It's pretty borderline for this type of training; sometimes it will go for a few hundred steps and then OOM. EDIT: But when it works, it works really well. I've never had much luck with training embeddings, but this one seems to actually work for me and capture style/details really well.
import os.path
import numpy as np
import cv2
cv2 is not used.
@@ -23,3 +23,7 @@ resize-right
torchdiffeq
kornia
lark
+scikit_learn
+requests
+opencv-python
I'm like, 95% sure opencv is used somewhere else in the project...
I checked, eh, yes. For some reason it's not a requirement.
I think it's installed as a dep when the venv is set up, so it's not especially needed, but it still can't hurt to have it. Just don't know if adding it should be done in this commit, or a more general "housekeeping" commit.
@7eu7d7 - So, I'm running on an 8GB GPU, and with this version of TI using the --medvram flag, if I enable negative prompting, I get an OOM on the first training step. With the current master branch and --medvram, this does not happen. I guess, not a dealbreaker, but maybe something to consider for those of us who still need to upgrade their GPU. :D
@d8ahazard Do you mean that you OOM when using the regular Textual Inversion?
With "regular" from Automatic's repo, I do not.
With the new changes in the PR, I do. Both using the --medvram flag.
@d8ahazard I have this problem when training negative prompts as well on 10GB of VRAM. Turning off "train with reconstruction" stops the OOM errors, but I can't get training to converge without it.
The author said he was going to write a paper about this method, so he took down the video for now. I believe we will be able to see his research results in a better form soon.
I got an error when trying to run this:
Same issue as above. You have to download CLIP from Hugging Face manually and put that under
The video was in Mandarin.
A weird thing happened. I trained an embedding using this method on 6 images (actually cropped and flipped from one single image) for 10k steps with the negative prompt. But when I tested it, using only the positive prompt gave much better results than using the negative prompt as well.
If you got a copy of it, could you share it? From a glimpse of the code, it seems to work by using ConvNeXt as a discriminator. Does it use a pretrained model as the backbone or initialize all weights randomly? And how does it modify ConvNeXt to serve as a discriminator?
@stellaHSR You need to create a new embedding file with -uc at the end for the negative training; in your case, aptllz-uc.
@Lime-Cakes Yes, there is a pretrained model at 7eu7d7/pixiv_AI_crawler; you can download it from the releases. In 7eu7d7's modified version of the webui which adds APT, APT-stable-diffusion-auto-prompt, you can find a README which says:
Thx, it works!
The new version of the method has some changes; I will consider packaging it as an extension soon.
I would love a version that is an extension; that way it's easier to manage for the end user.
Please do |
I tried to train at 384*384 but got TypeError: only integer tensors of a single element can be converted to an index |
Please update us here when the plugin for this is available. Thank you.
The new version of the method is renamed to DreamArtist. The paper will be published on arxiv soon. |
The extension version is available.
Someone should add it to the list: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions |
When I try to train with the option 'Read parameters (prompt, etc...) from txt2img tab when making previews' turned on, I get the following error message:
This is happening with both the extension version and the regular version. Also, when I tried to use reconstruction, I got
Has anyone gotten the extension to actually work yet? Everyone I asked on the SD Discord said the results looked nothing like the training image.
This proposes a better method of prompt tuning (training an embedding), which can dramatically improve image quality compared to the current embedding-training method. Excellent performance can be achieved even with just one image for training (one-shot learning).
Performance comparison with the same extra prompt:
Training image:
Current method:
My APT method:
With some additional detail prompts:
No prompt tuning (textual inversion):
Learning the Genshin Impact character Nahida from a single image:
Combining Nahida with additional prompts:
**Note.**
The results from this version are currently inconsistent with the version I used before, and there are some discrepancies in performance. The reason for this is currently unknown; probably the scheduler makes the learning rate decrease too fast over the 1000 training steps.
My old version:
New version:
It may be better to use https://github.com/7eu7d7/APT-stable-diffusion-auto-prompt when training embeddings.