
Propose a Contrastive Prompt Tuning method (DreamArtist) that can dramatically improve image quality and diversity #2945

Closed
wants to merge 7 commits into from

Conversation


@IrisRainbowNeko IrisRainbowNeko commented Oct 17, 2022

Propose a better method of prompt tuning (embedding training), which can dramatically improve image quality compared to the current embedding training method. Excellent performance can be achieved even with just one training image (one-shot learning).

Performance comparison with the same extra prompt:

Training image:
[image: p1]

Current method:
[images: grid-0023]

My APT method:
[images: grid-0024]

With some additional detail prompts added:
[image]

No prompt tuning (plain textual inversion):
[image]

Learning Genshin Impact's Nahida from a single image:
[image: grid-0561]

Combining Nahida with additional prompts:
[images: grid-magic-nxd, grid-0556]

**Note:**
The results from this version are currently inconsistent with the version I used before, and there are some discrepancies in performance. The reason is currently unknown; it is probably the scheduler, whose learning rate decays too quickly over the 1000 training steps.

My old version:
[image]

New version:
[image: grid-0024]

It may be better to use https://github.com/7eu7d7/APT-stable-diffusion-auto-prompt when training embeddings.

@IrisRainbowNeko IrisRainbowNeko changed the title propose a more advanced Prompt Tuning method (APT) propose a more advanced Prompt Tuning method (APT), can super dramatically improve the image quality Oct 17, 2022
@TingTingin

Is this for textual inversion?

@IrisRainbowNeko
Author

> Is this for textual inversion?

Yes, textual inversion is essentially what NLP calls prompt tuning.
I have modified the training method of textual inversion to propose a method that can dramatically improve image quality; the details of the principle are explained in the video https://www.bilibili.com/video/BV1xD4y1C73c/.

@IrisRainbowNeko
Author

Add prompt embedding learning of negative words to dramatically improve the quality of generated images. The concept of high quality can be learned from a single image.

Add a reconstruction loss to improve the detail quality and richness of the generated images.

Add a discriminator trained on human annotations (implemented with ConvNeXt), which lets the embedding learn from that model's feedback.
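
Putting those three pieces together, here is a minimal sketch of what one training step could look like. This is an illustration, not the author's actual code: `unet`, `vae_decode`, and `disc` are placeholder callables for the denoiser, the VAE decoder, and the ConvNeXt quality classifier, and combining the positive and negative predictions in the classifier-free-guidance style is an assumption based on the description above.

```python
import torch
import torch.nn.functional as F

def q_sample(z0, noise, t, alphas_cumprod):
    """Standard DDPM forward process: noise the clean latent z0 to timestep t."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * z0 + (1 - a).sqrt() * noise

def predict_z0(z_t, eps, t, alphas_cumprod):
    """Estimate the clean latent from the noisy latent and the predicted noise."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return (z_t - (1 - a).sqrt() * eps) / a.sqrt()

def training_step(unet, vae_decode, disc, z0, t, alphas_cumprod,
                  pos_emb, neg_emb, cfg_scale=5.0, rec_weight=1.0, disc_weight=1.0):
    noise = torch.randn_like(z0)
    z_t = q_sample(z0, noise, t, alphas_cumprod)

    # Predict noise under both the learned positive and negative embeddings and
    # combine them the same way classifier-free guidance does at inference time,
    # so both embeddings receive gradients from the same objective.
    eps_pos = unet(z_t, t, pos_emb)
    eps_neg = unet(z_t, t, neg_emb)
    eps = eps_neg + cfg_scale * (eps_pos - eps_neg)

    # 1) Usual diffusion (noise prediction) loss.
    loss = F.mse_loss(eps, noise)

    # 2) Reconstruction loss: decode back to pixel space and compare with the original.
    rec = vae_decode(predict_z0(z_t, eps, t, alphas_cumprod))
    loss = loss + rec_weight * F.l1_loss(rec, vae_decode(z0))

    # 3) Discriminator term: disc returns one "high quality" logit per image;
    #    softplus(-logit) pushes decoded images toward that class.
    if disc is not None:
        loss = loss + disc_weight * F.softplus(-disc(rec)).mean()

    return loss
```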

@TingTingin

Thanks for the explanation.

@C43H66N12O12S2
Collaborator

Thanks for the PR! Looks interesting. Could you explain in more detail?

> Add a reconstruction loss to improve the detail quality and richness of the generated images.

Is this specific to each embedding/tuning or a general improvement?

@ClashSAN
Collaborator

Do you have a source link for the meta files? Is it from this?
https://github.com/facebookresearch/ConvNeXt/blob/main/main.py

@IrisRainbowNeko
Author

> Thanks for the PR! Looks interesting. Could you explain in more detail?
>
> > Add a reconstruction loss to improve the detail quality and richness of the generated images.
>
> Is this specific to each embedding/tuning or a general improvement?

It works for any prompt: once you add {name} to the prompt and {name}-uc to the negative prompt, it will improve quality.
Some images I generated using this:
#pixiv https://www.pixiv.net/artworks/102011948
https://www.pixiv.net/artworks/102011852
https://www.pixiv.net/artworks/102011612

@IrisRainbowNeko
Author

> Do you have a source link for the meta files? Is it from this? https://github.com/facebookresearch/ConvNeXt/blob/main/main.py

Yes, I copied the ConvNeXt code from there and added an interface.

@C43H66N12O12S2
Collaborator

Great work!

The final word is Automatic's, but as far as I can see, there are some changes needed before merge:
1 - Please clone the original conv_next into repositories or install it as a module in launch.py
2 - Keep only your changes in the modules/ directory, preferably as a single file.
3 - Move requirements to requirements_versions.txt in the repo root. Don't touch torch, as that's installed in launch.py

@IrisRainbowNeko
Author

> Thanks for the PR! Looks interesting. Could you explain in more detail?
>
> > Add a reconstruction loss to improve the detail quality and richness of the generated images.
>
> Is this specific to each embedding/tuning or a general improvement?

The reconstruction loss actually decodes the latent features to get an image and computes a pixel-level loss between the decoded image and the original image. This kind of reconstruction loss is a popular approach in many deep learning algorithms.

@IrisRainbowNeko
Author

> Great work!
>
> The final word is Automatic's, but as far as I can see, there are some changes needed before merge: 1 - Please clone the original conv_next into repositories or install it as a module in launch.py 2 - Keep only your changes in the modules/ directory, preferably as a single file. 3 - Move requirements to requirements_versions.txt in the repo root. Don't touch torch, as that's installed in launch.py

Thanks for your suggestion. I have made conv_next a repository cloned by launch.py and put the requirements into requirements.txt.

@flesnuk

flesnuk commented Oct 17, 2022

With the latest commit, after executing a second time I get:

Commit hash: 04d355a017280054ff88cfa095fc3d0c54998bde
Traceback (most recent call last):
  File "D:\ai\stable-diffusion-webui\launch.py", line 186, in <module>
    prepare_enviroment()
  File "D:\ai\stable-diffusion-webui\launch.py", line 165, in prepare_enviroment
    os.rename(os.path.join(repo_dir('conv_next'), 'models'), os.path.join(repo_dir('conv_next'), 'models_cnx'))
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'repositories\\conv_next\\models' -> 'repositories\\conv_next\\models_cnx'

I only have the models_cnx folder; that's why it's failing. Please add a check to avoid the failure.

I tried running it but I don't have enough VRAM. With 8 GB it seems impossible to run even using --medvram and convnext_tiny_1k_224_ema.pth as the classifier model.
Do you know how much VRAM is needed?

@IrisRainbowNeko
Author

> With the latest commit, after executing a second time I get:
>
>     Commit hash: 04d355a017280054ff88cfa095fc3d0c54998bde
>     Traceback (most recent call last):
>       File "D:\ai\stable-diffusion-webui\launch.py", line 186, in <module>
>         prepare_enviroment()
>       File "D:\ai\stable-diffusion-webui\launch.py", line 165, in prepare_enviroment
>         os.rename(os.path.join(repo_dir('conv_next'), 'models'), os.path.join(repo_dir('conv_next'), 'models_cnx'))
>     FileNotFoundError: [WinError 2] The system cannot find the file specified: 'repositories\\conv_next\\models' -> 'repositories\\conv_next\\models_cnx'
>
> I only have the models_cnx folder; that's why it's failing. Please add a check to avoid the failure.
>
> I tried running it but I don't have enough VRAM. With 8 GB it seems impossible to run even using --medvram and convnext_tiny_1k_224_ema.pth as the classifier model. Do you know how much VRAM is needed?

The rename bug is fixed.
ConvNeXt is quite a big model, like a transformer. I train it on a 3090 with 24 GB of VRAM. You can fine-tune my pre-trained model for faster training: https://github.com/7eu7d7/pixiv_AI_crawler/releases/download/v2/checkpoint-best_t5.pth
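
For reference, the kind of check flesnuk suggested is just a guard before the rename; a minimal sketch (the paths follow the traceback above; this is not the actual launch.py code):

```python
import os

conv_next = os.path.join('repositories', 'conv_next')
src = os.path.join(conv_next, 'models')
dst = os.path.join(conv_next, 'models_cnx')

# ConvNeXt's top-level `models` package clashes with the webui's own directory name,
# so it is renamed once; skip the rename on later launches when it has already happened.
if os.path.isdir(src) and not os.path.isdir(dst):
    os.rename(src, dst)
```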

@Evil-Dragon

I've been testing it for a little while on the latest pull and it seems pretty promising so far. I'm up to about 3000+ steps with 1 image on an RTX 3080 12GB and it seems to be working well from what I can tell. I'll do some more testing and report my findings.

@IrisRainbowNeko
Author

> promising

Actually, I got the images above with just 1000 training steps. This tag, together with commonly used negative words, can generate amazing images. Mixing embeddings also works well.

@C43H66N12O12S2
Collaborator

C43H66N12O12S2 commented Oct 17, 2022

@flesnuk The speed can probably be increased by fusing ConvNeXt with https://github.com/NVIDIA/TransformerEngine (if and when it's supported on Windows) and TorchDynamo.

I'll take a look if Automatic merges.

@Evil-Dragon

Evil-Dragon commented Oct 17, 2022

> > promising
>
> Actually, I got the images above with just 1000 training steps. This tag, together with commonly used negative words, can generate amazing images. Mixing embeddings also works well.

I could probably have been overtraining a bit. I'm also struggling to train on a 12GB card with even 3 images; I keep running out of memory. It's pretty borderline for this type of training; sometimes it will go for a few hundred steps and then OOM.

EDIT: But when it works, it works really well. I've never had much luck with training embeddings, but this one actually seems to work for me and captures style/details really well.


import os.path
import numpy as np
import cv2
Contributor

cv2 is not used.

@@ -23,3 +23,7 @@ resize-right
torchdiffeq
kornia
lark
scikit_learn
requests
opencv-python

Collaborator

I'm like, 95% sure opencv is used somewhere else in the project...

Contributor
I checked, eh, yes. For some reason it's not a requirement.

@d8ahazard
Collaborator

d8ahazard commented Oct 17, 2022 via email

@d8ahazard
Collaborator

@7eu7d7 -

So, I'm running on an 8GB GPU, and with this version of TI using the --medvram flag, if I enable negative prompting, I get an OOM on the first training step.

With the current master branch and --medvram, this does not happen.

I guess, not a dealbreaker, but maybe something to consider for those of us who still need to upgrade their GPU. :D

@C43H66N12O12S2
Collaborator

@d8ahazard Do you mean that you OOM when using the regular Textual Inversion?

@d8ahazard
Collaborator

d8ahazard commented Oct 17, 2022 via email

@MarkovInequality

@d8ahazard I have this problem when training negative prompts as well on 10GB of VRAM. Turning off "train with reconstruction" stops the OOM errors, but I can't get training to converge without it.

@Arcsion

Arcsion commented Oct 24, 2022

> The video seems to be removed or inaccessible. Is there any other explanation of the idea? It sounds promising.

The author said he was going to write a paper about this method, so he took down the video for now. I believe we will be able to see his research results soon.

@mekrod

mekrod commented Oct 25, 2022

I got an error when trying to run this:

    APT-stable-diffusion-auto-prompt-master\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1768, in from_pretrained
      raise EnvironmentError(
    OSError: Can't load tokenizer for './models/clip-vit-large-patch14'.

@Arcsion

Arcsion commented Oct 25, 2022

> I got an error when trying to run this: APT-stable-diffusion-auto-prompt-master\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1768, in from_pretrained raise EnvironmentError( OSError: Can't load tokenizer for './models/clip-vit-large-patch14'.

Same issue with this above.

You have to download CLIP from Hugging Face manually and put it under models/clip-vit-large-patch14/; better yet, clone that repository and put everything in.
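
For example, one way to populate that folder is to let the transformers library download and save the tokenizer and text encoder (a hedged sketch; the target path is taken from the error above):

```python
from transformers import CLIPTokenizer, CLIPTextModel

# Download openai/clip-vit-large-patch14 from Hugging Face and save it locally
# where the tool expects it ('./models/clip-vit-large-patch14').
local_dir = "./models/clip-vit-large-patch14"

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
tokenizer.save_pretrained(local_dir)

text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
text_encoder.save_pretrained(local_dir)
```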

@TingTingin

> The video seems to be removed or inaccessible. Is there any other explanation of the idea? It sounds promising.

The video was in Mandarin.

@zhupeter010903

A weird thing happened. I trained an embedding using this method on 6 images (actually cropped and flipped out of a single image) for 10k steps with the negative prompt. But when I tested it, using only the positive prompt gave much better results than also using the negative prompt.

@Lime-Cakes

> > The video seems to be removed or inaccessible. Is there any other explanation of the idea? It sounds promising.
>
> The video was in Mandarin.

If you have a copy of it, could you share it? From a glimpse of the code, it seems to work by using ConvNeXt as a discriminator. Does it use a pretrained model as the backbone or initialize all the weights randomly? And how does it modify ConvNeXt to serve as a discriminator?

@stellaHSR

stellaHSR commented Nov 8, 2022

@7eu7d7 Hi, when I start training an APT embedding using [Prompt Tuning (ConvNext)], I get this error:
[images of the error]

How do I solve this?

@mekrod

mekrod commented Nov 8, 2022

> How do I solve this?

@stellaHSR You need to create a new embedding file ending in -uc for the negative training; in your case, aptllz-uc.

@Arcsion

Arcsion commented Nov 8, 2022

> If you have a copy of it, could you share it? From a glimpse of the code, it seems to work by using ConvNeXt as a discriminator. Does it use a pretrained model as the backbone or initialize all the weights randomly? And how does it modify ConvNeXt to serve as a discriminator?

@Lime-Cakes Yes, there is a pretrained model at 7eu7d7/pixiv_AI_crawler; you can download it from the releases.

In 7eu7d7's modified version of the webui that adds APT, APT-stable-diffusion-auto-prompt, you can find a README which says:

> Add model-based prompt tuning, based on the ConvNeXt model trained in my previous pixiv_AI_crawler, and use another AI to evaluate the quality of the generated images as a discriminator to aid in model training. It would allow the prompt to learn the concept of high quality, or learn your fetish.
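
To make that concrete, here is a rough sketch of how a pretrained ConvNeXt could be wired up as a frozen quality discriminator. This is a guess at the setup, not the author's code: the timm implementation with a fresh 2-class head stands in for the released pixiv_AI_crawler checkpoint, whose exact class layout I don't know.

```python
import timm
import torch

# ConvNeXt-Tiny backbone with ImageNet weights and a new 2-class "quality" head.
disc = timm.create_model("convnext_tiny", pretrained=True, num_classes=2)
disc.requires_grad_(False).eval()  # frozen: gradients flow to the image, not the weights

def quality_logit(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224), ImageNet-normalized. Returns the 'high quality' logit."""
    return disc(images)[:, 1]
```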

@stellaHSR

> > How do I solve this?
>
> @stellaHSR You need to create a new embedding file ending in -uc for the negative training; in your case, aptllz-uc.

Thanks, it works!
But does it need a specific model? I use the animefull-latest model; the loss is quite large (0.98) and the output image looks like noise.

@IrisRainbowNeko
Author

The new version of the method has some changes; I will consider making it an extension soon.

@TWIISTED-STUDIOS

> The new version of the method has some changes; I will consider making it an extension soon.

I would love a version that is an extension; that way it's easier to manage for the end user.

@f8upd8

f8upd8 commented Nov 11, 2022

> The new version of the method has some changes; I will consider making it an extension soon.

Please do

@Raz0rStorm

I tried to train at 384*384 but got TypeError: only integer tensors of a single element can be converted to an index

@aliencaocao
Contributor

Please update us here when the plugin for this is available. Thank you.

@IrisRainbowNeko
Author

The new version of the method has been renamed DreamArtist.
Extension version: https://github.com/7eu7d7/DreamArtist-sd-webui-extension
Regular version: https://github.com/7eu7d7/DreamArtist-stable-diffusion

The paper will be published on arXiv soon.

@IrisRainbowNeko
Author

> Please update us here when the plugin for this is available. Thank you.

The extension version is available.

@aliencaocao
Contributor

Someone should add it to the list: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions
cc @ClashSAN

@zhupeter010903

zhupeter010903 commented Nov 12, 2022

When I try to train with the option 'Read parameters (prompt, etc...) from txt2img tab when making previews' turned on, I got the following error message:

Traceback (most recent call last):
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/modules/ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/webui.py", line 54, in f
    res = func(*args, **kwargs)
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/extensions/DreamArtist/scripts/dream_artist/ui.py", line 30, in train_embedding
    embedding, filename = dream_artist.cptuning.train_embedding(*args)
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/extensions/DreamArtist/scripts/dream_artist/cptuning.py", line 486, in train_embedding
    processed = processing.process_images(p)
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/modules/processing.py", line 423, in process_images
    res = process_images_inner(p)
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/modules/processing.py", line 442, in process_images_inner
    file.write(processed.infotext(p, 0))
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/modules/processing.py", line 281, in infotext
    return create_infotext(p, self.all_prompts, self.all_seeds, self.all_subseeds, comments=[], position_in_batch=index % self.batch_size, iteration=index // self.batch_size)
  File "/content/gdrive/.shortcut-targets-by-id/1VNhtnoJC8AkKUr4-kCbT3hxMSzfiOGCd/sd/stable-diffusion-webui/modules/processing.py", line 411, in create_infotext
    negative_prompt_text = "\nNegative prompt: " + p.negative_prompt if p.negative_prompt else ""
TypeError: can only concatenate str (not "tuple") to str

This is happening with both the extension version and the regular version. Also, when I tried to use reconstruction, I got

Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "/content/stable-diffusion-webui/webui.py", line 54, in f
    res = func(*args, **kwargs)
  File "/content/stable-diffusion-webui/modules/dream_artist/ui.py", line 36, in train_embedding
    embedding, filename = modules.dream_artist.cptuning.train_embedding(*args)
  File "/content/stable-diffusion-webui/modules/dream_artist/cptuning.py", line 413, in train_embedding
    x_samples_ddim = shared.sd_model.decode_first_stage.__wrapped__(shared.sd_model, output[2])  # forward with grad
  File "/content/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 763, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "/content/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/autoencoder.py", line 331, in decode
    z = self.post_quant_conv(z)
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/envs/automatic/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same
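
That last RuntimeError just means the VAE's convolution weights are still on the CPU while the latents being decoded are on the GPU (plausibly a lowvram/medvram side effect, though that is a guess). A minimal, generic reproduction of the mismatch and the usual fix:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=1).half()                # weights on CPU -> torch.HalfTensor
latents = torch.randn(1, 4, 64, 64, device="cuda").half()   # input on GPU -> torch.cuda.HalfTensor

# conv(latents) would raise: "Input type (torch.cuda.HalfTensor) and weight type
# (torch.HalfTensor) should be the same". Moving the module to the input's device fixes it:
conv = conv.to(latents.device)
out = conv(latents)
print(out.shape)
```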

@IrisRainbowNeko IrisRainbowNeko changed the title propose an advanced Prompt Tuning method (APT), can super dramatically improve the image quality and diversity propose an Contrastive Prompt Tuning method (DreamArtist), can super dramatically improve the image quality and diversity Nov 13, 2022
@devNegative-asm
Contributor

Has anyone gotten the extension to actually work yet? Everyone I asked on the SD Discord said the results looked nothing like the training image.

DrakeRichards pushed a commit to DrakeRichards/stable-diffusion-webui that referenced this pull request Mar 22, 2024
Atry pushed a commit to Atry/stable-diffusion-webui that referenced this pull request Jul 11, 2024