Thanks for your great contribution to the community.
I see that the paper includes an experiment using CLIP as the text encoder, but I couldn't find the corresponding code. Will you release the CLIP version? I also wonder how to handle the linear layers inside the attention blocks of the CLIP text encoder, since they appear to be NonDynamicallyQuantizableLinear rather than plain nn.Linear.
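For reference, here is a small check (not part of the LaVi-Bridge repository, just plain PyTorch) suggesting this distinction may not matter for attention-layer weight injection: NonDynamicallyQuantizableLinear is a thin subclass of nn.Linear, so injection code that matches modules via isinstance(module, nn.Linear) should still pick these layers up.

```python
import torch.nn as nn
from torch.nn.modules.linear import NonDynamicallyQuantizableLinear

# NonDynamicallyQuantizableLinear exists only to block dynamic quantization;
# functionally it behaves like nn.Linear and subclasses it.
print(issubclass(NonDynamicallyQuantizableLinear, nn.Linear))  # True

# nn.MultiheadAttention (as used in OpenAI's CLIP implementation) stores its
# output projection as this subclass.
mha = nn.MultiheadAttention(embed_dim=768, num_heads=12)
print(type(mha.out_proj).__name__)           # NonDynamicallyQuantizableLinear
print(isinstance(mha.out_proj, nn.Linear))   # True
```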
Thank you for your interest in our LaVi-Bridge! We will schedule the release of the code for the CLIP text encoder. In the meantime, you can refer to test/t5_unet.py. The main difference is switching the text encoder from transformers.T5EncoderModel with AutoTokenizer to transformers.CLIPTextModel with CLIPTokenizer. The pretrained model is the "CompVis/stable-diffusion-v1-4" repository on Hugging Face. You can also refer to the standard Stable Diffusion 1.4 pipeline, which likewise uses CLIP as the language model.
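For illustration, here is a minimal sketch of that swap, assuming a flow similar to test/t5_unet.py (the variable names and prompt below are illustrative, not the repository's code):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

pretrained = "CompVis/stable-diffusion-v1-4"

# Replace the T5EncoderModel/AutoTokenizer pair with the CLIP text encoder
# and tokenizer shipped inside the Stable Diffusion 1.4 repository.
tokenizer = CLIPTokenizer.from_pretrained(pretrained, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(pretrained, subfolder="text_encoder").eval()

prompt = "a photograph of an astronaut riding a horse"
tokens = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,  # 77 tokens for CLIP
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    # last_hidden_state provides the per-token embeddings that are fed to the
    # UNet cross-attention, in place of the T5 encoder outputs.
    text_embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(text_embeddings.shape)  # torch.Size([1, 77, 768]) for SD 1.4's CLIP ViT-L/14
```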