Your work is very impressive and interests me very much. However, I have a question about the description of the Gram matrix in the paper. Could you please tell me why CLIP, rather than VGG, is used to extract the feature vectors from which the Gram matrix is calculated?
I believe CLIP may be more robust than VGG to images containing artifacts generated by diffusion models. VGG is pre-trained on natural images and relies solely on category labels, whereas CLIP is trained on a larger dataset and leverages its text embedding space. However, this hypothesis deserves further experimentation.
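For readers following the thread, here is a minimal sketch of what computing a Gram matrix from extracted features could look like. It is not the paper's implementation: the `gram_matrix` and `gram_distance` names, the division by the number of tokens, and the mean-squared comparison are all assumptions made for illustration. The features are plain nested lists standing in for whatever a backbone (CLIP or VGG) would return, so swapping the backbone only changes where `features` comes from.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Gram matrix of an (N tokens x D channels) feature map.

    Computes G = F^T F / N. Dividing by N is an assumed normalization,
    not something stated in the thread.
    """
    n = len(features)
    d = len(features[0])
    gram = [[0.0] * d for _ in range(d)]
    # Accumulate the outer product of every token's feature vector.
    for row in features:
        for i in range(d):
            for j in range(d):
                gram[i][j] += row[i] * row[j]
    return [[value / n for value in gram_row] for gram_row in gram]


def gram_distance(features_a: Matrix, features_b: Matrix) -> float:
    """Mean squared difference between two Gram matrices (hypothetical metric)."""
    gram_a = gram_matrix(features_a)
    gram_b = gram_matrix(features_b)
    d = len(gram_a)
    total = sum(
        (gram_a[i][j] - gram_b[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return total / (d * d)


# Toy stand-in for backbone features: 2 tokens, 2 channels each.
example_features = [[1.0, 2.0], [3.0, 4.0]]
example_gram = gram_matrix(example_features)
```

Because the Gram matrix only captures channel-wise correlations, the same code applies to features from either backbone; which backbone gives more useful statistics on diffusion-generated images is the open question raised above.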