Add clip resources to the transformers documentation #20190

Merged · 4 commits · Nov 15, 2022

19 changes: 19 additions & 0 deletions docs/source/en/model_doc/clip.mdx
@@ -75,6 +75,25 @@ encode the text and prepare the images. The following example shows how to get t

This model was contributed by [valhalla](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/openai/CLIP).

## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP. If you're
interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-to-image"/>
- A blog post on [How to use CLIP to retrieve images from text](https://huggingface.co/blog/fine-tune-clip-rsicd).
- A blog post on [How to use CLIP for Japanese text to image generation](https://huggingface.co/blog/japanese-stable-diffusion).
Comment on lines +85 to +86
@NielsRogge (Contributor), Nov 15, 2022:

The first sentence should be something along the lines of "How to fine-tune CLIP on image-text pairs" rather than "How to retrieve..."; the second one is not relevant for CLIP, as it's a blog about Stable Diffusion.



<PipelineTag pipeline="image-to-text"/>
- A notebook showing [Video to text matching with CLIP for videos](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/X-CLIP/Video_text_matching_with_X_CLIP.ipynb).


<PipelineTag pipeline="zero-shot-classification"/>
- A notebook showing [Zero shot video classification using CLIP for video](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/X-CLIP/Zero_shot_classify_a_YouTube_video_with_X_CLIP.ipynb).
Comment on lines +89 to +94 (Contributor):

These are resources for X-CLIP, not CLIP.
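
For context, the CLIP usage that the Resources section (and the existing example referenced in the hunk header above) points to looks roughly like the minimal sketch below: zero-shot image-text matching with `CLIPModel` and `CLIPProcessor`. The checkpoint name, image URL, and candidate labels are illustrative choices, not something specified in this PR.

```python
# Minimal sketch: zero-shot image-text matching with CLIP.
# Checkpoint, image URL, and candidate labels are illustrative.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores; softmax turns them
# into probabilities over the candidate texts.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(texts, probs[0].tolist())))
```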



## CLIPConfig

[[autodoc]] CLIPConfig