monatis/turkish-clip

Embed texts in Turkish to be used with OpenAI's CLIP
Acknowledgement

Google supported this work by providing Google Cloud credit. Thank you Google for supporting the open source! 🎉

What is this?

This work enables the use of OpenAI CLIP's ViT-B/32 image encoder with a Turkish text encoder. It is composed of a base model and a clip head model. The base model is a finetuned version of dbmdz/distilbert-base-turkish-cased published on HuggingFace's Models Hub, and it should be used together with clip_head.h5 from this repo.

Installation

First, install CLIP and its requirements according to the instructions in its repo. Then clone this repo and install the remaining requirements from requirements.txt:

git clone https://github.com/monatis/turkish-clip.git
cd turkish-clip
pip install -r requirements.txt

Usage

Once you have cloned the repo and installed the requirements, you can run the inference.py script for a quick inference demo:

python inference.py

This script loads the base model from HuggingFace's Models Hub and the clip head from this repo, then correctly classifies two sample images with a zero-shot technique.
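The zero-shot step works the same way as in the original CLIP: the image embedding is compared against the embedding of each candidate caption by cosine similarity, and a softmax over the scaled similarities gives class probabilities. The sketch below illustrates only that scoring step with random stand-in vectors; the real embeddings would come from CLIP's image encoder and this repo's Turkish text encoder, and the logit scale of 100 is CLIP's usual default, assumed here.

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs):
    """Score one image against candidate captions, CLIP-style.

    image_emb: (d,) image embedding (from CLIP's ViT-B/32 in the real setup).
    text_embs: (n, d) caption embeddings (from the Turkish text encoder).
    Returns softmax probabilities over the n captions.
    """
    # L2-normalize so the dot product equals cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = 100.0 * (text_embs @ image_emb)  # CLIP-style logit scaling
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Toy example: random vectors stand in for real embeddings
rng = np.random.default_rng(0)
img = rng.normal(size=512)
caps = np.stack([
    img + 0.1 * rng.normal(size=512),  # caption close to the image
    rng.normal(size=512),              # unrelated caption
])
probs = zero_shot_scores(img, caps)
```

With embeddings like these, the caption nearest the image embedding receives almost all of the probability mass, which is exactly the behavior the demo script relies on.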

How it works

The encode_text() function aggregates the per-token hidden states output by the DistilBERT model to produce a single vector per sequence. Then the clip_head.h5 model projects this vector onto the same vector space as CLIP's text encoder with a single dense layer. First, all DistilBERT layers were frozen and only the head's dense layer was trained for a few epochs. Then the freeze was lifted and the dense layer was trained together with the DistilBERT layers for a few more epochs. I created the dataset by machine-translating COCO captions into Turkish. During training, the vector representations of the English captions produced by the original CLIP text encoder were used as target values, and the MSE between these vectors and the clip_head.h5 outputs was minimized.
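The pipeline above can be sketched numerically. This is a minimal stand-in, not the repo's actual code: mean pooling is one plausible aggregation scheme (encode_text() defines the real one), the 768/512 dimensions are DistilBERT's and CLIP's usual sizes, and the weights and target vector are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: DistilBERT hidden size 768, CLIP text embedding size 512
hidden, clip_dim, seq_len = 768, 512, 12

# Per-token hidden states for one sequence, as DistilBERT would output them
token_states = rng.normal(size=(seq_len, hidden))

# Step 1: aggregate per-token states into a single vector per sequence
# (mean pooling here; the repo's encode_text() defines the actual scheme)
pooled = token_states.mean(axis=0)              # shape (768,)

# Step 2: a single dense layer (the clip head) projects the pooled vector
# into the same space as CLIP's text encoder
W = 0.02 * rng.normal(size=(hidden, clip_dim))  # stand-in weights
b = np.zeros(clip_dim)
projected = pooled @ W + b                      # shape (512,)

# Training objective: MSE against the original CLIP text encoder's embedding
# of the corresponding English caption (random stand-in here)
target = rng.normal(size=clip_dim)
mse = np.mean((projected - target) ** 2)
```

In the real setup the dense layer lives in clip_head.h5, the targets come from CLIP's text encoder run on the English COCO captions, and this MSE is what is minimized during both training phases.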

Future work

The dataset and the training notebook will be released soon. If the community finds this work useful, I may also release bigger models finetuned on better datasets, as well as more usage examples. This model will also be added to my ai-aas project.
