Implemented a CLIP-based approach to Visual Question Answering (VQA) on the VizWiz dataset. Split the data with stratified sampling on answer type and answerability, selected each question's target as the most common crowdsourced answer (breaking ties by Levenshtein distance), and encoded image-question pairs with a CLIP ViT-L/14@336px model using data augmentation. Trained a VQA model with an auxiliary answer-type loss alongside a separate answerability model, and evaluated with accuracy and answerability metrics. Achieved 42.0% accuracy and 82.8% answerability, demonstrating effectiveness at answering open-ended questions about images.
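The stratified split on answer type and answerability can be sketched as below; `stratified_split`, `key_fn`, and `val_frac` are illustrative names, not taken from the repository:

```python
import random
from collections import defaultdict

def stratified_split(samples, key_fn, val_frac=0.2, seed=0):
    # Group sample indices by stratum, e.g. (answer_type, answerable).
    groups = defaultdict(list)
    for i, s in enumerate(samples):
        groups[key_fn(s)].append(i)
    rng = random.Random(seed)
    train, val = [], []
    # Split each stratum separately so both splits keep the same
    # proportions of answer types and answerability labels.
    for idxs in groups.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_frac)
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)
```

In practice the same effect can be obtained with scikit-learn's `train_test_split` by passing the combined (answer type, answerability) label as `stratify`.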
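Selecting the target answer with a Levenshtein tie-break might look like the following minimal sketch; the exact tie-break rule is assumed here to be the candidate with the lowest total edit distance to all annotator answers:

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def most_common_answer(answers: list[str]) -> str:
    # Candidates are the answers tied at the highest frequency.
    counts = Counter(answers)
    top = max(counts.values())
    candidates = [a for a, c in counts.items() if c == top]
    if len(candidates) == 1:
        return candidates[0]
    # Assumed tie-break: smallest total Levenshtein distance
    # to the full set of annotator answers.
    return min(candidates, key=lambda c: sum(levenshtein(c, a) for a in answers))
```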
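The accuracy figure is presumably computed with the VQA evaluation metric used by the VizWiz challenge, which credits a prediction by how many annotators gave the same answer, capped at three; a common simplified form of that metric is:

```python
def vqa_accuracy(pred: str, gt_answers: list[str]) -> float:
    # Simplified VQA metric: a prediction is fully correct when at
    # least 3 annotators gave the same answer; partial credit below that.
    # (The official metric additionally averages over annotator subsets.)
    pred = pred.strip().lower()
    matches = sum(a.strip().lower() == pred for a in gt_answers)
    return min(matches / 3.0, 1.0)
```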
AhmedDusuki/CLIP_VizWiz_Question_Answering