OSCaR (Object State Captioning and State Change Representation) provides a benchmark dataset and evaluation framework for object state captioning and state change representation. The project focuses on understanding and describing how objects change state within videos, a capability that many computer vision and artificial intelligence applications depend on.
The dataset comprises 500 videos selected from the Ego4D and EPIC-KITCHENS datasets. Each video is accompanied by four detailed captions that have passed a stringent human verification process. This verification supports the dataset's quality and reliability, making it a suitable benchmark for evaluating state-of-the-art models in object state captioning.
To facilitate a comprehensive and accurate performance assessment, we recommend the use of several text generation metrics, including:
- BLEU
- ROUGE
- LSA
Together, these metrics assess the quality of generated captions in terms of precision (BLEU), recall (ROUGE), and semantic coherence (LSA).
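Until the official evaluation code is released, the following is a minimal, dependency-free sketch of how a generated caption could be scored against the four reference captions of a video. It is not the OSCaR evaluation script: the function names (`bleu`, `rouge_l`), the whitespace tokenization, the bigram cutoff, and the example captions are all assumptions made for illustration, and LSA is omitted. It implements a simplified unsmoothed BLEU (clipped n-gram precision with a brevity penalty) and the ROUGE-L F1 score (based on the longest common subsequence).

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Fraction of candidate n-grams found in any reference, with clipped counts."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Simplified BLEU without smoothing (max_n=2 is an illustrative choice)."""
    precisions = [clipped_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 score between one candidate and one reference."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical captions, not taken from the dataset.
candidate = "the onion is chopped into small pieces".split()
references = [
    "the onion is chopped into small pieces".split(),
    "the onion has been diced".split(),
    "a chopped onion lies on the board".split(),
    "the whole onion is now cut into pieces".split(),
]
print("BLEU:", bleu(candidate, references))
print("ROUGE-L (best reference):", max(rouge_l(candidate, r) for r in references))
```

For reported results, an established implementation of each metric is preferable to this sketch, since tokenization and smoothing choices noticeably affect the scores.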
We are currently in the final stages of preparing the OSCaR code and dataset for release. Our team is committed to ensuring the highest standards of quality and usability. We believe in the importance of making our research accessible and reproducible, allowing the community to explore and expand upon our work.
We appreciate your interest and patience. The release of the code and dataset will be announced shortly.
For additional information and updates, please refer to our paper available here.
If our work aids your research, please consider citing it as follows:
@inproceedings{nguyen2024oscar,
  title={OSCaR: Object State Captioning and State Change Representation},
  author={Nguyen, Nguyen and Bi, Jing and Vosoughi, Ali and Tian, Yapeng and Fazli, Pooyan and Xu, Chenliang},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2024}
}