MovieCORE is a video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content.
For more details, please refer to our paper.
Please download the videos from MovieChat's Hugging Face repositories (training data and test data). Extract them in whatever layout suits your model and use them together with our annotations.
Coming soon
The evaluation is performed across the following dimensions:
- Accuracy: Measures the semantic similarity between the predicted answer and the ground truth.
- Comprehensiveness: Assesses whether the predicted answer covers all key aspects mentioned in the ground truth.
- Depth: Evaluates the level of reasoning and insight demonstrated in the predicted answer.
- Evidence: Checks the quality and relevance of evidence provided in the predicted answer.
- Coherence: Measures the logical flow, organization, and clarity of the predicted answer.
To evaluate predictions on the MovieCORE dataset, use the evaluate_moviecore.py script. The script processes the dataset, evaluates each QA pair across the dimensions listed above, and calculates overall and classification-specific scores.
export OPENAI_API_KEY='sk******'
python evaluate_moviecore.py --pred_path path/to/your/predictions.json
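The aggregation step described above (overall and classification-specific scores) can be pictured with a minimal sketch. This is not the logic of evaluate_moviecore.py itself: the record layout, the lowercase dimension keys, and the `aggregate_scores` helper are assumptions made for illustration, and the numeric scores are placeholders rather than real results.

```python
from collections import defaultdict
from statistics import mean

# The five evaluation dimensions; the lowercase key names are an assumption.
DIMENSIONS = ["accuracy", "comprehensiveness", "depth", "evidence", "coherence"]


def aggregate_scores(records):
    """Average per-dimension scores overall and per classification.

    `records` is a hypothetical list of dicts, one per QA pair, each holding a
    "classification" string and one numeric score per dimension.
    """
    overall = defaultdict(list)
    by_class = defaultdict(lambda: defaultdict(list))

    for rec in records:
        for dim in DIMENSIONS:
            overall[dim].append(rec[dim])
            by_class[rec["classification"]][dim].append(rec[dim])

    return {
        "overall": {dim: mean(vals) for dim, vals in overall.items()},
        "by_classification": {
            cls: {dim: mean(vals) for dim, vals in dims.items()}
            for cls, dims in by_class.items()
        },
    }


if __name__ == "__main__":
    # Placeholder scores, for illustration only.
    demo = [
        {"classification": "A", **{d: 4 for d in DIMENSIONS}},
        {"classification": "B", **{d: 2 for d in DIMENSIONS}},
    ]
    print(aggregate_scores(demo))
```

Keeping the raw per-pair scores in lists until the end makes it easy to swap the mean for another statistic if needed.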
The predictions file passed via --pred_path should follow this structure:
{
    "video_1.mp4": [
        {
            "question": "How does the video depict the unique adaptations of the species in the Sahara Desert, and what roles do these species play in their ecosystem?",
            "answer": "The GT answer.",
            "pred": "Your pred.",
            "classification": "the classification"
        },
        {
            "question": "The second question of video 1?",
            "answer": "The GT answer.",
            "pred": "Your pred.",
            "classification": "the classification"
        }
    ],
    "video_2.mp4": [
        {
            "question": "The only question of video 2",
            "answer": "The GT answer.",
            "pred": "Your pred.",
            "classification": "the classification"
        }
    ]
}
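Before spending API calls on evaluation, it may help to check that a predictions file matches the structure shown above. The following is a small sketch under that assumption; `validate_predictions` and `validate_predictions_file` are hypothetical helpers, not part of evaluate_moviecore.py, and the required keys are inferred only from the example.

```python
import json

# Keys that every QA entry carries in the example above.
REQUIRED_KEYS = {"question", "answer", "pred", "classification"}


def validate_predictions(data):
    """Return a list of problems found; an empty list means the layout matches."""
    if not isinstance(data, dict):
        return ["top level must be an object mapping video names to lists"]

    problems = []
    for video, items in data.items():
        if not isinstance(items, list):
            problems.append(f"{video}: expected a list of QA entries")
            continue
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                problems.append(f"{video}[{i}]: expected an object")
                continue
            missing = REQUIRED_KEYS - item.keys()
            if missing:
                problems.append(f"{video}[{i}]: missing keys {sorted(missing)}")
    return problems


def validate_predictions_file(path):
    """Load a JSON predictions file and validate its structure."""
    with open(path, encoding="utf-8") as f:
        return validate_predictions(json.load(f))
```

For example, `validate_predictions_file("path/to/your/predictions.json")` returns an empty list when every entry has all four keys.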
This dataset is provided under the MIT License.