SST-EM (Semantic, Spatial, and Temporal Evaluation Metric) is an evaluation framework for video editing models. It addresses the limitations of traditional metrics by using Vision-Language Models, object detection, and temporal consistency checks to evaluate semantic fidelity and temporal smoothness. SST-EM integrates four components (semantic extraction, object tracking, object refinement, and temporal consistency) into a unified metric whose weights are optimized against human evaluations. The result is a multidimensional assessment of video editing quality.
- Multidimensional evaluation combining semantic and visual metrics.
- Evaluated on state-of-the-art video editing models such as VideoP2P, TokenFlow, Control-A-Video, and FateZero.
- Customizable for various video styles, genres, and complexities.
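The unified metric described above combines four component scores using weights. As a minimal sketch, assuming the combination is a normalized weighted sum (the text does not state the exact form), it could look like the following. The names `ComponentScores`, `combine_scores`, and `ILLUSTRATIVE_WEIGHTS` are hypothetical, and the equal weights are placeholders, not the weights optimized through human evaluations.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ComponentScores:
    """Per-video scores for the four SST-EM components (hypothetical container)."""
    semantic: float    # semantic extraction score
    tracking: float    # object tracking score
    refinement: float  # object refinement score
    temporal: float    # temporal consistency score


# Placeholder weights for illustration only; NOT the values from the paper.
ILLUSTRATIVE_WEIGHTS = {
    "semantic": 0.25,
    "tracking": 0.25,
    "refinement": 0.25,
    "temporal": 0.25,
}


def combine_scores(scores: ComponentScores, weights: Mapping[str, float]) -> float:
    """Combine component scores as a weighted sum, normalized by the total weight.

    Assumption: the unified metric is linear in the component scores.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * getattr(scores, name) for name, w in weights.items())
    return weighted / total
```

For example, `combine_scores(ComponentScores(0.8, 0.6, 0.4, 0.2), ILLUSTRATIVE_WEIGHTS)` averages the four scores because the placeholder weights are equal.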
Here’s a comparison of results for various video editing models:
Step 1: Clone the repository
git clone https://github.com/VarunBiyyala/Custom_evaluation_Metrics.git
cd Custom_evaluation_Metrics
Step 2: Install dependencies. Detailed installation instructions are in the demo notebook.
Step 3: Run the framework. Before running the evaluation pipeline, arrange your edited videos in the following structure:
├── ModelName/        # Name of the video editing model
│   ├── video1/       # One entry per edited video, named after its editing prompt
│   ├── video2/
│   ├── video3/
│   └── ...
Note: Video names should be the editing prompt used during video editing.
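Before running the pipeline, it can help to confirm that the folder layout matches the structure above. The sketch below is one way to do that, assuming a single root folder that contains one subfolder per model. The function name `list_edited_videos` and the root folder name are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Dict, List


def list_edited_videos(root: str) -> Dict[str, List[str]]:
    """Map each model folder under `root` to the editing prompts found inside it.

    Each entry inside a model folder is treated as one edited video whose name
    is the editing prompt. For files, the extension is dropped.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")

    videos: Dict[str, List[str]] = {}
    for model_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        prompts = sorted(
            entry.stem if entry.is_file() else entry.name
            for entry in model_dir.iterdir()
            if not entry.name.startswith(".")  # skip hidden files
        )
        videos[model_dir.name] = prompts
    return videos
```

Calling `list_edited_videos("edited_videos")` (with `edited_videos` as an assumed root folder name) returns a dictionary such as `{"ModelName": ["prompt one", "prompt two"]}`, which makes missing or misnamed videos easy to spot.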
- To evaluate any video-editing model with our framework, first generate edited videos using editing prompts.
- You can find example demos for running the video-editing models used in our paper, with their dependencies resolved.
We would like to thank the team behind Enhanced End-to-End Video Editing: Adaptive Customization of Path, Object, and Motion Dynamics for providing the dataset used in this project. Their contribution has been invaluable in enabling us to develop and test the SST-EM framework.
We also appreciate the open-source community for their contributions to research and development in video editing evaluation.