SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing

Overview

SST-EM (Semantic, Spatial, and Temporal Evaluation Metric) is an evaluation framework for video editing models. It addresses the limitations of traditional metrics by combining Vision-Language Models, object detection, and temporal consistency checks to evaluate semantic fidelity and temporal smoothness. SST-EM integrates four components (semantic extraction, object tracking, object refinement, and temporal consistency) into a unified metric whose weights are optimized through human evaluations. The result is a multidimensional assessment of video editing quality.
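As a rough illustration of how four component scores could be folded into one number, the sketch below uses a normalized weighted sum. This is an assumption made for illustration: the combination rule, the `ComponentScores` and `combine_scores` names, and the weight values are all hypothetical and are not taken from the SST-EM code or paper.

```python
from dataclasses import dataclass, asdict


@dataclass
class ComponentScores:
    """Per-video scores for the four components, each assumed to lie in [0, 1]."""
    semantic: float      # semantic extraction
    tracking: float      # object tracking
    refinement: float    # object refinement
    temporal: float      # temporal consistency


def combine_scores(scores: ComponentScores, weights: dict) -> float:
    """Return a weighted average of the component scores.

    `weights` maps each field name to a non-negative weight. The weights are
    normalized here, so they do not need to sum to 1.
    """
    values = asdict(scores)
    missing = set(values) - set(weights)
    if missing:
        raise ValueError(f"missing weights for: {sorted(missing)}")
    total = sum(weights[name] for name in values)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * values[name] for name in values) / total


# Placeholder weights for illustration only (NOT the values learned from human evaluations).
example_weights = {"semantic": 1.0, "tracking": 1.0, "refinement": 1.0, "temporal": 1.0}
example = combine_scores(ComponentScores(0.8, 0.6, 0.4, 0.2), example_weights)
```

With equal weights the result is the plain mean of the four scores. Weights fitted against human ratings would replace the placeholder values.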

SST-EM Architecture

Key Features

Multidimensional evaluation combining semantic and visual metrics.
Evaluated on state-of-the-art video editing models such as VideoP2P, TokenFlow, Control-A-Video, and FateZero.
Customizable for various video styles, genres, and complexities.

Results

Here’s a comparison of results for various video editing models:

Usage

Step 1: Clone the repository

git clone https://github.com/VarunBiyyala/Custom_evaluation_Metrics.git
cd Custom_evaluation_Metrics

Step 2: Install dependencies. Detailed installation steps are in the demo notebook.

Step 3: Run the framework. Before running it, make sure your edited videos are arranged in the structure described below.

Data Structure

To run this evaluation pipeline, your edited videos must be arranged in the following structure:

├── ModelName/                 # Name of the video editing model
│   ├── video1/                # Edited video, named after its editing prompt
│   ├── video2/
│   ├── video3/
│   └── ...

Note: Video names should be the editing prompt used during video editing.
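The layout above can be checked before running the pipeline with a small helper like the one below. It is only a sketch: the function name `list_edited_videos` is hypothetical, and it assumes that all model folders sit under one common root directory, which the repository does not specify.

```python
from pathlib import Path


def list_edited_videos(root):
    """Return (model_name, editing_prompt, path) for each video entry under root.

    Assumes root/<ModelName>/<video entry>, where the entry name is the
    editing prompt. Entries may be folders or files; for files, the
    extension (for example ".mp4") is dropped from the prompt.
    """
    entries = []
    model_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for model_dir in model_dirs:
        for video in sorted(model_dir.iterdir()):
            if video.name.startswith("."):
                continue  # skip hidden files such as .DS_Store
            prompt = video.name if video.is_dir() else video.stem
            entries.append((model_dir.name, prompt, video))
    return entries
```

Printing the returned tuples gives a quick way to confirm that every video name reads as the prompt used to produce it.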

Running Video-Editing Models

To evaluate a video-editing model with our framework, first generate edited videos using editing prompts.
The repository includes example demos for running the video-editing models used in our paper, with their dependencies resolved.

Acknowledgments

We would like to thank the team behind Enhanced End-to-End Video Editing: Adaptive Customization of Path, Object, and Motion Dynamics for providing the dataset used in this project. Their contribution has been invaluable in enabling us to develop and test the SST-EM framework.

We also appreciate the open-source community for their contributions to research and development in video editing evaluation.
