A curated list of recent diffusion models for video generation, editing, restoration, understanding, NeRF, and more.
(Source: Make-A-Video, Tune-A-Video, and FateZero.)
- Open-source Toolboxes and Foundation Models
- Evaluation Benchmarks and Metrics
- Video Generation
- Video Editing
- Long-form Video Generation and Completion
- Human or Subject Motion
- Video Enhancement and Restoration
- 3D / NeRF
- Video Understanding
- Healthcare and Biology
### Evaluation Benchmarks and Metrics

- T2VScore: Towards A Better Metric for Text-to-Video Generation (Jan., 2024)
- VBench: Comprehensive Benchmark Suite for Video Generative Models (Nov., 2023)
- FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation (Nov., 2023)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models (Oct., 2023)
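Several of the benchmarks above build on frame-averaged text–video similarity. A minimal sketch of that core measurement, assuming you already have CLIP-style frame and text embeddings (obtaining the embeddings themselves is out of scope here):

```python
import numpy as np

def frame_text_score(frame_embs: np.ndarray, text_emb: np.ndarray) -> float:
    """Average cosine similarity between each frame embedding (N, D)
    and a single text embedding (D,) -- the frame-averaged score that
    text-video metrics commonly start from."""
    f = frame_embs / np.linalg.norm(frame_embs, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return float((f @ t).mean())
```

Note this illustrative score ignores temporal consistency, which is exactly the gap the dedicated video benchmarks above try to close.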
### Video Generation

- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text (Mar., 2024)
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance (Mar., 2024)
- Intention-driven Ego-to-Exo Video Generation (Mar., 2024)
- DragAnything: Motion Control for Anything using Entity Representation (Mar., 2024)
- FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing (Mar., 2024)
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models (Mar., 2024)
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis (Feb., 2024)
- One-Shot Motion Customization of Text-to-Video Diffusion Models (Feb., 2024)
- Magic-Me: Identity-Specific Video Customized Diffusion (Feb., 2024)
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation (Feb., 2024)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion (Feb., 2024)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization (Feb., 2024)
- Boximator: Generating Rich and Controllable Motions for Video Synthesis (Feb., 2024)
- Lumiere: A Space-Time Diffusion Model for Video Generation (Jan., 2024)
- ActAnywhere: Subject-Aware Video Background Generation (Jan., 2024)
- Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation (Jan., 2024)
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens (Jan., 2024)
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects (Jan., 2024)
- UniVG: Towards UNIfied-modal Video Generation (Jan., 2024)
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models (Jan., 2024)
- 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model (Jan., 2024)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks (Jan., 2024)
- Latte: Latent Diffusion Transformer for Video Generation (Jan., 2024)
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation (Jan., 2024)
- Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions (Jan., 2024)
- VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM (Jan., 2024)
- TrailBlazer: Trajectory Control for Diffusion-Based Video Generation (Jan., 2024)
- FlashVideo: A Framework for Swift Inference in Text-to-Video Generation (Dec., 2023)
- I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models (Dec., 2023)
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos (Dec., 2023)
- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models (Dec., 2023)
- VideoPoet: A Large Language Model for Zero-Shot Video Generation (Dec., 2023)
- InstructVideo: Instructing Video Diffusion Models with Human Feedback (Dec., 2023)
- VideoLCM: Video Latent Consistency Model (Dec., 2023)
- PEEKABOO: Interactive Video Generation via Masked-Diffusion (Dec., 2023)
- FreeInit: Bridging Initialization Gap in Video Diffusion Models (Dec., 2023)
- Photorealistic Video Generation with Diffusion Models (Dec., 2023)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution (Dec., 2023)
- DreaMoving: A Human Video Generation Framework based on Diffusion Models (Dec., 2023)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models (Dec., 2023)
- Customizing Motion in Text-to-Video Diffusion Models (Dec., 2023)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators (Dec., 2023)
- AVID: Any-Length Video Inpainting with Diffusion Model (Dec., 2023)
- MTVG: Multi-text Video Generation with Text-to-Video Models (Dec., 2023)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion (Dec., 2023)
- Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation (Dec., 2023)
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation (CVPR 2024)
- GenDeF: Learning Generative Deformation Field for Video Generation (Dec., 2023)
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation (Dec., 2023)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis (Dec., 2023)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance (Dec., 2023)
- LivePhoto: Real Image Animation with Text-guided Motion Control (Dec., 2023)
- Fine-grained Controllable Video Generation via Object Appearance and Context (Dec., 2023)
- VideoBooth: Diffusion-based Video Generation with Image Prompts (Dec., 2023)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter (Dec., 2023)
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation (Nov., 2023)
- ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models (Nov., 2023)
- Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning (Nov., 2023)
- VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model (Nov., 2023)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation (Nov., 2023)
- SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models (Nov., 2023)
- MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation (Nov., 2023)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model (Nov., 2023)
- FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax (Nov., 2023)
- Sketch Video Synthesis (Nov., 2023)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (Nov., 2023)
- Decouple Content and Motion for Conditional Image-to-Video Generation (Nov., 2023)
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline (Nov., 2023)
- Fine-Grained Open Domain Image Animation with Motion Guidance (Nov., 2023)
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (Nov., 2023)
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer (Nov., 2023)
- MoVideo: Motion-Aware Video Generation with Diffusion Models (Nov., 2023)
- Make Pixels Dance: High-Dynamic Video Generation (Nov., 2023)
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning (Nov., 2023)
- Optimal Noise pursuit for Augmenting Text-to-Video Generation (Nov., 2023)
- VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning (Nov., 2023)
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction (Oct., 2023)
- FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling (Oct., 2023)
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (Oct., 2023)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation (Oct., 2023)
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation (Sep., 2023)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models (Sep., 2023)
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models (Sep., 2023)
- Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation (Sep., 2023)
- Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator (Sep., 2023)
- Hierarchical Masked 3D Diffusion Model for Video Outpainting (Sep., 2023)
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation (Sep., 2023)
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation (Sep., 2023)
- MagicAvatar: Multimodal Avatar Generation and Animation (Aug., 2023)
- Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models (Aug., 2023)
- SimDA: Simple Diffusion Adapter for Efficient Video Generation (Aug., 2023)
- ModelScope Text-to-Video Technical Report (Aug., 2023)
- Dual-Stream Diffusion Net for Text-to-Video Generation (Aug., 2023)
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory (Aug., 2023)
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation (Jul., 2023)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation (Jul., 2023)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning (Jul., 2023)
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World (Jul., 2023)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation (Jun., 2023)
- VideoComposer: Compositional Video Synthesis with Motion Controllability (Jun., 2023)
- Probabilistic Adaptation of Text-to-Video Models (Jun., 2023)
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance (Jun., 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising (May, 2023)
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models (May, 2023)
- ControlVideo: Training-free Controllable Text-to-Video Generation (May, 2023)
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (May, 2023)
- Any-to-Any Generation via Composable Diffusion (May, 2023)
- VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation (May, 2023)
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models (May, 2023)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis (Apr., 2023)
- LaMD: Latent Motion Diffusion for Video Generation (Apr., 2023)
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
- Text2Performer: Text-Driven Human Video Generation (Apr., 2023)
- Generative Disco: Text-to-Video Generation for Music Visualization (Apr., 2023)
- Latent-Shift: Latent Diffusion with Temporal Shift (Apr., 2023)
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion (Apr., 2023)
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos (Apr., 2023)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos (CVPR 2023)
- Seer: Language Instructed Video Prediction with Latent Diffusion Models (Mar., 2023)
- Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators (Mar., 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images (Feb., 2023)
- Structure and Content-Guided Video Synthesis With Diffusion Models (Feb., 2023)
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (ICCV 2023)
- Mm-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- Magvit: Masked Generative Video Transformer (Dec., 2022)
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- Efficient Video Prediction via Sparsely Conditioned Flow Matching (Nov., 2022)
- Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths (Nov., 2022)
- SinFusion: Training Diffusion Models on a Single Image or Video (Nov., 2022)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models (Nov., 2022)
- Imagen Video: High Definition Video Generation With Diffusion Models (Oct., 2022)
- Make-A-Video: Text-to-Video Generation without Text-Video Data (ICLR 2023)
- Diffusion Models for Video Prediction and Infilling (TMLR 2022)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- Video Diffusion Models (Apr., 2022)
- Diffusion Probabilistic Modeling for Video Generation (Mar., 2022)
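Nearly every generation entry above shares the same diffusion backbone: a network trained to predict noise, wrapped in an iterative denoising loop at inference time. A minimal ancestral-sampling sketch in plain NumPy (`denoise_fn` is a stand-in for the learned noise-prediction network, and the beta schedule is illustrative, not any specific paper's):

```python
import numpy as np

def ddpm_sample(denoise_fn, shape, betas, rng):
    """Ancestral DDPM sampling: start from Gaussian noise and denoise
    step by step. denoise_fn(x, t) predicts the noise eps in x at step t."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # cumulative signal retention
    x = rng.standard_normal(shape)          # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = denoise_fn(x, t)
        # Posterior mean: remove the predicted noise, then rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean                         # final step is deterministic
    return x
```

For video models, `shape` would carry a time axis (e.g. frames × height × width × channels) and the network attends across frames; the sampling loop itself is unchanged.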
### Video Editing

- EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing (Mar., 2024)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models (Mar., 2024)
- DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing (Mar., 2024)
- Video Editing via Factorized Diffusion Distillation (Mar., 2024)
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing (Feb., 2024)
- Object-Centric Diffusion for Efficient Video Editing (Jan., 2024)
- VASE: Object-Centric Shape and Appearance Manipulation of Real Videos (Jan., 2024)
- FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis (Dec., 2023)
- Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis (Dec., 2023)
- RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing (Dec., 2023)
- MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers (Dec., 2023)
- VidToMe: Video Token Merging for Zero-Shot Video Editing (Dec., 2023)
- A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing (Dec., 2023)
- Neutral Editing Framework for Diffusion-based Video Editing (Dec., 2023)
- DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing (Dec., 2023)
- RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models (Dec., 2023)
- SAVE: Protagonist Diversification with Structure Agnostic Video Editing (Dec., 2023)
- MagicStick: Controllable Video Editing via Control Handle Transformations (Dec., 2023)
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence (CVPR 2024)
- DragVideo: Interactive Drag-style Video Editing (Dec., 2023)
- Drag-A-Video: Non-rigid Video Editing with Point-based Interaction (Dec., 2023)
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models (Dec., 2023)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
- FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing (ICLR 2024)
- MotionEditor: Editing Video Motion via Content-Aware Diffusion (Nov., 2023)
- Motion-Conditioned Image Animation for Video Editing (Nov., 2023)
- Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer (CVPR 2024)
- Cut-and-Paste: Subject-Driven Video Editing with Attention Control (Nov., 2023)
- LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation (Nov., 2023)
- Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models (Oct., 2023)
- DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing (Oct., 2023)
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models (ICLR 2024)
- CCEdit: Creative and Controllable Video Editing via Diffusion Models (Sep., 2023)
- MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation (Sep., 2023)
- MagicEdit: High-Fidelity and Temporally Coherent Video Editing (Aug., 2023)
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing (ICCV 2023)
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (CVPR 2024)
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing (ICLR 2024)
- INVE: Interactive Neural Video Editing (Jul., 2023)
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing (Jun., 2023)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (SIGGRAPH Asia 2023)
- ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (May, 2023)
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts (May, 2023)
- Soundini: Sound-Guided Diffusion for Natural Video Editing (Apr., 2023)
- Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models (Mar., 2023)
- Edit-A-Video: Single Video Editing with Object-Aware Consistency (Mar., 2023)
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (Mar., 2023)
- Pix2video: Video Editing Using Image Diffusion (Mar., 2023)
- Video-P2P: Video Editing with Cross-attention Control (Mar., 2023)
- Dreamix: Video Diffusion Models Are General Video Editors (Feb., 2023)
- Shape-Aware Text-Driven Layered Video Editing (Jan., 2023)
- Speech Driven Video Editing via an Audio-Conditioned Diffusion Model (Jan., 2023)
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding (CVPR 2023)
### Long-form Video Generation and Completion

- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation (Mar., 2023)
- Flexible Diffusion Modeling of Long Videos (May, 2022)
### Human or Subject Motion

- DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation (Jan., 2024)
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model (CVPR 2023)
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions (Apr., 2023)
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model (Apr., 2023)
- Human Motion Diffusion as a Generative Prior (Mar., 2023)
- Can We Use Diffusion Probabilistic Models for 3d Motion Prediction? (Feb., 2023)
- Single Motion Diffusion (Feb., 2023)
- HumanMAC: Masked Motion Completion for Human Motion Prediction (Feb., 2023)
- DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model (Jan., 2023)
- Modiff: Action-Conditioned 3d Motion Generation With Denoising Diffusion Probabilistic Models (Jan., 2023)
- Unifying Human Motion Synthesis and Style Transfer With Denoising Diffusion Probabilistic Models (GRAPP 2023)
- Executing Your Commands via Motion Diffusion in Latent Space (CVPR 2023)
- Pretrained Diffusion Models for Unified Human Motion Synthesis (Dec., 2022)
- PhysDiff: Physics-Guided Human Motion Diffusion Model (Dec., 2022)
- BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction (Dec., 2022)
- Listen, Denoise, Action! Audio-Driven Motion Synthesis With Diffusion Models (Nov., 2022)
- Diffusion Motion: Generate Text-Guided 3d Human Motion by Diffusion Model (ICASSP 2023)
- Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction (Oct., 2022)
- Human Motion Diffusion Model (ICLR 2023)
- FLAME: Free-form Language-based Motion Synthesis & Editing (AAAI 2023)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model (Aug., 2022)
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion (CVPR 2022)
### Video Enhancement and Restoration

- LDMVFI: Video Frame Interpolation with Latent Diffusion Models (Mar., 2023)
- CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming (Nov., 2022)
### 3D / NeRF

- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields (May, 2023)
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture (May, 2023)
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models (CVPR 2023)
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (Apr., 2023)
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (Mar., 2023)
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models (Feb., 2023)
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion (Feb., 2023)
- DiffRF: Rendering-guided 3D Radiance Field Diffusion (CVPR 2023)
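The NeRF-based entries above all render through the same volume-rendering quadrature: alpha-compositing density and color samples along each camera ray. A minimal single-ray sketch (illustrative shapes, plain NumPy; the sampled densities and colors would come from the radiance field):

```python
import numpy as np

def composite(densities, colors, deltas):
    """Alpha-composite N samples along one ray.
    densities: (N,) non-negative sigma values from the field
    colors:    (N, 3) per-sample RGB
    deltas:    (N,) distances between consecutive samples
    Returns the rendered (3,) RGB for the ray."""
    alphas = 1.0 - np.exp(-densities * deltas)                       # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance up to each sample
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)
```

Diffusion-based NeRF methods differ in what supplies or regularizes the field, but this compositing step is what turns it into the images being diffused or scored.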
### Video Understanding

- Exploring Diffusion Models for Unsupervised Video Anomaly Detection (Apr., 2023)
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos (CVPR 2023)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion (Mar., 2023)
- Diffusion Action Segmentation (ICCV 2023)
- DiffusionRet: Generative Text-Video Retrieval with Diffusion Model (ICCV 2023)
- Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning (Nov., 2022)
- A Generalist Framework for Panoptic Segmentation of Images and Videos (Oct., 2022)