Trajectory Attention For Fine-grained Video Motion Control

Zeqi Xiao¹, Wenqi Ouyang¹, Yifan Zhou¹, Shuai Yang², Lei Yang³, Jianlou Si³, Xingang Pan¹
¹S-Lab, Nanyang Technological University
²Wangxuan Institute of Computer Technology, Peking University
³SenseTime Research

demo.mp4

🏠 About

Trajectory attention injects partial motion information by keeping content consistent along trajectories. It supports tasks such as camera motion control on images and videos, and first-frame-guided video editing. Yellow boxes indicate reference content, green boxes indicate input frames, and blue boxes indicate output frames.
Our method allows for conditioning on trajectories from various sources, such as camera motion derived from a single image, as shown in this figure. We inject these conditions into the model through trajectory attention, enabling explicit and fine-grained control over the motion in the generated video.
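For intuition, here is a minimal, illustrative sketch of the idea in PyTorch. It is not the released implementation: the feature shapes, the grid_sample-based trajectory sampling, and the weight-free attention are all simplifying assumptions.

# Minimal sketch of trajectory attention (illustrative only): features
# sampled along each point trajectory attend to each other across frames,
# encouraging content along a trajectory to stay consistent.
import torch
import torch.nn.functional as F

def trajectory_attention(feats, trajs, num_heads=8):
    """feats: (T, C, H, W) per-frame feature maps.
    trajs: (N, T, 2) normalized (x, y) in [-1, 1] for N tracked points."""
    T, C, H, W = feats.shape
    N = trajs.shape[0]
    # Sample each frame's features at that frame's trajectory positions.
    grid = trajs.permute(1, 0, 2).unsqueeze(2)               # (T, N, 1, 2)
    tokens = F.grid_sample(feats, grid, align_corners=True)  # (T, C, N, 1)
    tokens = tokens.squeeze(-1).permute(2, 0, 1)             # (N, T, C)
    # Self-attention along each trajectory's temporal axis.
    d = C // num_heads  # assumes C is divisible by num_heads
    q = tokens.reshape(N, T, num_heads, d).transpose(1, 2)   # (N, h, T, d)
    k, v = q, q  # weight-free sketch; real layers would project q/k/v
    attn = torch.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(N, T, C)
    return out  # trajectory-consistent features, (N, T, C)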

Installation

  1. Create a Conda Environment

This codebase is tested with PyTorch 1.13.1+cu117.

conda create -n trajattn python=3.10
conda activate trajattn
pip install -r requirements.txt
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
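Optionally, you can sanity-check that the pinned CUDA build is active:

import torch
print(torch.__version__)          # expected: 1.13.1+cu117
print(torch.cuda.is_available())  # expected: True on a CUDA 11.7-capable machine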
  2. Download the model weights from Hugging Face.

  3. Clone Relevant Repositories and Download Checkpoints

# Clone the Depth-Anything-V2 repository
git clone https://github.com/DepthAnything/Depth-Anything-V2
# Download the Depth-Anything-V2-Large checkpoint
wget -O checkpoints/depth_anything_v2_vitl.pth 'https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true'

The checkpoint should end up in the checkpoints/ directory (the command above writes it there); you can also modify the checkpoint path in the running scripts if needed.
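As a quick smoke test, the checkpoint can be loaded following the usage shown in the upstream Depth-Anything-V2 README. The constructor arguments below are the 'vitl' configuration from that README, and the input path is a placeholder; double-check both against the cloned repository.

import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

# 'vitl' configuration, as listed in the Depth-Anything-V2 README
model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth',
                                 map_location='cpu'))
model.eval()

raw_img = cv2.imread('example.jpg')  # HxWx3 BGR image (placeholder path)
depth = model.infer_image(raw_img)   # HxW numpy depth map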

Running

To control camera motion on images, execute the following script:

sh image_control.sh

TODO

  • Release models and weights
  • Release pipelines for single-image camera motion control
  • Release pipelines for video camera motion control
  • Release pipelines for video editing
  • Release training pipeline

🔗 Citation

If you find our work helpful, please cite:

@misc{xiao2024trajectoryattentionfinegrainedvideo,
      title={Trajectory Attention for Fine-grained Video Motion Control}, 
      author={Zeqi Xiao and Wenqi Ouyang and Yifan Zhou and Shuai Yang and Lei Yang and Jianlou Si and Xingang Pan},
      year={2024},
      eprint={2411.19324},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.19324}, 
}

👏 Acknowledgements

  • SVD: Our model is fine-tuned from SVD.
  • MiraData: We use data collected by MiraData.
  • Depth-Anything-V2: We estimate depth maps with Depth-Anything-V2.
  • Unimatch: We estimate optical flow with Unimatch.
  • CoTracker: We estimate point trajectories with CoTracker.
  • NVS_Solver: Our camera rendering code is based on NVS_Solver.
