DAIN (Depth-Aware Video Frame Interpolation)

Project | Paper

Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, and Ming-Hsuan Yang

IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CVPR 2019

This work is developed based on our TPAMI work MEMC-Net, where we propose the adaptive warping layer. Please also consider referring to it.

Table of Contents

  1. Introduction
  2. Citation
  3. Requirements and Dependencies
  4. Installation
  5. Setup Video Processing (new)
  6. Testing Pre-trained Models
  7. Downloading Results
  8. Slow-motion Generation
  9. Training New Models

Introduction

We propose the Depth-Aware video frame INterpolation (DAIN) model to explicitly detect the occlusion by exploring the depth cue. We develop a depth-aware flow projection layer to synthesize intermediate flows that preferably sample closer objects than farther ones. Our method achieves state-of-the-art performance on the Middlebury dataset. We provide videos here.

Citation

If you find the code and datasets useful in your research, please cite:

@inproceedings{DAIN,
    author    = {Bao, Wenbo and Lai, Wei-Sheng and Ma, Chao and Zhang, Xiaoyun and Gao, Zhiyong and Yang, Ming-Hsuan}, 
    title     = {Depth-Aware Video Frame Interpolation}, 
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
    year      = {2019}
}
@article{MEMC-Net,
     title={MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement},
     author={Bao, Wenbo and Lai, Wei-Sheng and Zhang, Xiaoyun and Gao, Zhiyong and Yang, Ming-Hsuan},
     journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
     doi={10.1109/TPAMI.2019.2941941},
     year={2018}
}

Requirements and Dependencies

Ubuntu (We test with Ubuntu = 16.04.5 LTS)

Python (We test with Python = 3.6.8 in Anaconda3 = 4.1.1)

$ mkdir ~/tmp
$ cd ~/tmp
$ curl -O https://repo.continuum.io/archive/Anaconda3-4.1.1-Linux-x86_64.sh 
$ bash Anaconda3-4.1.1-Linux-x86_64.sh

Cuda & Cudnn for Anaconda (We test with Cuda = 9.0 and Cudnn = 7.0)

$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb 
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb 
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl2_2.1.4-1+cuda9.0_amd64.deb 
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl-dev_2.1.4-1+cuda9.0_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb 
$ sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb 
$ sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb 
$ sudo dpkg -i libnccl2_2.1.4-1+cuda9.0_amd64.deb 
$ sudo dpkg -i libnccl-dev_2.1.4-1+cuda9.0_amd64.deb 
$ sudo apt-get update
$ # might need to do this instead if keys weren't retrieved: sudo apt-get --allow-unauthenticated update
$ sudo apt-get install cuda=9.0.176-1
$ sudo apt-get install libcudnn7-dev
$ sudo apt-get install libnccl-dev
$ sudo reboot
$ vi ~/.bashrc

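Add the following lines at the bottom of ~/.bashrc (as file content, without a shell prompt), then open a new shell so they take effect:

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}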
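
To confirm that the two exports took effect in a new shell, a small check like the one below may help. It is a minimal sketch that assumes the /usr/local/cuda-9.0 install prefix used above; the helper name `missing_cuda_paths` is hypothetical and not part of this repository.

```python
import os

# Install prefix taken from the exports above; adjust if CUDA lives elsewhere.
CUDA_HOME = "/usr/local/cuda-9.0"


def missing_cuda_paths(env, cuda_home=CUDA_HOME):
    """Return a message for each CUDA directory absent from PATH / LD_LIBRARY_PATH."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in expected.items():
        # Split the colon-separated variable and ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(var, "").split(":")]
        if directory not in entries:
            missing.append(f"{var} lacks {directory}")
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths(os.environ)
    print("CUDA paths look fine" if not problems else "\n".join(problems))
```

An empty result only means the directories are listed in the variables; it does not verify that CUDA itself is installed correctly.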

PyTorch (The customized depth-aware flow projection layer and other layers require the ATen API in PyTorch = 1.0.0)

$ conda install -c anaconda mkl
$ conda install pytorch==1.0.0 torchvision==0.2.1 cuda90 -c pytorch 

GCC (Compiling PyTorch 1.0.0 extension files (.c/.cu) requires gcc = 4.9.1 and nvcc = 9.0 compilers)

$ sudo vi /etc/apt/sources.list
$ # Add the following line at the bottom of sources.list (it is file content, not a command):
$ # deb http://cz.archive.ubuntu.com/ubuntu xenial main universe

$ sudo apt-get update
$ sudo apt-get install gcc-4.9 g++-4.9

NVIDIA GPU (We use a Titan X (Pascal) with compute capability 6.1, and we support compute_50/52/60/61 devices. If your device has a higher compute capability, please revise this.)
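
For devices with a different compute capability, the architecture is usually passed to nvcc through `-gencode` flags. The sketch below only shows how such flags are formed for a list of capabilities; the helper name `gencode_flags` is hypothetical, and where the flags belong in this repository's build files is not covered here.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for (major, minor) compute capabilities."""
    flags = []
    for major, minor in capabilities:
        arch = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    return flags


# Capabilities listed as supported above.
SUPPORTED = [(5, 0), (5, 2), (6, 0), (6, 1)]

if __name__ == "__main__":
    # (7, 0) is an illustrative extra entry for a newer device.
    print(" ".join(gencode_flags(SUPPORTED + [(7, 0)])))
```

Whether nvcc accepts a given architecture depends on the installed CUDA toolkit version, so check that before adding a newer entry.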

Installation

Download repository:

$ git clone https://github.com/e2m32/DAIN.git

Set up anaconda environment

$ conda env create -f ~/DAIN/environment.yaml
$ echo ". ~/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
$ echo "conda activate" >> ~/.bashrc
$ cd ~/

Activate environment

$ conda activate dain_pytorch1.0.0

or

$ source activate dain_pytorch1.0.0

Before building the PyTorch extensions, make sure you have PyTorch 1.0.0:

$ python -c "import torch; print(torch.__version__)"

If you are not on PyTorch 1.0.0, install it again (ref):

$ conda install pytorch==1.0.0 torchvision==0.2.1 cuda90 -c pytorch 
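
The version check above can also be scripted so that it fails loudly before the extensions are built. This is a minimal sketch: the helper `matches_required` is hypothetical, and it assumes that build suffixes such as `.post2` or `+cu90` should be ignored when comparing versions.

```python
REQUIRED = "1.0.0"


def matches_required(found, required=REQUIRED):
    """Compare versions, ignoring suffixes such as '+cu90' or '.post2'."""
    base = found.split("+")[0]
    return base.split(".")[:3] == required.split(".")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = "OK" if matches_required(torch.__version__) else "mismatch"
        print(torch.__version__, status)
        print("CUDA available:", torch.cuda.is_available())
```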

Generate our PyTorch extensions:

$ cd ~/DAIN/my_package/
$ ./build.sh

Generate the Correlation package required by PWCNet:

$ cd ~/DAIN/PWCNet/correlation_package_pytorch1_0
$ ./build.sh

Setup Video Processing (new)

Download the pretrained model (normally unnecessary, since it should have been included in the git clone):

$ cd ~/DAIN/model_weights
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/best.pth

Run your video. Optionally, prefix the python call with CUDA_VISIBLE_DEVICES=0 to restrict CUDA to GPU 0 (ref). To list the available GPUs, run nvidia-smi.

$ cd ~/DAIN
$ # The following arguments are optional	
$ python dain.py <inputvideo> <outputvideo> <num passes>

Testing Pre-trained Models

Make model weights dir and Middlebury dataset dir:

$ cd ~/DAIN
$ mkdir -p model_weights
$ mkdir -p MiddleBurySet

Download the pretrained model (normally unnecessary, since it should have been included in the git clone):

$ cd model_weights
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/best.pth

and Middlebury dataset:

$ cd ../MiddleBurySet
$ wget http://vision.middlebury.edu/flow/data/comp/zip/other-color-allframes.zip
$ unzip other-color-allframes.zip
$ wget http://vision.middlebury.edu/flow/data/comp/zip/other-gt-interp.zip
$ unzip other-gt-interp.zip
$ cd ..

Now run the test:

$ CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury.py

The interpolated results are under MiddleBurySet/other-result-author/[random number]/, where the random number distinguishes different runs. The demo interpolates only one image per example.
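
Because each run writes to a differently numbered directory, it can be handy to locate the most recent one automatically. The following is a minimal sketch that picks the subdirectory with the newest modification time; the helper name `latest_run_dir` is hypothetical, and using the modification time as the ordering is an assumption.

```python
from pathlib import Path


def latest_run_dir(results_root):
    """Return the most recently modified run directory, or None if there is none."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Path taken from the description above; run this from the DAIN directory.
    print(latest_run_dir("MiddleBurySet/other-result-author"))
```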

Running your own videos

Downloading Results

Our DAIN model achieves state-of-the-art performance on the UCF101, Vimeo90K, and Middlebury (eval and other) datasets. Download our interpolated results with:

$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/UCF101_DAIN.zip
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/Vimeo90K_interp_DAIN.zip
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/Middlebury_eval_DAIN.zip
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/Middlebury_other_DAIN.zip

Slow-motion Generation

Our model can generate slow-motion effects with a minor modification to the network architecture. Run the following code with time_step = 0.25 to generate a x4 slow-motion effect:

$ CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury_slowmotion.py --netName DAIN_slowmotion --time_step 0.25

or set time_step to 0.125 or 0.1 as follows

$ CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury_slowmotion.py --netName DAIN_slowmotion --time_step 0.125
$ CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury_slowmotion.py --netName DAIN_slowmotion --time_step 0.1

to generate x8 and x10 slow-motion, respectively. For a little fun, you can also generate x100 slow-motion:

$ CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury_slowmotion.py --netName DAIN_slowmotion --time_step 0.01
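
The relation between time_step and the slow-motion factor in the commands above (0.25 for x4, 0.125 for x8, 0.1 for x10, 0.01 for x100) can be sketched as follows. This assumes one frame is synthesized at every multiple of time_step strictly between two input frames; the helper `intermediate_times` is hypothetical and is not taken from the demo script.

```python
def intermediate_times(time_step):
    """Return the time positions in (0, 1) between two consecutive input frames."""
    n = round(1.0 / time_step)  # slow-motion factor, e.g. 4 for time_step = 0.25
    if n < 2 or abs(n * time_step - 1.0) > 1e-6:
        raise ValueError("time_step should be 1/n for an integer n >= 2")
    return [i / n for i in range(1, n)]


if __name__ == "__main__":
    for step in (0.25, 0.125, 0.1, 0.01):
        times = intermediate_times(step)
        print(f"time_step={step}: x{len(times) + 1}, {len(times)} new frames per pair")
```

Under this assumption, a x4 slow-motion run inserts three new frames between each pair of input frames.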

You can also create GIF animations with:

$ cd MiddleBurySet/other-result-author/[random number]/Beanbags
$ convert -delay 1 *.png -loop 0 Beanbags.gif # 1*10ms delay

Have fun and enjoy yourself!

Training New Models

Download the Vimeo90K triplet dataset for the video frame interpolation task; see also here by Xue et al., IJCV19.

$ cd DAIN
$ mkdir -p /path/to/your/dataset && cd /path/to/your/dataset
$ wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
$ unzip vimeo_triplet.zip
$ rm vimeo_triplet.zip

Download the pretrained MegaDepth and PWCNet models

$ cd ~/DAIN/MegaDepth/checkpoints/test_local
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/best_generalization_net_G.pth
$ cd ../../../PWCNet
$ wget http://vllab1.ucmerced.edu/~wenbobao/DAIN/pwc_net.pth.tar
$ cd ..

Run the training script:

$ CUDA_VISIBLE_DEVICES=0 python train.py --datasetPath /path/to/your/dataset --batch_size 1 --save_which 1 --lr 0.0005 --rectify_lr 0.0005 --flow_lr_coe 0.01 --occ_lr_coe 0.0 --filter_lr_coe 1.0 --ctx_lr_coe 1.0 --alpha 0.0 1.0 --patience 4 --factor 0.2

The optimized models will be saved to the model_weights/[random number] directory, where [random number] is generated for different runs.

Replace the pre-trained model_weights/best.pth model with the newly trained model_weights/[random number]/best.pth model. Then test the new model by executing:

$ CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury.py
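
The replacement step that precedes this test command can be sketched as below. It copies model_weights/[random number]/best.pth over model_weights/best.pth and first keeps a copy of the existing file; the helper `install_trained_model` and the backup name `best.pretrained.pth` are hypothetical choices made for illustration.

```python
import shutil
from pathlib import Path


def install_trained_model(weights_dir, run_id):
    """Copy <weights_dir>/<run_id>/best.pth over <weights_dir>/best.pth.

    Returns the path of the backup of the previous best.pth, or None if
    there was no previous file.
    """
    weights_dir = Path(weights_dir)
    new_model = weights_dir / str(run_id) / "best.pth"
    target = weights_dir / "best.pth"
    if not new_model.is_file():
        raise FileNotFoundError(new_model)
    backup = None
    if target.exists():
        # Hypothetical backup name, so the pre-trained weights are not lost.
        backup = weights_dir / "best.pretrained.pth"
        shutil.copy2(target, backup)
    shutil.copy2(new_model, target)
    return backup


# Example call; "12345" is a placeholder for your own run directory name:
# install_trained_model("model_weights", "12345")
```

Keeping a backup makes it easy to return to the pre-trained weights if the newly trained model performs worse.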

Contact

Wenbo Bao; Wei-Sheng (Jason) Lai

License

See MIT License