JA-POLS

Authors: Irit Chelly, Vlad Winter, Dor Litvak, David Rosen, and Oren Freifeld.

This code repository corresponds to our CVPR '20 paper: JA-POLS: a Moving-camera Background Model via Joint Alignment and Partially-overlapping Local Subspaces. JA-POLS is a novel 2D-based method for unsupervised learning of a moving-camera background model, which is highly scalable and allows for relatively-free camera motion.

JA-POLS typical results

A detailed description of our method and more example results can be found here:
Paper
Supplemental Material
Example Results

Acknowledgements:
This work was partially funded by the Shulamit Aloni Scholarship from Israel's Ministry of Technology and Science, and by BGU's Hi-Tech Scholarship.

Requirements

  • Python: most of the code runs in Python using the following packages: numpy, matlab.engine, scipy, tensorflow, torch, OpenCV, imageio, scikit-image, and other common Python packages.
  • MATLAB (for the SE-Sync part)
  • C++: if you choose the TGA method for learning the local subspaces (see module 2 below), please follow the TGA requirements. All steps should be performed in the TGA folder: 2_learning\BG\TGA-PCA.

For a minimal working example, use the Tennis sequence (the input images are already located in the input folder in this repository).

Installation

Instructions and Description

The JA-POLS method consists of 3 phases that run in separate modules:

  • Joint alignment: align all input images to a common coordinate system
  • Learning of two tasks:
    • Partially-overlapping Local Subspaces (the background)
    • Alignment prediction
  • BG/FG separation for a (previously-unseen) input frame

Configuration parameters: the file config.py includes all required parameters for the 3 modules.

Before running the code, set the following config parameters:

Your local path to the JA-POLS folder:

paths = dict(
    my_path = '/PATH_TO_JAPOLS_CODE/JA-POLS/',
)

The size of a single input frame (height, width, depth):

images = dict(
    img_sz = (250, 420, 3),
)

All 3 modules should run from the source folder JA-POLS/.

Module 1: Joint Alignment

Code:
Main function: 1_joint_alignment/main_joint_alignment.py

Input:
A video or a sequence of images from which the BG model will be learned.
The video or the images should be located in input/learning/video or input/learning/images respectively.

Output:

  • data/final_AFFINE_trans.npy: affine transformations for all input images.
    (In this file, record i contains the affine transformation (a 6-parameter vector) associated with input image i.)
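
A 6-parameter record is often easier to work with as a 3x3 homogeneous matrix. The sketch below is illustrative only: the helper name `affine_vector_to_matrix` is hypothetical, and the row-major parameter order `[a11, a12, tx, a21, a22, ty]` is an assumption that should be checked against the repository code. A small in-memory array stands in for the loaded file.

```python
import numpy as np

def affine_vector_to_matrix(theta):
    """Turn a 6-parameter affine vector into a 3x3 homogeneous matrix.

    Assumption: parameters are ordered row-major as
    [a11, a12, tx, a21, a22, ty]; verify this against the actual code.
    """
    theta = np.asarray(theta, dtype=float)
    if theta.size != 6:
        raise ValueError("expected 6 affine parameters")
    top = theta.reshape(2, 3)
    return np.vstack([top, [0.0, 0.0, 1.0]])

# Stand-in for np.load('data/final_AFFINE_trans.npy'): one record per image.
transforms = np.array([
    [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],   # identity
    [1.0, 0.0, 5.0, 0.0, 1.0, -3.0],  # pure translation by (5, -3)
])

T1 = affine_vector_to_matrix(transforms[1])
point = np.array([2.0, 2.0, 1.0])     # homogeneous 2D point
moved = T1 @ point
```

With the real file, `transforms` would come from `np.load`; its exact array shape is not stated here, so inspect `transforms.shape` before indexing.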

Required params in config.py:
Data type (video or a sequence of images), and relevant info about the input data:

se = dict(
    data_type = 'images',  # choose from: ['images', 'video']
    video_name = 'jitter.mp4',  # relevant when data_type = 'video'
    img_type = '*.png',  # relevant when data_type = 'images'
)
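
To see which files a given `se` setting would point at, a small listing helper can be useful. This is a sketch only: `list_input_frames` is a hypothetical name, not a function from this repository, and it simply combines the documented input folders with the `data_type`, `img_type`, and `video_name` values.

```python
from pathlib import Path

def list_input_frames(root, data_type="images", img_type="*.png",
                      video_name="jitter.mp4"):
    """Return the learning input file(s) implied by the `se` settings.

    Hypothetical helper: `root` is the JA-POLS folder (my_path in config.py).
    """
    root = Path(root)
    if data_type == "images":
        # Lexicographic sort; zero-pad frame numbers to keep temporal order.
        return sorted((root / "input" / "learning" / "images").glob(img_type))
    if data_type == "video":
        return [root / "input" / "learning" / "video" / video_name]
    raise ValueError(f"unknown data_type: {data_type!r}")
```

Note that a lexicographic sort places `10.png` before `2.png`, so zero-padded file names are the safer choice if frame order matters.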

Parameters for the spatial transformer net (when estimating the affine transformations):

stn = dict(
    device = '/gpu:0',   # choose from: ['/gpu:0', '/gpu:1', '/cpu:0']
    load_model = False,  # 'False' when learning a model from scratch, 'True' when using a trained network's model
    iter_per_epoch = 2000, # number of iterations 
    batch_size = 10,
)

The rest of the parameters can (optionally) remain with the current configuration.

Description:
Here we solve a joint-alignment problem:




High-level steps:

  1. Compute relative transformations for pairs of input images (according to the graph)
  2. Run the SE-Sync framework to get absolute SE transformations for each frame
  3. Transform images according to the absolute SE transformations
  4. Estimate residual affine transformations by optimizing the above loss function using a Spatial Transformer Network (STN).
  5. End up with absolute affine transformations for each of the input images
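
The way relative, absolute, and residual transformations fit together in the steps above can be sketched with homogeneous matrices. This is a toy illustration under several assumptions, not the repository's implementation: naive chaining along consecutive frames stands in for SE-Sync, a single hand-picked residual matrix stands in for the STN output (in practice each frame gets its own), and the composition order (SE first, then residual) is assumed.

```python
import numpy as np

def se2_matrix(angle, tx, ty):
    """3x3 homogeneous matrix of a planar rotation followed by a translation."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def relative_transform(T_i, T_j):
    """Relative transformation between two absolute poses (assumed convention)."""
    return np.linalg.inv(T_i) @ T_j

# Step 1 (stand-in): relative transformations between consecutive frames.
T_true = [se2_matrix(0.0, 0.0, 0.0),
          se2_matrix(0.10, 5.0, 0.0),
          se2_matrix(0.25, 9.0, 1.0)]
relative = [relative_transform(T_true[k], T_true[k + 1])
            for k in range(len(T_true) - 1)]

# Step 2 (stand-in for SE-Sync): chain the relative transformations,
# fixing the first frame as the reference.
T_abs = [np.eye(3)]
for R in relative:
    T_abs.append(T_abs[-1] @ R)

# Steps 4-5: compose a residual affine with each absolute SE transformation
# and keep the top two rows as a 6-parameter vector.
A_residual = np.array([[1.02, 0.01, 0.5],
                       [0.00, 0.98, -0.3],
                       [0.00, 0.00, 1.0]])
T_affine = [A_residual @ T for T in T_abs]
theta = [T[:2].ravel() for T in T_affine]
```

Chaining accumulates error along the sequence, which is one reason a synchronization method over a graph of image pairs is preferable to this shortcut.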


Module 2: Learning

Code:
Main function: 2_learning/main_learning.py

Input:
Files that were prepared in module 1:

  • data/final_AFFINE_trans.npy
  • data/imgs.npy
  • data/imgs_big_embd.npy

Output:

  • data/subspaces/: local subspaces for the background learning.
  • 2_learning/Alignment/models/best_model.pt: model of a trained net for the alignment prediction.

Required params in config.py:
Local-subspaces learning:
Type of the background-learning algorithm that will run on each local domain:

pols = dict(
    method_type = 'PRPCA',  # choose from: [PCA / RPCA-CANDES / TGA / PRPCA]
)

The rest of the parameters can (optionally) remain with the current configuration.
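
As a rough picture of what learning a subspace on one local domain means, here is a minimal sketch of the plain PCA option using an SVD. It is not the repository's code: `learn_pca_subspace` is a hypothetical helper, the data is synthetic, and the robust variants (RPCA-CANDES, TGA, PRPCA) are not shown.

```python
import numpy as np

def learn_pca_subspace(frames, k):
    """Fit a k-dimensional PCA subspace to vectorized image patches.

    frames: array of shape (n_frames, n_pixels).
    Returns the mean (n_pixels,) and an orthonormal basis (n_pixels, k).
    """
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

# Synthetic "local domain": 30 patches of 50 pixels lying on a 2D subspace.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(50, 2))
coeffs = rng.normal(size=(30, 2))
frames = coeffs @ true_basis.T + 3.0

mean, basis = learn_pca_subspace(frames, k=2)
reconstruction = mean + (frames - mean) @ basis @ basis.T
```

Because the synthetic patches lie exactly on a 2D affine subspace, the reconstruction matches the input up to numerical precision; real image data would leave a residual.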

Alignment-prediction learning:
Parameters for the regressor net (when learning a map between images and transformations):

regress_trans = dict(
    load_model = False,  # 'False' when learning a model from scratch, 'True' when using a trained network's model
    gpu_num = 0,  # index of the GPU to use (in case there is more than one)
)

The rest of the parameters can (optionally) remain with the current configuration.

Description:
Here we learn two tasks, based on the affine transformations that were learned in module 1:


Module 3: Background/Foreground Separation

Code:
Main function: 3_bg_separation/main_bg_separation.py

Input:
A video or a sequence of test images for BG/FG separation.
The video or the images should be located in input/test/video or input/test/images respectively.

Output:

  • output/bg/: background for each test image.
  • output/fg/: foreground for each test image.
  • output/img/: original test images.
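
The relation between the three outputs can be illustrated with a linear subspace model: the background is the projection of the frame onto the subspace, and the foreground is what remains. The sketch below assumes an orthonormal basis and a single global subspace; `separate_bg_fg` and its `threshold` parameter are hypothetical, and the warping of the test frame and the choice of local subspaces are omitted.

```python
import numpy as np

def separate_bg_fg(frame, mean, basis, threshold=0.5):
    """Split a vectorized frame into a background estimate and a residual.

    Assumes `basis` has orthonormal columns (n_pixels, k).
    """
    x = np.asarray(frame, dtype=float).ravel()
    coeffs = basis.T @ (x - mean)        # least-squares projection
    bg = mean + basis @ coeffs           # background estimate
    residual = x - bg                    # foreground signal
    fg_mask = np.abs(residual) > threshold
    return bg, residual, fg_mask

# Toy example: 6 pixels, a 1D subspace, and one "foreground" pixel.
mean = np.full(6, 0.5)
basis = np.ones((6, 1)) / np.sqrt(6.0)
frame = mean + 2.0 * basis[:, 0]
frame[3] += 1.0                          # foreground object at pixel 3

bg, residual, fg_mask = separate_bg_fg(frame, mean, basis)
```

In this toy case the foreground pixel also leaks slightly into the background estimate, because a least-squares projection is not robust to outliers; this is one motivation for the robust method types listed in module 2.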

Required params in config.py:
Data type (video or a sequence of test images), and relevant info about the input data:

bg_tool = dict(
    data_type = 'images',  # choose from: ['images', 'video']
    video_name = 'jitter.mp4',  # relevant when data_type = 'video'
    img_type = '*.png',  # relevant when data_type = 'images'
)

Indicate which test images to process: 'all' (all test data), 'subsequence' (a subsequence of the image list), or 'idx_list' (a list of specific 0-based frame indices).
If choosing 'subsequence', insert relevant info in "start_frame" and "num_of_frames".
If choosing 'idx_list', insert a list of indices in "idx_list".

bg_tool = dict(
    which_test_frames='idx_list',  # choose from: ['all', 'subsequence', 'idx_list']
    start_frame=0,
    num_of_frames=20,
    idx_list=(2,15,39),
)
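
The three selection modes can be read as the following index logic. This is a sketch of one plausible interpretation, not the repository's code: `select_test_frames` is a hypothetical helper, and clamping a subsequence at the end of the data is an assumption.

```python
def select_test_frames(n_total, which_test_frames="all", start_frame=0,
                       num_of_frames=20, idx_list=()):
    """Return the 0-based frame indices selected by the bg_tool settings."""
    if which_test_frames == "all":
        return list(range(n_total))
    if which_test_frames == "subsequence":
        # Assumption: a subsequence running past the end is truncated.
        stop = min(start_frame + num_of_frames, n_total)
        return list(range(start_frame, stop))
    if which_test_frames == "idx_list":
        bad = [i for i in idx_list if not 0 <= i < n_total]
        if bad:
            raise IndexError(f"frame indices out of range: {bad}")
        return list(idx_list)
    raise ValueError(f"unknown which_test_frames: {which_test_frames!r}")
```

For example, with 50 test frames and the settings above, `select_test_frames(50, "idx_list", idx_list=(2, 15, 39))` yields frames 2, 15, and 39; `start_frame` and `num_of_frames` are ignored in that mode.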

Indicate whether or not to use the ground-truth transformations, in case you process images from the original video.
When processing learning images, insert True.
When processing unseen images, insert False.

bg_tool = dict(
    use_gt_theta = True,
)
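
The `bg_tool` fragments in this module presumably describe a single dictionary; in Python, three separate `bg_tool = dict(...)` assignments would each overwrite the previous one. Assuming that is the case, a combined version using the values shown above would look like this (any other keys present in config.py are omitted):

```python
# Sketch of a combined bg_tool block; values are copied from the fragments above.
bg_tool = dict(
    data_type='images',            # choose from: ['images', 'video']
    video_name='jitter.mp4',       # relevant when data_type = 'video'
    img_type='*.png',              # relevant when data_type = 'images'
    which_test_frames='idx_list',  # choose from: ['all', 'subsequence', 'idx_list']
    start_frame=0,                 # relevant when which_test_frames = 'subsequence'
    num_of_frames=20,              # relevant when which_test_frames = 'subsequence'
    idx_list=(2, 15, 39),          # relevant when which_test_frames = 'idx_list'
    use_gt_theta=True,             # True for learning images, False for unseen images
)
```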

The rest of the parameters can (optionally) remain with the current configuration.

Copyright and License

This software is released under the MIT License (included with the software). Note, however, that if you use this code (and/or the results of running it) to support any form of publication (e.g., a book, a journal paper, a conference paper, a patent application, etc.), we request that you cite our paper:

@inproceedings{chelly2020ja,
  title={JA-POLS: a Moving-camera Background Model via Joint Alignment and Partially-overlapping Local Subspaces},
  author={Chelly, Irit and Winter, Vlad and Litvak, Dor and Rosen, David and Freifeld, Oren},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12585--12594},
  year={2020}
}