This is the repository for Object Discovery from Motion-Guided Tokens, published at CVPR 2023.
[Project Page] [Paper]
Python 3 dependencies:
- torch 1.7.1+CUDA11.0
- matplotlib
- cv2 (opencv-python)
- numpy
- scipy
- tqdm
The MOVi-E dataset can be accessed from the official repo. After downloading, we convert the data into .npy files for training; see process_movi.py for details.
TRI-PD dataset (RGB, flow, depth, semantic masks) and additional annotations (moving object masks; dynamic object masks): [TRI-PD dataset]. The "simplified" folder contains flow (both forward and backward), RGB, and motion masks. The "full" folder contains additional RGB, depth, flow, motion masks, and semantic masks.
The raw PD dataset (which contains RGB, semantic segmentation, instance segmentation, optical flow, depth, camera calibrations, 2D/3D bounding boxes, etc.) is connected to TRI's Vidar project. For annotations other than the simplified ones, leave a message in the issues or contact [email protected].
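Sample code to convert the motion vectors into x/y optical flow (in pixels):
import cv2
import numpy as np

rgba = cv2.imread('x.png', cv2.IMREAD_UNCHANGED).astype(np.int32)  # same as flag -1; int32 avoids uint8 overflow below
r, g, b, a = rgba[:, :, 0], rgba[:, :, 1], rgba[:, :, 2], rgba[:, :, 3]  # channels 0-3 as loaded by OpenCV (B, G, R, A order)
h, w, _ = rgba.shape
dx_i = r + g * 256  # 16-bit x displacement split across two 8-bit channels
dy_i = b + a * 256  # 16-bit y displacement split across two 8-bit channels
flow_x = ((dx_i / 65535.0) * 2.0 - 1.0) * w  # [0, 65535] -> [-1, 1] -> pixels
flow_y = ((dy_i / 65535.0) * 2.0 - 1.0) * h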
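For reuse, for example inside a data loader, the same decoding can be wrapped in a function that returns an (H, W, 2) array. This is a minimal sketch, not code from the repo: `decode_motion_vectors` and `encode_motion_vectors` are hypothetical names, and the encoder is only the algebraic inverse of the decoding formula above (useful for sanity checks), not necessarily how the dataset files were produced.

```python
import numpy as np


def decode_motion_vectors(rgba: np.ndarray) -> np.ndarray:
    """Decode an (H, W, 4) motion-vector image into an (H, W, 2) flow in pixels.

    Uses the same formula and channel indices as the conversion snippet:
    each displacement is a 16-bit value split across two 8-bit channels,
    mapped from [0, 65535] to [-1, 1] and scaled by the image width/height.
    """
    c = rgba.astype(np.float64)  # avoid uint8 overflow when multiplying by 256
    h, w = c.shape[:2]
    dx_i = c[..., 0] + c[..., 1] * 256.0
    dy_i = c[..., 2] + c[..., 3] * 256.0
    flow_x = (dx_i / 65535.0 * 2.0 - 1.0) * w
    flow_y = (dy_i / 65535.0 * 2.0 - 1.0) * h
    return np.stack([flow_x, flow_y], axis=-1)


def encode_motion_vectors(flow: np.ndarray) -> np.ndarray:
    """Hypothetical inverse of decode_motion_vectors, derived from the same
    formula and intended only for round-trip sanity checks."""
    h, w = flow.shape[:2]
    dx_i = np.rint((flow[..., 0] / w + 1.0) / 2.0 * 65535.0).astype(np.int64)
    dy_i = np.rint((flow[..., 1] / h + 1.0) / 2.0 * 65535.0).astype(np.int64)
    dx_i, dy_i = np.clip(dx_i, 0, 65535), np.clip(dy_i, 0, 65535)
    channels = [dx_i % 256, dx_i // 256, dy_i % 256, dy_i // 256]
    return np.stack(channels, axis=-1).astype(np.uint8)
```

With OpenCV, `decode_motion_vectors(cv2.imread(path, cv2.IMREAD_UNCHANGED))` would return the x flow in `[..., 0]` and the y flow in `[..., 1]`. The round trip is exact up to the 16-bit quantization step (about `2 * w / 65535` pixels).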
To download files from Google Drive on a server, you can use gdown. Sample code to download all the files in the folder:
import gdown
url = "https://drive.google.com/drive/folders/1q5AjqhoivJb67h9MZCgUtqb4CooDrZhC"
gdown.download_folder(url, quiet=False, use_cookies=False)
The KITTI dataset can be downloaded from the official website; we use all the RGB images for training. The motion segmentations we used can be downloaded from here.
MOVI
root
  - train
    - video-0000
      - rgb.npy
      - forward_flow.npy
      - backward_flow.npy
      - depth.npy
      - segment.npy
    - video-0001
    - ...
  - val
  - test
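A minimal sketch of reading this layout, assuming each video folder holds exactly the five .npy files listed above. `list_movi_videos`, `load_movi_video`, and `MOVI_FIELDS` are hypothetical helpers, not part of the repo, and the code makes no claim about array shapes or dtypes; process_movi.py remains the reference for what each array contains.

```python
from pathlib import Path

import numpy as np

# File names (without .npy) taken from the directory layout above
MOVI_FIELDS = ("rgb", "forward_flow", "backward_flow", "depth", "segment")


def list_movi_videos(root, split="train"):
    """Return the sorted video-XXXX directories of one split (train/val/test)."""
    return sorted(p for p in Path(root, split).glob("video-*") if p.is_dir())


def load_movi_video(video_dir, fields=MOVI_FIELDS):
    """Load the per-video .npy arrays into a dict keyed by field name."""
    video_dir = Path(video_dir)
    return {name: np.load(video_dir / f"{name}.npy") for name in fields}
```

Loading only what a given stage needs, e.g. `load_movi_video(v, fields=("rgb", "forward_flow"))`, avoids reading depth and segments during training.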
TRI-PD
root
  - scene_000001
    - rgb
      - camera_01
        - 000000000000000005.png
        - ...
      - camera_04
      - camera_05
      - camera_06
      - camera_07
      - camera_08
      - camera_09
    - motion_vectors_2d
    - back_motion_vectors_2d
    - moving_masks
    - ari_masks
    - est_masks
  - scene_000003
  - ...
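A minimal sketch of enumerating the RGB frames in this layout. `list_pd_frames` and `sibling_path` are hypothetical helpers. The layout above only shows per-camera subfolders under rgb, so `sibling_path` rests on an unverified ASSUMPTION that annotation folders such as motion_vectors_2d mirror rgb/<camera>/<frame>.png; check the downloaded data before relying on it.

```python
from pathlib import Path


def list_pd_frames(root, cameras=None):
    """Yield (scene, camera, frame_path) for every PNG under root/scene_*/rgb/camera_*."""
    for scene_dir in sorted(Path(root).glob("scene_*")):
        for cam_dir in sorted((scene_dir / "rgb").glob("camera_*")):
            if cameras is not None and cam_dir.name not in cameras:
                continue  # keep only the requested cameras
            for frame in sorted(cam_dir.glob("*.png")):
                yield scene_dir.name, cam_dir.name, frame


def sibling_path(frame_path, folder):
    """Map root/scene/rgb/camera/frame.png to root/scene/<folder>/camera/frame.png.

    ASSUMPTION: annotation folders mirror the rgb/<camera>/ structure; this is
    not confirmed by the directory layout shown in this README.
    """
    frame_path = Path(frame_path)
    scene_dir = frame_path.parents[2]  # parents: camera dir, rgb dir, scene dir
    return scene_dir / folder / frame_path.parent.name / frame_path.name
```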
KITTI
root
  - 2011_09_26_drive_0001_sync
    - image_02
      - data
        - 0000000000.png
        - 0000000001.png
        - ...
      - raft_seg
        - 0000000000.png
        - 0000000001.png
        - ...
    - image_03
      - data
      - raft_seg
  - 2011_09_26_drive_0002_sync
  - ...
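Since data and raft_seg use the same frame file names in the layout above, RGB frames can be paired with their motion segmentations by name. A minimal sketch; `kitti_frame_pairs` is a hypothetical helper, and silently skipping frames without a mask is a design choice of this sketch, not necessarily what the training code does.

```python
from pathlib import Path


def kitti_frame_pairs(root, cameras=("image_02", "image_03")):
    """Pair each RGB frame in <drive>/<camera>/data with the same-named
    motion segmentation in <drive>/<camera>/raft_seg; unmatched frames are skipped."""
    pairs = []
    for drive in sorted(Path(root).glob("*_sync")):
        for cam in cameras:
            data_dir, seg_dir = drive / cam / "data", drive / cam / "raft_seg"
            if not data_dir.is_dir():
                continue  # camera folder missing for this drive
            for img in sorted(data_dir.glob("*.png")):
                seg = seg_dir / img.name
                if seg.exists():
                    pairs.append((img, seg))
    return pairs
```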
See trainPD.sh, trainKITTI.sh, and trainMOVI.sh for sample training scripts, and see the arguments defined in the training Python scripts for details.
To evaluate or infer on the test set, first download the pre-trained model (or train it with the training code), then run
python eval(movi/pd/kitti).py
Note that we provide the model without the motion cue for MOVi-E and with the motion cue for TRI-PD and KITTI.
To run inference and visualize the results on a video of arbitrary length, see Plot.py for sample code.
Pre-trained models are located in the pre-trained models folder in this drive.
In this repo, we mainly provide the architecture that reconstructs in VQ space with a Perceiver decoder. Implementations of the other decoder and reconstruction-space choices studied in our paper can be found in the others folder.
The slot attention module is adapted from the PyTorch slot attention implementation and the official Google repo; the estimated motion segments are generated with the Towards Segmenting Anything That Moves repo.
To generate the estimated annotations, we use SMURF and Vidar.
Discovering objects that can move
@inproceedings{bao2022discovering,
Author = {Bao, Zhipeng and Tokmakov, Pavel and Jabri, Allan and Wang, Yu-Xiong and Gaidon, Adrien and Hebert, Martial},
Title = {Discovering Objects that Can Move},
Booktitle = {CVPR},
Year = {2022},
}
@inproceedings{bao2023object,
Author = {Bao, Zhipeng and Tokmakov, Pavel and Wang, Yu-Xiong and Gaidon, Adrien and Hebert, Martial},
Title = {Object Discovery from Motion-Guided Tokens},
Booktitle = {CVPR},
Year = {2023},
}