This repository contains the official code for TransVOS: Video Object Segmentation with Transformers.
- torch >= 1.6.0
- torchvision >= 0.7.0
- ...
To install the requirements, run:
conda env update -n TransVOS --file requirements.yaml
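After the environment is created, it can be useful to confirm that the installed packages satisfy the minimum versions listed above. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and only the two minimum versions come from this README.

```python
import re
from importlib import metadata

# Minimum versions taken from the requirements list in this README.
MINIMUMS = {"torch": "1.6.0", "torchvision": "0.7.0"}


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '1.10.0+cu113' -> (1, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.6.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.6' equals '1.6.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS) -> dict:
    """Map each package to True/False, or to None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "not installed" if ok is None else ("ok" if ok else "too old")
        print(f"{name}: {status}")
```

Versions are compared as integer tuples rather than as strings because a plain string comparison would rank "1.10.0" below "1.6.0".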
We follow AFB-URR to convert the static image datasets (MSRA10K, ECSSD, PASCAL-S, PASCAL VOC2012, COCO) into a uniform format that follows the DAVIS layout.
Download the YouTube-VOS dataset, then organize the data in the following format:
| |-----JPEGImages
| |-----Annotations
| |-----meta.json
| |-----JPEGImages
| |-----Annotations
| |-----meta.json
JPEGImages and Annotations contain the frames and the annotation masks of each video, respectively.
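A quick sanity check of one split directory against the layout described above might look like the sketch below. The function names are hypothetical, and the assumption that each video has its own same-named subfolder under both JPEGImages and Annotations is not stated in this README.

```python
from pathlib import Path

# Entries that each split directory is expected to contain, per the layout above.
REQUIRED_ENTRIES = ("JPEGImages", "Annotations", "meta.json")


def missing_entries(split_dir) -> list:
    """Return the required entries that are absent from split_dir."""
    split = Path(split_dir)
    return [name for name in REQUIRED_ENTRIES if not (split / name).exists()]


def videos_without_annotations(split_dir) -> list:
    """List video folders under JPEGImages that have no folder under Annotations.

    Assumes one subfolder per video, named identically in both directories.
    """
    split = Path(split_dir)
    frames_root = split / "JPEGImages"
    masks_root = split / "Annotations"
    if not frames_root.is_dir():
        return []
    return sorted(
        video.name
        for video in frames_root.iterdir()
        if video.is_dir() and not (masks_root / video.name).is_dir()
    )
```

Running both functions on each split before training catches a misplaced folder early, rather than partway through data loading.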
Download the DAVIS17 dataset, then organize the data in the following format:
| |-----480p
| |-----480p (annotations for DAVIS 2017)
| |-----2016
| |-----2017
|----DAVIS-test-dev (data for DAVIS 2017 test-dev)
To pretrain the TransVOS network on static images, modify the dataset root ($cfg.DATA.PRETRAIN_ROOT) in the configuration file, then run the following command.
python --gpu ${GPU-IDS} --exp_name ${experiment} --pretrain
To train the TransVOS network on DAVIS & YouTube-VOS, modify the dataset root ($cfg.DATA.DAVIS_ROOT) in the configuration file, then run the following command.
python --gpu ${GPU-IDS} --exp_name ${experiment} --initial ${./checkpoints/*.pth.tar}
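For orientation, the flags used in the training commands could be parsed roughly as follows. This is a sketch and not the repository's actual parser: only the flag names (`--gpu`, `--exp_name`, `--pretrain`, `--initial`) come from the commands in this README, while the types, defaults, and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the training flags shown in this README (assumed semantics)."""
    parser = argparse.ArgumentParser(description="TransVOS training (illustrative sketch)")
    # Assumed: a comma-separated list of GPU ids, e.g. "0,1".
    parser.add_argument("--gpu", type=str, default="0", help="GPU ids to use")
    # Assumed: a free-form name used to label the run.
    parser.add_argument("--exp_name", type=str, default="default", help="experiment name")
    # Assumed: a boolean switch that selects pretraining on static images.
    parser.add_argument("--pretrain", action="store_true", help="pretrain on static images")
    # Assumed: path to a checkpoint to initialize from; empty means none.
    parser.add_argument("--initial", type=str, default="", help="checkpoint to start from")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Under these assumptions, the pretraining command passes `--pretrain` without `--initial`, and the main training command passes `--initial` with the checkpoint produced by pretraining.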
Download the pretrained DAVIS17 checkpoint and YouTube-VOS checkpoint.
To evaluate the TransVOS network on DAVIS16/17, modify $cfg.DATA.VAL.DATASET_NAME, then run the following command.
python --checkpoint ${./checkpoints/*.pth.tar}
To test the TransVOS network on DAVIS17 test-dev or YouTube-VOS, modify $cfg.DATA.TEST.DATASET_NAME, then run the following command.
python --checkpoint ${./checkpoints/*.pth.tar}
The test results are saved as indexed PNG files under ${results}/.
Additionally, you can modify other settings in the configuration file to change the configuration.
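The option names used in this README (`cfg.DATA.PRETRAIN_ROOT`, `cfg.DATA.DAVIS_ROOT`, `cfg.DATA.VAL.DATASET_NAME`, `cfg.DATA.TEST.DATASET_NAME`) suggest a nested configuration object. The stand-in below uses `types.SimpleNamespace` purely for illustration. The repository may use a different configuration library, and every value shown, including the dataset name strings, is a placeholder rather than a value confirmed by the source.

```python
from types import SimpleNamespace

# Illustrative stand-in for the nested config; all values are placeholders.
cfg = SimpleNamespace(
    DATA=SimpleNamespace(
        PRETRAIN_ROOT="/path/to/static_images",  # root of the converted static image datasets
        DAVIS_ROOT="/path/to/DAVIS",             # root of the DAVIS data
        VAL=SimpleNamespace(DATASET_NAME="DAVIS17"),   # assumed name string
        TEST=SimpleNamespace(DATASET_NAME="DAVIS17"),  # assumed name string
    )
)

# Switching the evaluation dataset then amounts to one assignment.
cfg.DATA.VAL.DATASET_NAME = "DAVIS16"
```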
This codebase is built upon the official AFB-URR repository and the official DETR repository.
title={TransVOS: Video Object Segmentation with Transformers},
author={Mei, Jianbiao and Wang, Mengmeng and Lin, Yeneng and Liu, Yong},
journal={arXiv preprint arXiv:2106.00588},