This repository contains the official code for TransVOS: Video Object Segmentation with Transformers.
- torch >= 1.6.0
- torchvision >= 0.7.0
- ...
To install the requirements, run:
conda env update -n TransVOS --file requirements.yaml
We follow AFB-URR to convert the static image datasets (MSRA10K, ECSSD, PASCAL-S, PASCAL VOC2012, COCO) into a uniform format that follows the DAVIS layout.
Download the YouTube-VOS dataset, then organize the data in the following format:
YTBVOS
|----train
| |-----JPEGImages
| |-----Annotations
| |-----meta.json
|----valid
| |-----JPEGImages
| |-----Annotations
| |-----meta.json
Here, JPEGImages and Annotations contain the frames and annotation masks of each video.
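Before training, it can help to confirm that the directory tree matches the layout above. The following is a minimal sketch and not part of this repository: the helper name `missing_ytbvos_entries` and the example root path `./YTBVOS` are hypothetical, and the expected entries are simply copied from the tree shown above.

```python
from pathlib import Path

# Entries expected under each split, copied from the layout above.
EXPECTED = ("JPEGImages", "Annotations", "meta.json")
SPLITS = ("train", "valid")


def missing_ytbvos_entries(root):
    """Return the expected relative paths that are absent under root (hypothetical helper)."""
    root = Path(root)
    missing = []
    for split in SPLITS:
        for name in EXPECTED:
            if not (root / split / name).exists():
                missing.append(f"{split}/{name}")
    return missing


if __name__ == "__main__":
    # Hypothetical dataset location; replace with your own YTBVOS root.
    for entry in missing_ytbvos_entries("./YTBVOS"):
        print("missing:", entry)
```

An empty result means every entry from the tree above was found; the sketch only checks that the paths exist and does not inspect their contents.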
Download the DAVIS17 dataset, then organize the data in the following format:
DAVIS
|----JPEGImages
| |-----480p
|----Annotations
| |-----480p (annotations for DAVIS 2017)
|----ImageSets
| |-----2016
| |-----2017
|----DAVIS-test-dev (data for DAVIS 2017 test-dev)
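A quick consistency check on the DAVIS tree can catch incomplete downloads. The sketch below is not part of this repository and rests on assumptions that the layout above does not state: that each sequence is a subfolder of JPEGImages/480p holding `.jpg` frames, and that Annotations/480p holds a subfolder of the same name with one `.png` mask per frame. The function name `mismatched_sequences` is hypothetical.

```python
from pathlib import Path


def mismatched_sequences(davis_root, resolution="480p"):
    """Map sequence name -> (num_frames, num_masks) where the two counts differ.

    Assumes per-sequence subfolders with .jpg frames and .png masks.
    """
    root = Path(davis_root)
    frames_dir = root / "JPEGImages" / resolution
    masks_dir = root / "Annotations" / resolution
    mismatches = {}
    for seq in sorted(p for p in frames_dir.iterdir() if p.is_dir()):
        n_frames = len(list(seq.glob("*.jpg")))
        mask_seq = masks_dir / seq.name
        # A missing annotation folder counts as zero masks.
        n_masks = len(list(mask_seq.glob("*.png"))) if mask_seq.is_dir() else 0
        if n_frames != n_masks:
            mismatches[seq.name] = (n_frames, n_masks)
    return mismatches
```

This check is only meaningful for fully annotated splits; a split that ships annotations for a subset of frames would be reported as mismatched by design.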
To pretrain the TransVOS network on static images, set the dataset root ($cfg.DATA.PRETRAIN_ROOT) in config.py, then run the following command:
python train.py --gpu ${GPU-IDS} --exp_name ${experiment} --pretrain
To train the TransVOS network on DAVIS and YouTube-VOS, set the dataset roots ($cfg.DATA.DAVIS_ROOT, $cfg.DATA.YTBVOS_ROOT) in config.py, then run the following command:
python train.py --gpu ${GPU-IDS} --exp_name ${experiment} --initial ${./checkpoints/*.pth.tar}
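The two training stages differ only in their flags, so they can be scripted together. The following is a hedged sketch rather than a tool shipped with this repository: `build_train_command` is a hypothetical helper, the flags are the ones shown in the commands above, and the GPU id, experiment names, and checkpoint path are placeholder values. The exact value format that `--gpu` accepts is an assumption.

```python
import shlex


def build_train_command(gpu_ids, exp_name, pretrain=False, initial=None):
    """Assemble the train.py argument list from the flags shown above (hypothetical helper)."""
    cmd = ["python", "train.py", "--gpu", str(gpu_ids), "--exp_name", exp_name]
    if pretrain:
        # Stage 1: pretraining on static images.
        cmd.append("--pretrain")
    if initial is not None:
        # Stage 2: main training, initialized from a checkpoint.
        cmd += ["--initial", initial]
    return cmd


if __name__ == "__main__":
    # Placeholder values; substitute your own GPU ids, experiment names, and checkpoint.
    stage1 = build_train_command("0", "pretrain_static", pretrain=True)
    stage2 = build_train_command("0", "main_train", initial="./checkpoints/example.pth.tar")
    print(shlex.join(stage1))
    print(shlex.join(stage2))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the corresponding stage.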
Download the pretrained DAVIS17 checkpoint and YouTube-VOS checkpoint.
To evaluate the TransVOS network on DAVIS16/17, modify $cfg.DATA.VAL.DATASET_NAME, then run the following command:
python eval.py --checkpoint ${./checkpoints/*.pth.tar}
To test the TransVOS network on DAVIS17 test-dev or YouTube-VOS, modify $cfg.DATA.TEST.DATASET_NAME, then run the following command:
python test.py --checkpoint ${./checkpoints/*.pth.tar}
The test results will be saved as indexed PNG files in ${results}/.
Additionally, you can modify other settings in config.py to change the configuration.
This codebase is built upon the official AFB-URR and DETR repositories.
@article{mei2021transvos,
title={TransVOS: Video Object Segmentation with Transformers},
author={Mei, Jianbiao and Wang, Mengmeng and Lin, Yeneng and Liu, Yong},
journal={arXiv preprint arXiv:2106.00588},
year={2021}
}