This is the implementation of our paper, Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation, which has been accepted to the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
conda create -n VIPMT python=3.6
conda activate VIPMT
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
conda install opencv cython
pip install easydict imgaug
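After installation, a quick check that the packages are importable can save a failed training run. The script below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages installed above (`cv2` for opencv, `torch` for pytorch, `Cython` for cython).

```python
import importlib.util
import sys

# Import names assumed for the packages installed above
# (opencv is imported as cv2, pytorch as torch, cython as Cython).
REQUIRED_MODULES = ["torch", "torchvision", "cv2", "Cython", "easydict", "imgaug"]


def missing_modules(names):
    """Return the names that the import system cannot locate (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it inside the activated VIPMT environment. It only checks that each module can be found and does not verify versions or CUDA availability.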
- Download the 2019 version of the YouTube-VIS dataset.
- Download the VSPW 480P dataset.
- Put the datasets in the ./data folder.
data
└─ Youtube-VOS
    └─ train
        └─ Annotations
        └─ JPEGImages
        └─ train.json
└─ VSPW_480p
    └─ data
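The layout above can be verified before training with a short script. This is a sketch under stated assumptions: the helper `find_missing` is hypothetical and not part of the repository, and the nesting of Annotations, JPEGImages and train.json under Youtube-VOS/train is inferred from the tree rather than confirmed by the code.

```python
from pathlib import Path

# Entries expected under the data folder, taken from the tree above.
# Assumption: Annotations, JPEGImages and train.json sit inside Youtube-VOS/train.
EXPECTED_PATHS = [
    "Youtube-VOS/train/Annotations",
    "Youtube-VOS/train/JPEGImages",
    "Youtube-VOS/train/train.json",
    "VSPW_480p/data",
]


def find_missing(data_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing("./data")
    if missing:
        print("Missing entries under ./data:")
        for rel in missing:
            print("  ", rel)
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root. It checks only that the paths exist, not that the downloads are complete or uncorrupted.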
- Install cocoapi for YouTube-VIS.
- Download the ImageNet pretrained backbone and put it into the pretrain_model folder.
pretrain_model
    └─ resnet50_v2.pth
- Update config/config.py.
python train.py --group 1 --batch_size 4
python test.py --group 1
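To train and then test in one go, the two commands above can be wrapped in a small driver. This is only a sketch: `build_commands` and `run_groups` are hypothetical helpers, not part of the repository, and group 1 is the only group ID shown here, so any other ID passed in is an assumption about your setup. The driver defaults to a dry run that prints the commands without executing them.

```python
import subprocess
import sys


def build_commands(group, batch_size=4):
    """Return the train and test commands for one group (hypothetical helper)."""
    train_cmd = [sys.executable, "train.py", "--group", str(group),
                 "--batch_size", str(batch_size)]
    test_cmd = [sys.executable, "test.py", "--group", str(group)]
    return [train_cmd, test_cmd]


def run_groups(groups, batch_size=4, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for group in groups:
        for cmd in build_commands(group, batch_size):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops the loop if training or testing fails.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only group 1 appears in the commands above; other IDs are assumptions.
    run_groups([1], batch_size=4, dry_run=True)
```

Set `dry_run=False` once the printed commands look right. The script must be run from the repository root so that train.py and test.py resolve.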
Part of the code is based on IPMT and DANet. Thanks for their great work!