Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation.
Current weakly-supervised incremental learning for semantic segmentation (WILSS) approaches only consider replacing pixel-level annotations with image-level labels, while the training images still come from well-designed datasets. In this work, we argue that widely available web images can also be considered for the learning of new classes. To achieve this, we first introduce a strategy to select web images which are similar to previously seen examples in the latent space using a Fourier-based domain discriminator. Then, an effective caption-driven rehearsal strategy is proposed to preserve previously learnt classes. To our knowledge, this is the first work to rely solely on web images for both the learning of new concepts and the preservation of the already learned ones in WILSS. Experimental results show that the proposed approach can reach state-of-the-art performance without using manually selected and annotated data in the incremental steps.
The code is based on WILSON.
The main requirements are:
python > 3.1
pytorch > 1.6
If you want to install a custom environment for this code, you can run the following using conda:
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
conda install tensorboard
conda install jupyter
conda install matplotlib
conda install tqdm
conda install imageio
pip install inplace-abn # this should be done using CUDA compiler (same version as pytorch)
pip install wandb # to use the WandB logger
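The two version bounds above can be checked with a small script before installing anything else. This is only a sketch and not part of the repository: the helper names `version_tuple` and `newer_than` are hypothetical, and ">" is read strictly, so a version equal to the bound is reported as too old.

```python
import sys
from importlib import metadata


def version_tuple(version):
    """Turn '1.7.1+cu101' into (1, 7, 1); stops at the first non-numeric part."""
    core = version.split("+")[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def newer_than(version, minimum):
    """True if `version` is strictly greater than `minimum` (missing parts count as 0)."""
    a, b = version_tuple(version), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("python", python_version, "ok" if newer_than(python_version, "3.1") else "too old")
    try:
        # Reads the installed package metadata without importing torch itself.
        torch_version = metadata.version("torch")
        print("pytorch", torch_version, "ok" if newer_than(torch_version, "1.6") else "too old")
    except metadata.PackageNotFoundError:
        print("pytorch is not installed in this environment")
```

Comparing integer tuples rather than strings avoids the usual pitfall where "3.10" sorts before "3.2".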
To download the datasets, use the scripts data/download_voc.sh and data/download_coco.sh.
If your datasets are in a different folder, make a soft-link from the target dataset to the data folder. We expect the following tree:
data/voc/
    SegmentationClassAug/
        <Image-ID>.png
    JPEGImages/
        <Image-ID>.jpg
    split/
    ... other files
voc
directory of the repository.
We suggest linking the folders inside the voc directory.
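The soft-linking and the expected layout can be checked from Python. The following is a minimal sketch, not code from the repository: `missing_voc_entries` and `link_into_voc` are hypothetical helper names, and only the three sub-folders named in the tree above are checked.

```python
from pathlib import Path

# Sub-folders named in the expected tree, relative to data/voc/.
EXPECTED_SUBDIRS = ("SegmentationClassAug", "JPEGImages", "split")


def missing_voc_entries(voc_root):
    """Return the expected sub-folders that are absent under `voc_root`."""
    root = Path(voc_root)
    # is_dir() follows symlinks, so soft-linked folders count as present.
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def link_into_voc(source_dir, voc_root):
    """Soft-link `source_dir` into `voc_root` under its own folder name."""
    source = Path(source_dir).resolve()
    target = Path(voc_root) / source.name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Skip if something (even a dangling link) already sits at the target.
    if not target.exists() and not target.is_symlink():
        target.symlink_to(source, target_is_directory=True)
    return target
```

For example, `link_into_voc("/datasets/VOC2012/JPEGImages", "data/voc")` followed by `missing_voc_entries("data/voc")` lists whatever still needs to be linked; the `/datasets/...` path is a placeholder.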
We use pascal web data to preserve previous knowledge and class web data for learning new knowledge. They have been uploaded to Baidu Drive:
Pascal web data
https://pan.baidu.com/s/1W19rDo9CWUU4_zJ3pFrSeQ?pwd=1234
pswd: 1234
Class web data
https://pan.baidu.com/s/1UeiL8cWjQvtWN68N2g00TA?pwd=1234
pswd: 1234
Usage
To use the pascal web data: 1) we first generate new captions for the web images; 2) a cosine similarity score is then computed to select the web images that are most similar to the original ones;
3) once the filtering is finished, the "generate_label" and "generate_train_file" files are used to prepare the selected web data for training the main model:
Process
1): We provide the caption files generated with the LLM openflamingo; you can find them in the "captions" folder.
2): Run the "filter_replay_with_NN" and "move_with_filtered_caption" functions in "utils/measure_noun.py", in that order, to get the selected captions and image data.
3): Run "generate_label.py" to get the pseudo annotations for the web data, and run "generate_train_file.py" to generate the path file.
4): Set "--replay_path" to your web data path; you can now train your model with web data.
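The idea behind the cosine-similarity filtering can be illustrated in a few lines. This is a generic sketch under stated assumptions and does not reproduce "filter_replay_with_NN": the function names, the threshold of 0.5, and the use of plain lists as embeddings are all hypothetical, and what exactly is embedded (captions, nouns, or images) is defined by the repository code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(reference, candidates, threshold=0.5):
    """Return candidate names scoring >= threshold against `reference`, best first.

    `candidates` maps a name (e.g. an image file name) to its embedding.
    The threshold value is an illustrative assumption, not the authors' setting.
    """
    scores = {name: cosine_similarity(reference, vec) for name, vec in candidates.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scores[name])
```

For instance, with a reference of `[1.0, 0.0]`, a candidate `[2.0, 0.0]` scores 1.0 and is kept, while an orthogonal candidate `[0.0, 1.0]` scores 0.0 and is dropped.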
Selected web data for 10-10 and 15-5
In case you have trouble filtering the web data, we also provide our selected data so that you can train your model directly, without running any filtering strategy.
15-5-ov:
google drive
10-10-ov:
google drive
Usage: Download the data, unzip it, and set "--replay_path" to the path of the unzipped folder.
After setting up the dataset, download the models pretrained on ImageNet using InPlaceABN.
Download the ResNet-101 model (it is the only one we need, but you can also download other networks if you want to use a different one).
Then, put the pretrained model in the pretrained folder.
We provide an example script to run the experiments (see run.sh).
In the following, we describe the basic parameters needed to run an experiment.
First, we assume that we have a command
exp='python -m torch.distributed.launch --nproc_per_node=<num GPUs> --master_port <PORT> run.py --num_workers <N_Workers>'
that allows us to set up the distributed data parallel script.
The first step to replicate our results is to obtain the model at step 0 (base step, fully supervised). You can run:
exp --name Base --step 0 --lr 0.01 --bce --dataset <dataset> --task <task> --batch_size 24 --epochs 30 --val_interval 2 [--overlap]
where we use --bce to train the classifier with the binary cross-entropy. dataset can be voc or coco-voc. The available tasks are:
voc: 15-5, 10-10 (you can set overlap here)
coco: voc (overlap is not used)
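The dataset and task combinations above can be validated before launching a run. The sketch below is not part of the repository: `SUPPORTED` and `base_step_args` are hypothetical names, the flag values are copied from the step 0 command, and reading the "coco" row as the coco-voc dataset is an assumption.

```python
# (dataset, task) pairs from the list above; the value says whether
# --overlap may be set. Mapping the "coco" row to "coco-voc" is an assumption.
SUPPORTED = {
    ("voc", "15-5"): True,
    ("voc", "10-10"): True,
    ("coco-voc", "voc"): False,
}


def base_step_args(dataset, task, overlap=False):
    """Build the argument list for the step 0 (base) run shown above."""
    if (dataset, task) not in SUPPORTED:
        raise ValueError(f"unsupported combination: {dataset} / {task}")
    if overlap and not SUPPORTED[(dataset, task)]:
        raise ValueError(f"--overlap is not used with {dataset} / {task}")
    args = [
        "--name", "Base", "--step", "0", "--lr", "0.01", "--bce",
        "--dataset", dataset, "--task", task,
        "--batch_size", "24", "--epochs", "30", "--val_interval", "2",
    ]
    if overlap:
        args.append("--overlap")
    return args
```

The returned list can be joined with spaces and appended to the exp command, or passed to `subprocess.run` together with the launcher arguments.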
After this, you can run the incremental steps using only image-level labels (set the weakly parameter).
exp --name ours --step 1 --weakly --lr 0.001 --alpha 0.5 --step_ckpt <pretr> --loss_de 1 --lr_policy warmup --affinity \
--dataset <dataset> --task <task> --batch_size 24 --epochs 40 [--overlap] --replay --replay_path <your replay data path> --replay_num 2
where pretr should be the path to the pretrained model (usually checkpoints/step/<dataset>-<task>/<name>.pth).
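The checkpoint path and the incremental-step arguments can be assembled programmatically. This is a hedged sketch rather than repository code: `step_checkpoint` and `incremental_step_args` are hypothetical helpers, the path follows the "usual" pattern quoted above (your run may save under a different name), and the flag values are copied from the step 1 command.

```python
from pathlib import Path


def step_checkpoint(dataset, task, name, root="checkpoints/step"):
    """Path following the pattern checkpoints/step/<dataset>-<task>/<name>.pth."""
    return Path(root) / f"{dataset}-{task}" / f"{name}.pth"


def incremental_step_args(dataset, task, step_ckpt, replay_path, overlap=False):
    """Build the argument list for the weakly-supervised step 1 run shown above."""
    args = [
        "--name", "ours", "--step", "1", "--weakly", "--lr", "0.001",
        "--alpha", "0.5", "--step_ckpt", str(step_ckpt), "--loss_de", "1",
        "--lr_policy", "warmup", "--affinity",
        "--dataset", dataset, "--task", task,
        "--batch_size", "24", "--epochs", "40",
    ]
    if overlap:
        args.append("--overlap")
    # Replay settings: web data folder and the --replay_num value from the command.
    args += ["--replay", "--replay_path", str(replay_path), "--replay_num", "2"]
    return args
```

A typical call would be `incremental_step_args("voc", "15-5", step_checkpoint("voc", "15-5", "Base"), "<your replay data path>", overlap=True)`; checking `step_checkpoint(...).is_file()` first catches a missing base model before the distributed launch starts.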