Code for Distance-Aware Occlusion Detection with Focused Attention
conda create --name ENVIRONMENT_NAME python=3.8.10
conda activate ENVIRONMENT_NAME
pip install -r requirements.txt
The 2.5VRD dataset needs to be placed in the following way. The three .odgt files listed below need to be generated using our write annotations.ipynb.
Project Root/
└── data/
└── 2.5vrd/
└──images/
├── train/
├── validation/
├── test/
├── annotation_train_combined.odgt
├── annotation_valid_combined.odgt
├── annotation_test_combined.odgt
└── write annotations.ipynb
Model | Distance F1-Score | Occlusion F1-Score | checkpoint |
---|---|---|---|
GIT | 0.3857 | 0.4124 | GIT.pth |
GIT, no intersection loss when no intersection exists | 0.3788 | 0.4050 | GIT_do_not_calculate_intersection_loss_if_no_intersection_exists.pth |
without GIT | 0.3710 | 0.3995 | no_GIT.pth |
The argument --resume is the path of the model checkpoint to be evaluated. Evaluation out will be written to tensorboard. The experiment name is the name of the folder that is under output_dir and contains the checkpoint. For example, the sample evaluation script would result in an experiment name of "Dec28_Cascade_633".
Users need to modify dec_layers, dec_layers_distance, dec_layers_occlusion to the correct number of transformer decoder layers in the object pair decoder, distance decoder, and occlusion decoder, respectively.
Before running, in the unmodified magic_numbers.py, set:
PREDICT_INTERSECTION_BOX = True
python vrd_test.py --backbone=resnet101 --resume='output_dir/GIT/GIT.pth' --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --num_workers=0 --batch_size=1
Before running, in the unmodified magic_numbers.py, make sure:
PREDICT_INTERSECTION_BOX = False
python vrd_test.py --backbone=resnet101 --resume='output_dir/no_GIT/no_GIT.pth' --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --num_workers=0 --batch_size=1
After generating predictions, need to evaluate them and write evaluation outputs to tensorboard.
Use the evaluation scripts provided by the authors of 2.5VRD to evaluate the performance of our models.
The experiment_name, output_dir, dec_layers, dec_layers_distance, dec_layers_occlusion arguments can be changed.
- experiment_name: the experiment name in tensorboard.
- output_dir: the folder to store checkpoints.
- dec_layers, dec_layers_distance, dec_layers_occlusion corresponds to the number of transformer decoder layers in the object pair decoder, distance decoder, and occlusion decoder, respectively.
Before running, in the unmodified magic_numbers.py, set:
PREDICT_INTERSECTION_BOX = True
# 8 GPUs (40G GPU memory per GPU)
torchrun --nnodes=1 --nproc_per_node=8 --master_port=54321 main.py --num_workers=8 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=6 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
# 4 GPUs (80G GPU memory per GPU)
torchrun --nnodes=1 --nproc_per_node=4 --master_port=54321 main.py --num_workers=4 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=12 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
Before running, in the unmodified magic_numbers.py, make sure:
PREDICT_INTERSECTION_BOX = False
# 8 GPUs (40G GPU memory per GPU)
torchrun --nnodes=1 --nproc_per_node=8 --master_port=54322 main.py --num_workers=8 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=6 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
# 4 GPUs (80G GPU memory per GPU)
torchrun --nnodes=1 --nproc_per_node=4 --master_port=54322 main.py --num_workers=4 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=12 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
# 8 GPUs (40G GPU memory per GPU)
torchrun --nnodes=1 --nproc_per_node=8 --master_port=54323 main.py --num_workers=8 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=6 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
# 4 GPUs (80G GPU memory per GPU)
torchrun --nnodes=1 --nproc_per_node=4 --master_port=54323 main.py --num_workers=4 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=12 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
Firstly, set SORT_USING_OBJECT_SCORES = True in magic_numbers.py
Before running, in the unmodified magic_numbers.py, set:
VISUALIZE_ATTENTION_WEIGHTS = True
PREDICT_INTERSECTION_BOX = True
python vrd_test.py --backbone=resnet101 --resume='output_dir/GIT/GIT.pth' --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --num_workers=0 --batch_size=1
Running this would save the attention weights of the model that was trained with the generalized intersection prediction task to disk.
Before running, in the unmodified magic_numbers.py, set:
VISUALIZE_ATTENTION_WEIGHTS = True
python vrd_test.py --backbone=resnet101 --resume='output_dir/no_GIT/no_GIT.pth' --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --num_workers=0 --batch_size=1
Running this would save the attention weights of the model trained without the generalized intersection prediction task to disk.
Use a jupyter notebook provided by us to visualize attention weights saved to disk in the previous steps.
Place this notebooks under project root and follow the instructions in it to visualize decoder attentions:
cleaned_visualize_attention_(GIT).ipynb
CUDA_VISIBLE_DEVICES=0 python -m pdb main.py --num_workers=0 --epochs=500 --dataset_file=two_point_five_vrd --batch_size=6 --backbone=resnet101 --lr=0.0001 --dec_layers=6 --dec_layers_distance=3 --dec_layers_occlusion=3 --experiment_name='runs/debug' --output_dir='output_dir/debug' --lr_drop=30
@ARTICLE{9864137, author={Li, Yang and Tu, Yucheng and Chen, Xiaoxue and Zhao, Hao and Zhou, Guyue}, journal={IEEE Transactions on Image Processing}, title={Distance-Aware Occlusion Detection with Focused Attention}, year={2022}, volume={}, number={}, pages={1-1}, doi={10.1109/TIP.2022.3197984}}