ICCV 2023, official code implementation, arXiv
Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods.
An overview of our HiLo framework with HiLo baseline. a) HiLo relation swapping module swaps the multiple relations in the subject-object pair to obtain H-L Data and L-H Data respectively. b) Input data into our HiLo framework with HiLo baseline model, there are two branches, namely H-L decoder and L-H decoder, which learn H-L Data and L-H Data respectively. c) In addition to task losses for PSG, we propose HiLo prediction alignment, which includes subject-object consistency loss and relation consistency loss, so that the parallel branch can be better optimized.
Comparison between our HiLo and other methods on the PSG dataset. Our method shows superior performance compared to all previous methods.
Visualization of panoptic segmentations and the top 20 predicted triplets compared with ground truth. The upper left is the original image, the lower left is the ground truth and on the right are the predictions. The highlighted triplets represent the subject-object pairs with multiple relations, where the blue highlights represent the high frequency relations and the red highlights represents the low frequency relations. The visualization results show that our method can predict both high frequency and low frequency relations.
Dev environment:
git clone https://github.com/franciszzj/HiLo.git
cd HiLo
conda create --name hilo --file spec-file.txt
conda activate hilo
Please install mmcv==v1.7.0
and mmdet==v2.25.2
.
Pretrained models are directly converted from Mask2Former using this code.
python tools/change_model.py path/to/pretrained/model
Config path: ./configs/psgmask2former/
- R50:
psgmask2former_r50_hilo_baseline.py
,psgmask2former_r50_hilo.py
- Swin Base:
psgmask2former_swin_b_hilo_baseline.py
,psgmask2former_swin_b_hilo.py
- Swin Large:
psgmask2former_swin_l_hilo_baseline.py
,psgmask2former_swin_l_hilo.py
Hyperparameter:
- EVAL_PAN_RELS: For details, refer to issue#30, issue#60, and issue#100.
- model.bbox_head.test_forward_output_type:
'high2low'
,'low2high'
, and'merge'
.
Train HiLo baseline:
PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
tools/train.py path/to/hilo_baseline/config --auto-resume --no-validate --seed 666 --launcher pytorch
Obtaining a new training file through IETrans:
Note: you should also add gt_xxx
in the test_pipeline
. You can refer to example_config for specifics.
PYTHONPATH='.':$PYTHONPATH \
python tools/data_prepare/ietrans.py path/to/hilo_baseline/config path/to/checkpoint path/to/output
Train HiLo:
PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
tools/train.py path/to/hilo/config --auto-resume --no-validate --seed 666 --launcher pytorch
Test and eval HiLo baseline:
PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python tools/test.py path/to/hilo_baseline/config path/to/checkpoint --eval sgdet_PQ
Test and eval HiLo:
PYTHONPATH='.':$PYTHONPATH \
EVAL_PAN_RELS=True \
python tools/test.py path/to/hilo/config path/to/checkpoint --eval sgdet_PQ --cfg-options model.bbox_head.test_forward_output_type='merge'
For the convenience to follow HiLo, we provide the PSG json file processed through IETrans, as well as a trained model and the config file saved from the training process for reference.
Note:
- For the R50 model, we used
use_shared_query=True
. However, after multiple experiments, we found that the results foruse_shared_query=True/False
are similar. Therefore, we did not provide an R50 model withuse_shared_query=False
. While for the SwinB/SwinL models, they areuse_shared_query=False
models. - The results reported in the paper are with
EVAL_PAN_RELS=False
, for a fairer comparison with methods like PSGTR. However, we have implemented a more efficient post-processing method, where the performance withEVAL_PAN_RELS=True
is similar to that withEVAL_PAN_RELS=False
.
Backbone | PSG file (IETrans processed) | Converted Mask2Former | HiLo Baseline Model | Config (for HiLo train) | HiLo Model |
---|---|---|---|---|---|
R50 | psg_ietrans.json | mask2former_r50_converted.pth | hilo_baseline_r50.pth | hilo_r50.py | hilo_r50.pth |
SwinB | psg_ietrans_swin_b.json | mask2former_swin_b_converted.pth | hilo_baseline_swin_b.pth | hilo_swin_b.py | hilo_swin_b.pth |
SwinL | psg_ietrans_swin_l.json | mask2former_swin_l_converted.pth | hilo_baseline_swin_l.pth | hilo_swin_l.py | hilo_swin_l.pth |
HiLo is developed based on OpenPSG and MMDetection. Thanks for their great works!
If you find this repository useful, please cite:
@InProceedings{zhou2023hilo,
author = {Zhou, Zijian and Shi, Miaojing and Caesar, Holger},
title = {HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {21637-21648}
}