Official PyTorch implementation of TwFA.
Modeling Image Composition for Complex Scene Generation (CVPR2022)
Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, Dacheng Tao
The overview of the proposed Transformer with Focal Attention (TwFA) framework.
The illustration of different attention mechanisms with connectivity matrix.
A suitable conda environment named twfa can be created and activated with:
conda env create -f environment.yaml
conda activate twfa
Create a symlink data/coco containing the images from the 2017 split in train2017 and val2017, and their annotations in annotations. Files can be obtained from the COCO webpage.
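The COCO setup step can be sketched in Python as follows. This is a minimal illustration, not part of the repository: the helper name link_coco and the source path are hypothetical, and only the three sub-directory names come from the description above.

```python
import os
from pathlib import Path

# Sub-directories expected under data/coco, as listed in the setup step.
EXPECTED_COCO_DIRS = ("train2017", "val2017", "annotations")


def link_coco(source_dir, link_path="data/coco"):
    """Symlink link_path to source_dir and return the missing sub-directories.

    Hypothetical helper: source_dir is wherever the COCO 2017 files
    were downloaded and extracted.
    """
    source = Path(source_dir).resolve()
    link = Path(link_path)
    link.parent.mkdir(parents=True, exist_ok=True)
    # Only create the link if nothing (not even a dangling link) is there yet.
    if not link.exists() and not link.is_symlink():
        os.symlink(source, link, target_is_directory=True)
    # Report which expected folders cannot be reached through the link.
    return [name for name in EXPECTED_COCO_DIRS if not (link / name).is_dir()]


# Example (hypothetical path):
#   missing = link_coco("/datasets/coco")
#   an empty list means train2017, val2017 and annotations are all in place
```

An empty return value means the layout matches the one described above; any names returned are folders that still need to be downloaded or moved.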
Create a symlink data/vg containing the images from Visual Genome. Files can be obtained from the VG webpage. Unzip the other annotations for VG in the dir data.
Download the checkpoint (code: 5ipt) and place it into the dir pretrained/checkpoints. Then run the command:
python scripts/sample_coco.py --base configs/coco.yaml --save_path SAVE_DIR
Download the checkpoint1 (code: 1gzu) or checkpoint2 (code: t1qv) and place it into the dir pretrained/checkpoints. Then run the command:
python scripts/sample_vg.py --base configs/VG_CONFIG_FILE --save_path SAVE_DIR
python main.py --base configs/coco.yaml -t True --gpus 0,1,2,3,4,5,6,7,
python main.py --base configs/vg.yaml -t True --gpus 0,1,2,3,4,5,6,7,
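The two main.py commands above (presumably the training runs, given -t True) list eight GPU ids followed by a trailing comma. The sketch below builds the same command line for a different GPU count. The helper name training_command is hypothetical, and it assumes that GPU ids are contiguous from 0 and that main.py accepts a shorter list in the same comma-terminated format, which this README does not confirm.

```python
def training_command(config, num_gpus):
    """Build the main.py invocation shown above for num_gpus GPUs.

    Hypothetical helper: assumes GPU ids 0..num_gpus-1 and keeps the
    trailing comma used in the README commands.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # "0,1,...,n-1," with a comma after every id, including the last.
    gpus = "".join(f"{i}," for i in range(num_gpus))
    return ["python", "main.py", "--base", config, "-t", "True", "--gpus", gpus]


# Example: joining the list with spaces reproduces the COCO command above.
#   " ".join(training_command("configs/coco.yaml", 8))
```

Returning a list rather than a string keeps the arguments ready for subprocess.run without shell quoting.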
Huge thanks to the Taming-Transformers project!
@misc{esser2020taming,
title={Taming Transformers for High-Resolution Image Synthesis},
author={Patrick Esser and Robin Rombach and Björn Ommer},
year={2020},
eprint={2012.09841},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@inproceedings{yang2022modeling,
title={Modeling image composition for complex scene generation},
author={Yang, Zuopeng and Liu, Daqing and Wang, Chaoyue and Yang, Jie and Tao, Dacheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7764--7773},
year={2022}
}