Epiphqny/PAR

Parallelized Autoregressive Visual Generation

arXiv: 2412.15119 | project page: https://epiphqny.github.io/PAR-project/

BibTeX

@article{wang2024parallelized,
  title={Parallelized Autoregressive Visual Generation},
  author={Wang, Yuqing and Ren, Shuhuai and Lin, Zhijie and Han, Yujin and Guo, Haoyuan and Yang, Zhenheng and Zou, Difan and Feng, Jiashi and Liu, Xihui},
  journal={arXiv preprint arXiv:2412.15119},
  year={2024}
}

Getting Started

Requirements

  • Linux with Python ≥ 3.7
  • PyTorch ≥ 2.1
  • A100 GPUs

We use the same environment as LlamaGen. For more details, please refer to the LlamaGen setup instructions.

VQ-VAE models

Method       params  tokens  rFID (256x256)  weight
vq_ds16_c2i  72M     16x16   2.19            vq_ds16_c2i.pt

AR models

Method      params  tokens  FID (256x256)  weight
PAR-XL-4x   775M    24x24   2.61           PAR-XL-4x.pt
PAR-XXL-4x  1.4B    24x24   2.35           PAR-XXL-4x.pt
PAR-3B-4x   3.1B    24x24   2.29           PAR-3B-4x.pt
PAR-3B-16x  3.1B    24x24   2.88           PAR-3B-16x.pt

Please download the models above and put them in the folder ./pretrained_models.
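
As a quick sanity check before running anything, the following sketch creates the folder and reports which of the checkpoints listed in the tables above are present. The file names come from the weight columns. The loop itself is only an illustration and is not part of the repository. You only need the VQ-VAE checkpoint plus the AR checkpoint you intend to use.

```shell
# Illustrative check, not part of the PAR repository.
# File names are taken from the "weight" columns of the tables above.
CKPT_DIR=./pretrained_models
CHECKPOINTS="vq_ds16_c2i.pt PAR-XL-4x.pt PAR-XXL-4x.pt PAR-3B-4x.pt PAR-3B-16x.pt"

# Create the folder if it does not exist yet.
mkdir -p "$CKPT_DIR"

missing=0
for f in $CHECKPOINTS; do
  if [ -f "$CKPT_DIR/$f" ]; then
    echo "found:   $CKPT_DIR/$f"
  else
    echo "missing: $CKPT_DIR/$f"
    missing=$((missing + 1))
  fi
done
echo "$missing checkpoint(s) missing"
```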

Pre-extract discrete codes of training images

bash scripts/autoregressive/extract_codes_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --data-path /path/to/imagenet/train --code-path /path/to/imagenet_code_c2i_flip_ten_crop --ten-crop --crop-range 1.1 --image-size 384
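
The token counts in the tables above are consistent with a 16x spatial downsampling factor. This is an assumption inferred from the "ds16" in the VQ-VAE checkpoint name, not something the scripts print. Under that assumption, a small sketch of the arithmetic relating --image-size to the token grid:

```shell
# Assumption: the VQ-VAE downsamples each side by 16 (inferred from "ds16").
DOWNSAMPLE=16

# Print the number of tokens per side for a given square image size.
grid_side() {
  echo $(( $1 / DOWNSAMPLE ))
}

for IMAGE_SIZE in 256 384; do
  SIDE=$(grid_side "$IMAGE_SIZE")
  echo "${IMAGE_SIZE}x${IMAGE_SIZE} image -> ${SIDE}x${SIDE} grid, $((SIDE * SIDE)) tokens"
done
```

With --image-size 384 this gives the 24x24 grid listed for the AR models, and 256 gives the 16x16 grid listed for the VQ-VAE.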

Train AR models with DDP

Before running, set nnodes, nproc_per_node, node_rank, master_addr, and master_port in the .sh script to match your cluster. The spe-token-num and ar-token-num arguments give the number of learnable tokens (n-1) and the number of tokens generated in parallel (n), respectively.

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XL

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XXL

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-3B
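
Since spe-token-num is always one less than ar-token-num (n-1 and n), it can be derived rather than typed by hand. The sketch below only builds and prints a training command and does not launch anything. The variable names are illustrative, and the paths are the same placeholders used in the commands above.

```shell
# Illustrative only: derive --spe-token-num from --ar-token-num (n-1 from n).
AR_TOKEN_NUM=4                        # n: tokens generated in parallel
SPE_TOKEN_NUM=$((AR_TOKEN_NUM - 1))   # n-1: learnable tokens
GPT_MODEL=GPT-XL                      # or GPT-XXL, GPT-3B

# Build the command as a string and print it; paths are placeholders.
TRAIN_CMD="bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num $SPE_TOKEN_NUM --ar-token-num $AR_TOKEN_NUM --image-size 384 --gpt-model $GPT_MODEL"
echo "$TRAIN_CMD"
```

Setting AR_TOKEN_NUM=16 yields --spe-token-num 15, matching the PAR-3B-16x sampling command.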

Sampling


bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.345

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-XXL-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XXL --image-size 384 --image-size-eval 256 --cfg-scale 1.435

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-XL-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XL --image-size 384 --image-size-eval 256 --cfg-scale 1.5

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-16x.pt --spe-token-num 15 --ar-token-num 16 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.5

Evaluation

Before evaluation, please refer to the evaluation readme to install the required packages.

python3 evaluations/c2i/evaluator.py VIRTUAL_imagenet256_labeled.npz samples/GPT-XXL-PAR-XXL-4x-size-384-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-1.435-seed-0.npz
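
The sample file name in the command above appears to encode the sampling settings. The sketch below reconstructs it from those settings so the evaluation command can be adapted to other checkpoints. The naming pattern is an assumption inferred from this single example (the VQ, topk, topp, temperature, and seed fields are copied verbatim from it), so check the actual file written to samples/ before relying on it.

```shell
# Assumption: the file name pattern is inferred from the one example above.
GPT_MODEL=GPT-XXL
CKPT_NAME=PAR-XXL-4x      # checkpoint file name without the .pt suffix
IMAGE_SIZE=384
IMAGE_SIZE_EVAL=256
CFG_SCALE=1.435

SAMPLE_NPZ="samples/${GPT_MODEL}-${CKPT_NAME}-size-${IMAGE_SIZE}-size-${IMAGE_SIZE_EVAL}-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-${CFG_SCALE}-seed-0.npz"

# Print the evaluation command instead of running it.
echo "python3 evaluations/c2i/evaluator.py VIRTUAL_imagenet256_labeled.npz $SAMPLE_NPZ"
```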

Acknowledgments

The development of PAR is based on LlamaGen. We deeply appreciate this contribution to the community.

About

The official implementation of PAR: Parallelized Autoregressive Visual Generation. https://epiphqny.github.io/PAR-project/
