Official PyTorch implementation of the paper:
Halton Scheduler for Masked Generative Image Transformer
Victor Besnier, Mickael Chen, David Hurych, Eduardo Valle, Matthieu Cord
Accepted at ICLR 2025.
TL;DR: We introduce a new sampling strategy using the Halton Scheduler, which spreads tokens uniformly across the image. This approach reduces sampling errors and improves image quality.
Welcome to the official implementation of our ICLR 2025 paper!
This repository introduces Halton Scheduler for Masked Generative Image Transformer (MaskGIT) and includes:
- Class-to-Image Model: Generates high-quality 384x384 images from ImageNet class labels.
- Text-to-Image Model: Generates realistic images from textual descriptions (coming soon).
Explore, train, and extend our easy-to-use generative models!
The v1.0 version, previously known as "MaskGIT-pytorch", is available here!
Halton-MaskGIT/
├── Config/                   <- Base config files for the demo
│   ├── base_cls2img.yaml
│   └── base_txt2img.yaml
├── Dataset/                  <- Data loading utilities
│   ├── dataset.py            <- PyTorch dataset class
│   └── dataloader.py         <- PyTorch dataloader
├── launch/
│   ├── run_cls_to_img.sh     <- Training script for class-to-image
│   └── run_txt_to_img.sh     <- Training script for text-to-image (coming soon)
├── Metrics/
│   ├── extract_train_fid.py  <- Precompute FID stats for ImageNet
│   ├── inception_metrics.py  <- Inception score and FID evaluation
│   └── sample_and_eval.py    <- Sampling and evaluation
├── Network/
│   ├── ema.py                <- EMA model
│   ├── transformer.py        <- Transformer for class-to-image
│   ├── txt_transformer.py    <- Transformer for text-to-image (coming soon)
│   └── vq_model.py           <- VQGAN architecture
├── Sampler/
│   ├── confidence_sampler.py <- Confidence scheduler
│   └── halton_sampler.py     <- Halton scheduler
├── Trainer/                  <- Training classes
│   ├── abstract_trainer.py   <- Abstract trainer
│   ├── cls_trainer.py        <- Class-to-image trainer
│   └── txt_trainer.py        <- Text-to-image trainer (coming soon)
├── statics/                  <- Sample images and assets
├── saved_networks/           <- Placeholder for the downloaded models
├── colab_demo.ipynb          <- Inference demo
├── app.py                    <- Gradio example
├── LICENSE.txt               <- MIT license
├── env.yaml                  <- Environment setup file
├── README.md                 <- This file!
└── main.py                   <- Main script
Get started with just a few steps:
git clone https://github.com/valeoai/Halton-MaskGIT.git
cd Halton-MaskGIT
conda env create -f env.yaml
conda activate maskgit
Then download the pretrained weights from Hugging Face:

from huggingface_hub import hf_hub_download

# The VQGAN (from LlamaGen)
hf_hub_download(repo_id="FoundationVision/LlamaGen",
                filename="vq_ds16_c2i.pt",
                local_dir="./saved_networks/")

# (Optional) The MaskGIT
hf_hub_download(repo_id="llvictorll/Halton-Maskgit",
                filename="ImageNet_384_large.pth",
                local_dir="./saved_networks/")
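After the downloads, both checkpoints should sit in ./saved_networks/. A quick check (the filenames below simply match the downloads above):

import os
print(sorted(os.listdir("./saved_networks/")))
# expect: ['ImageNet_384_large.pth', 'vq_ds16_c2i.pt']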
To speed up training, pre-extract the VQGAN token codes for ImageNet:

python extract_vq_features.py --data_folder="/path/to/ImageNet/" --dest_folder="/your/path/" --bsize=256 --compile
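Under the hood, pre-extraction simply runs the frozen VQGAN encoder once over the dataset and stores the discrete token ids, so training never has to encode pixels again. A minimal sketch of the idea (illustrative only; encode_to_ids is a hypothetical wrapper around the VQGAN encoder, not the repository's API):

import torch

@torch.no_grad()
def extract_codes(encode_to_ids, dataloader, dest_file):
    all_codes, all_labels = [], []
    for images, labels in dataloader:
        # encode_to_ids (hypothetical): [B, 3, H, W] images -> [B, 24, 24] token ids
        all_codes.append(encode_to_ids(images.cuda()).cpu())
        all_labels.append(labels)
    torch.save({"codes": torch.cat(all_codes),
                "labels": torch.cat(all_labels)}, dest_file)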
To train the class-to-image model:
bash launch/run_cls_to_img.sh
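Conceptually, training reduces to masked-token prediction over the 24x24 VQ grid: hide a random subset of token ids and train the transformer to recover them with cross-entropy. A schematic sketch of that objective (illustrative; the transformer call signature here is an assumption, not the repository's exact API):

import torch
import torch.nn.functional as F

def maskgit_step(transformer, codes, labels, mask_token_id):
    # codes: [B, N] VQ token ids (N = 24*24), labels: [B] class ids
    B, N = codes.shape
    r = torch.rand(B, 1, device=codes.device)          # per-sample masking ratio
    mask = torch.rand(B, N, device=codes.device) < r   # tokens to hide
    inputs = codes.masked_fill(mask, mask_token_id)    # swap in the [MASK] id
    logits = transformer(inputs, labels)               # assumed: [B, N, vocab] logits
    return F.cross_entropy(logits[mask], codes[mask])  # loss on masked positions only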
To quickly verify that the model runs end to end, try this Python code:
import torch
from Utils.utils import load_args_from_file
from Utils.viz import show_images_grid
from huggingface_hub import hf_hub_download
from Trainer.cls_trainer import MaskGIT
from Sampler.halton_sampler import HaltonSampler
config_path = "Config/base_cls2img.yaml" # Path to your config file
args = load_args_from_file(config_path)
args.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Download the VQGAN from LlamaGen
hf_hub_download(repo_id="FoundationVision/LlamaGen",
                filename="vq_ds16_c2i.pt",
                local_dir="./saved_networks/")

# Download the MaskGIT
hf_hub_download(repo_id="llvictorll/Halton-Maskgit",
                filename="ImageNet_384_large.pth",
                local_dir="./saved_networks/")

# Initialize the model
model = MaskGIT(args)

# Select the scheduler
sampler = HaltonSampler(sm_temp_min=1, sm_temp_max=1.2, temp_pow=1, temp_warmup=0, w=2,
                        sched_pow=2, step=32, randomize=True, top_k=-1)
# [goldfish, chicken, tiger cat, hourglass, ship, dog, race car, airliner]
labels = [1, 7, 282, 604, 724, 179, 751, 404]
gen_images = sampler(trainer=model, nb_sample=8, labels=labels, verbose=True)[0]
show_images_grid(gen_images)
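For intuition about what HaltonSampler does: it visits the token positions in the order of a 2D low-discrepancy Halton sequence (bases 2 and 3), so the tokens revealed at each step are spread uniformly over the image instead of clustering. A minimal sketch of that ordering in plain Python (illustrative, not the repository implementation):

def radical_inverse(base, i):
    # Van der Corput radical inverse of i in the given base, in [0, 1)
    f, x = 1.0, 0.0
    while i > 0:
        f /= base
        x += f * (i % base)
        i //= base
    return x

def halton_token_order(grid=24):
    # Visit grid cells in 2D Halton order; keep only the first visit to each cell
    order, seen, i = [], set(), 1
    while len(order) < grid * grid:
        pos = (int(radical_inverse(2, i) * grid),
               int(radical_inverse(3, i) * grid))
        if pos not in seen:
            seen.add(pos)
            order.append(pos)
        i += 1
    return order

print(halton_token_order()[:8])  # early positions land far apart from each other

Because this order is fixed and uniform, newly revealed tokens are never clustered next to each other, which is where the reduction in sampling error comes from.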
Or run the Gradio demo with python app.py and open http://127.0.0.1:6006 in your browser.
Want to try the model but don't have a GPU? Check out the Colab notebook for an easy-to-run demo!
The pretrained MaskGIT models are available on Hugging Face. Use them to jump straight into inference or fine-tuning.
| Model | # Params | Input size | GFLOPs | VQGAN | MaskGIT |
|---|---|---|---|---|---|
| Halton-MaskGIT-Large | 480M | 24x24 tokens | 83.0 | Download | Download |
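As a sanity check on the MaskGIT download, you can roughly count the checkpoint's parameters (a sketch assuming the .pth file is, or wraps, a plain state dict; the "model" key below is a guess for unwrapping nested checkpoints):

import torch

state = torch.load("./saved_networks/ImageNet_384_large.pth", map_location="cpu")
if isinstance(state, dict) and "model" in state:  # unwrap if nested (assumption)
    state = state["model"]
n_params = sum(t.numel() for t in state.values() if torch.is_tensor(t))
print(f"~{n_params / 1e6:.0f}M parameters")  # the table above says ~480M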
We welcome contributions and feedback! If you encounter any issues, have suggestions, or want to collaborate, feel free to:
- Create an issue
- Fork the repository and submit a pull request
Your input is highly valued. Let's make this project even better together!
This project is licensed under the MIT License. See the LICENSE file for details.
We are grateful for the support of the IT4I Karolina Cluster in the Czech Republic for powering our experiments.
The pretrained ImageNet VQGAN (f=16/8, 16384-entry codebook) is from the official LlamaGen repository.
If you find our work useful, please cite us and add a star ⭐ to the repository :)
@inproceedings{besnier2025iclr,
    title={Halton Scheduler for Masked Generative Image Transformer},
    author={Besnier, Victor and Chen, Mickael and Hurych, David and Valle, Eduardo and Cord, Matthieu},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2025}
}