The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"

TencentARC/BrushEdit

BrushEdit

😃 This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".

Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-based Editing

TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.
Main Elements: 🛠️ Fully automated / 🤠 Interactive editing.

Yaowei Li1*, Yuxuan Bian3*, Xuan Ju3*, Zhaoyang Zhang2‡, Junhao Zhuang4, Ying Shan2✉, Yuexian Zou1✉, Qiang Xu3✉
1Peking University 2ARC Lab, Tencent PCG 3The Chinese University of Hong Kong 4Tsinghua University
*Equal Contribution ‡Project Lead ✉Corresponding Author

🌐Project Page | 📜Arxiv | 📹Video | 🤗Hugging Face Demo | 🤗Hugging Face Model

4K HD Introduction Video: YouTube.

TODO

  • Release the code of BrushEdit (MLLM-driven agent for image editing and inpainting).
  • Release the paper and webpage. More info: BrushEdit
  • Release the BrushNetX checkpoint (a more powerful BrushNet).
  • Release the Gradio demo.

🛠️ Pipeline Overview

BrushEdit consists of four main steps: (i) Editing category classification: determine the type of editing required. (ii) Identification of the primary editing object: identify the main object to be edited. (iii) Acquisition of the editing mask and target caption: generate the editing mask and the corresponding target caption. (iv) Image inpainting: perform the actual image editing. Steps (i) to (iii) use pre-trained MLLMs and detection models to determine the editing type, target object, editing mask, and target caption. Step (iv) performs the edit with an improved version of the dual-branch inpainting model BrushNet. The model inpaints the target areas based on the target caption and editing mask, drawing on the generative ability and background preservation of inpainting models.
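
The four steps above can be sketched as a small orchestration function. This is only an illustration of the control flow, not the repository's code: `EditPlan`, `run_brushedit_pipeline`, and the four callables are hypothetical names, and in practice the callables would wrap the MLLM, the detection and segmentation models, and the inpainting model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class EditPlan:
    """Hypothetical container for the outputs of steps (i) to (iii)."""
    category: str        # (i) editing category, e.g. "remove" or "replace"
    target_object: str   # (ii) primary object to edit
    mask: Any            # (iii) editing mask
    target_caption: str  # (iii) caption describing the desired result


def run_brushedit_pipeline(
    image: Any,
    instruction: str,
    classify: Callable[[Any, str], str],
    find_object: Callable[[Any, str, str], str],
    make_mask_and_caption: Callable[[Any, str, str, str], Tuple[Any, str]],
    inpaint: Callable[[Any, Any, str], Any],
) -> Tuple[EditPlan, Any]:
    # (i) Decide what kind of edit the instruction asks for.
    category = classify(image, instruction)
    # (ii) Find the main object the edit applies to.
    target = find_object(image, instruction, category)
    # (iii) Build the editing mask and the target caption.
    mask, caption = make_mask_and_caption(image, instruction, category, target)
    plan = EditPlan(category, target, mask, caption)
    # (iv) Inpaint the masked region, guided by the target caption.
    result = inpaint(image, plan.mask, plan.target_caption)
    return plan, result
```

Passing the models in as callables keeps the sketch independent of any particular MLLM or inpainting backend; each one can be replaced by a stub when testing the flow.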

🚀 Getting Started

Environment Requirement 🌍

BrushEdit has been implemented and tested with CUDA 11.8, PyTorch 2.0.1, and Python 3.10.6.

Clone the repo:

git clone https://github.com/TencentARC/BrushEdit.git

We recommend first creating a virtual environment with conda and then installing PyTorch following the official instructions. For example:

conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Then, you can install diffusers (implemented in this repo) with:

pip install -e .

After that, you can install the required packages through:

pip install -r app/requirements.txt
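
After installation, it can help to confirm that the installed packages match the tested versions. The following is a minimal sketch using only the standard library; `EXPECTED` and `check_versions` are hypothetical names, and the version numbers are copied from the install command above.

```python
import sys
from importlib import metadata

# Versions taken from the pip install command above.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, want in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # CUDA wheels may carry a local suffix such as "+cu118"; ignore it.
        if found is None or found.split("+")[0] != want:
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, (want, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

The check reads package metadata instead of importing torch, so it runs quickly and works even when the GPU driver is not set up yet.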

Download Checkpoints 💾

Checkpoints of BrushEdit can be downloaded using the following command.

sh app/down_load_brushedit.sh

The ckpt folder contains

  • BrushNetX pretrained checkpoints for Stable Diffusion v1.5 (brushnetX)
  • Pretrained Stable Diffusion v1.5 checkpoint (e.g., realisticVisionV60B1_v51VAE from Civitai). You can use scripts/convert_original_stable_diffusion_to_diffusers.py to process other models downloaded from Civitai.
  • Pretrained GroundingDINO checkpoint from the official release.
  • Pretrained SAM checkpoint from the official release.

The checkpoint structure should be like:

|-- models
    |-- base_model
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- dreamshaper_8
            |-- ...
        |-- epicrealism_naturalSinRC1VAE
            |-- ...
        |-- meinamix_meinaV11
            |-- ...
        |-- ...
    |-- brushnetX
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- grounding_dino
        |-- groundingdino_swint_ogc.pth
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- vlm
        |-- llava-v1.6-mistral-7b-hf
          |-- ...
        |-- llava-v1.6-vicuna-13b-hf
          |-- ...
        |-- Qwen2-VL-7B-Instruct
          |-- ...
        |-- ...
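
A short script can verify this layout before launching the demo. This is a sketch under assumptions: `REQUIRED` and `missing_checkpoints` are hypothetical names, the listed paths are copied from the tree above, and the location of the `models` directory on disk depends on where the download script places it.

```python
from pathlib import Path

# Paths copied from the layout above, relative to the models directory.
REQUIRED = [
    "brushnetX/config.json",
    "brushnetX/diffusion_pytorch_model.safetensors",
    "grounding_dino/groundingdino_swint_ogc.pth",
    "sam/sam_vit_h_4b8939.pth",
    "base_model/realisticVisionV60B1_v51VAE/model_index.json",
]


def missing_checkpoints(models_dir):
    """Return the required files that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

For example, `missing_checkpoints("models")` returns an empty list when every listed file is present, and the missing relative paths otherwise.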
      

We provide five base diffusion models, including:

  • Dreamshaper_8 is a versatile model that can generate impressive portraits and landscape images.
  • Epicrealism_naturalSinRC1VAE is a realistic style model that excels at generating portraits.
  • HenmixReal_v5c is a model that specializes in generating realistic images of women.
  • Meinamix_meinaV11 is a model that excels at generating images in an animated style.
  • RealisticVisionV60B1_v51VAE is a highly generalized realistic style model.

The BrushNetX checkpoint represents an enhanced version of BrushNet, having been trained on a more diverse dataset to improve its editing capabilities, such as deletion and replacement.

We provide two VLM models: Qwen2-VL-7B-Instruct and llama3-llava-next-8b-hf. We strongly recommend using GPT-4o for reasoning. After selecting gpt4-o as the VLM model, enter your API key and click the Submit and Verify button. If the output is "success", you can use gpt4-o normally. As a second choice, we recommend the Qwen2VL model.

You can also download more pretrained VLM models from QwenVL and LLaVA-Next.
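
As a rough sketch, an additional VLM could be fetched with `snapshot_download` from the `huggingface_hub` package and placed under the `vlm` folder of the checkpoint layout. The helper names `vlm_target_dir` and `download_vlm` and the default `models` directory are assumptions made for illustration; check the model page on the Hub for the exact repository id.

```python
from pathlib import Path


def vlm_target_dir(repo_id, models_dir="models"):
    """Map a Hub repo id to a folder under <models_dir>/vlm/."""
    return Path(models_dir) / "vlm" / repo_id.split("/")[-1]


def download_vlm(repo_id, models_dir="models"):
    """Download a full model repository into the vlm folder and return its path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = vlm_target_dir(repo_id, models_dir)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_vlm("Qwen/Qwen2-VL-7B-Instruct")` would place the files in `models/vlm/Qwen2-VL-7B-Instruct`, assuming that repository id is the one you want. The download is several gigabytes, so the function is not called automatically here.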

🏃🏼 Running Scripts

🤗 BrushEdit demo

You can run the demo using the script:

sh app/run_app.sh 

👻 Demo Features

💡 Fundamental Features:

  • 🎨 Aspect Ratio: Select the aspect ratio of the image. To prevent OOM, 1024px is the maximum resolution.
  • 🎨 VLM Model: Select the VLM model. We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.
  • 🎨 Generate Mask: According to the input instructions, generate a mask for the area that may need to be edited.
  • 🎨 Square/Circle Mask: Based on the existing mask, generate a square or circular mask. (A coarse-grained mask leaves more room for creative edits.)
  • 🎨 Invert Mask: Invert the mask to generate a new mask.
  • 🎨 Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more areas.
  • 🎨 Move Mask: Move the mask to a new position.
  • 🎨 Generate Target Prompt: Generate a target prompt based on the input instructions.
  • 🎨 Target Prompt: Description of the masked area. You can enter or modify it manually when the content generated by the VLM does not meet expectations.
  • 🎨 Blending: Blend BrushNet's output with the original input to preserve the original image details in the unedited areas. (Turning this off works better for removal.)
  • 🎨 Control length: The intensity of editing and inpainting.
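
The mask operations in the list above can be illustrated on a small binary mask. This is a pure-Python sketch, not the demo's implementation: the function names are hypothetical, masks are lists of rows holding 0 or 1, and the "square" mask is interpreted here as the filled bounding box of the existing mask, which is an assumption.

```python
def invert(mask):
    """Swap masked (1) and unmasked (0) pixels."""
    return [[1 - v for v in row] for row in mask]


def dilate(mask, radius=1):
    """Grow the mask: set every pixel within `radius` (square window) of a set pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def erode(mask, radius=1):
    """Shrink the mask by dilating its background (image borders do not erode)."""
    return invert(dilate(invert(mask), radius))


def move(mask, dx, dy):
    """Shift the mask by (dx, dy); pixels pushed outside the frame are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and 0 <= y + dy < h and 0 <= x + dx < w:
                out[y + dy][x + dx] = 1
    return out


def square_mask(mask):
    """Replace the mask with its filled bounding box (a coarse-grained mask)."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return [row[:] for row in mask]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    return [
        [1 if y0 <= y <= y1 and x0 <= x <= x1 else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]
```

On real images these operations are usually done with an array library for speed; plain lists are used here only to keep the example dependency-free.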

💡 Advanced Features:

  • 🎨 Base Model: We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.
  • 🎨 Blending: Blend BrushNet's output with the original input to preserve the original image details in the unedited areas. (Turning this off works better for removal.)
  • 🎨 Control length: The intensity of editing and inpainting.
  • 🎨 Num samples: The number of samples to generate.
  • 🎨 Negative prompt: The negative prompt for the classifier-free guidance.
  • 🎨 Guidance scale: The guidance scale for the classifier-free guidance.
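
Two of the options above follow simple formulas. The sketch below shows a per-pixel blend of the original and generated images and the standard classifier-free guidance combination. It is illustrative only: `blend` and `cfg_combine` are hypothetical names, and the demo's own blending may treat the mask edge differently.

```python
def blend(original, generated, mask):
    """Keep `original` where mask is 0 and use `generated` where mask is 1.

    Fractional mask values mix the two images proportionally.
    """
    return [
        [m * g + (1 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond).

    In Stable Diffusion pipelines the unconditional branch is typically
    computed from the negative prompt (or an empty prompt when none is given).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]
```

A guidance scale of 1 returns the conditional prediction unchanged, while larger values push the result further toward the prompt and away from the negative prompt.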

🤝🏼 Cite Us

@misc{li2024brushedit,
  title={BrushEdit: All-In-One Image Inpainting and Editing}, 
  author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
  year={2024},
  eprint={2412.10316},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}


💖 Acknowledgement

Our code is built on diffusers and BrushNet. Thanks to all the contributors!

❓ Contact

For any questions, feel free to email [email protected].
