Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

zydxt/sd-webui-rpg-diffusionmaster

Note:

This extension is not being actively developed, due to a shift in my personal focus and interests. In addition, the original RPG-DiffusionMaster project has had no recent feature changes.

RPG-DiffusionMaster Extension for Stable Diffusion WebUI

This repository hosts an extension for Stable Diffusion WebUI that integrates the functionalities of RPG-DiffusionMaster. It brings additional changes and enhancements, enabling users of WebUI to interact with RPG-DiffusionMaster more seamlessly.

For more information, check the official repo or the following paper:

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui
Affiliations: Peking University, Stanford University, Pika Labs

Introduction

This extension is in an early phase of development. It uses an LLM (such as GPT-4 or Gemini Pro) for regional planning, then passes the split ratios and regional prompts generated by the LLM to Regional Prompter for image generation, similar to the official repository.
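As a rough illustration of the hand-off described above, the sketch below joins per-region prompts into a single prompt string separated by the `BREAK` keyword that Regional Prompter uses to delimit regions. The `RegionalPlan` class and `build_regional_prompt` function are hypothetical names invented for this example; they are not taken from this extension's code, and the actual extension may assemble the prompt differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegionalPlan:
    """Hypothetical container for an LLM planning result (not the extension's own type)."""
    split_ratio: str              # e.g. "1,1" for two equally sized regions
    regional_prompts: list[str]   # one prompt per region, in layout order


def build_regional_prompt(plan: RegionalPlan, base_prompt: str = "") -> str:
    """Join the regional prompts with BREAK, optionally prefixed by a base prompt."""
    # Drop empty entries and surrounding whitespace so no empty region is emitted.
    parts = [p.strip() for p in plan.regional_prompts if p.strip()]
    if base_prompt.strip():
        parts.insert(0, base_prompt.strip())
    return " BREAK\n".join(parts)


if __name__ == "__main__":
    plan = RegionalPlan(split_ratio="1,1", regional_prompts=["a red cat", "a blue dog"])
    print("Divide ratio:", plan.split_ratio)
    print(build_regional_prompt(plan))
```

In this sketch the split ratio is carried alongside the prompt rather than embedded in it, on the assumption that it is set separately in the Regional Prompter configuration.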

Installation

Before installing this extension, make sure the Regional Prompter extension is already installed. This extension has not yet been added to the WebUI extensions index, so it must be installed manually by entering the repository URL in the WebUI Extensions tab.

Usage

  1. Navigate to the txt2img tab.
  2. Choose RPG DiffusionMaster from the Script dropdown menu.
  3. Select your desired LLM and configure the settings for RPG-DiffusionMaster.
  4. Press the "Apply to Prompt" button and wait briefly as the extension processes the prompt through the LLM and adjusts the Regional Prompter configurations accordingly.
  5. Review the adjusted settings and the final prompt in the Prompt textbox. You can then modify parameters like image size, CFG Scale, Steps, etc., before generating your images.

To-Do List 💪

  • Integrate local LLM support.

Differences from the Official Implementation

  • Adds support for Azure OpenAI GPT-4 models and Gemini Pro.
  • Alters the logic to enhance stability when extracting regional prompts.
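To give a sense of what more tolerant extraction can look like, here is a minimal sketch that pulls a split ratio and a regional prompt out of free-form LLM text. The response layout (a "split ratio:" line and a "Regional Prompt:" block), the function name `extract_plan`, and the regular expressions are all assumptions made for illustration; this is not the logic used by this extension or by the official repository.

```python
import re
from typing import Optional, Tuple


def extract_plan(response: str) -> Optional[Tuple[str, str]]:
    """Return (split_ratio, regional_prompt), or None if either part is missing.

    Assumed (hypothetical) response layout:
        ... split ratio: 1,1
        Regional Prompt: <region 1> BREAK <region 2>
    """
    # Tolerate different casing and stray spaces around the colon and the numbers.
    ratio_match = re.search(
        r"split ratio\s*:\s*([0-9.,;\s]+?)\s*(?:\n|$)", response, re.IGNORECASE
    )
    # Capture everything up to the first blank line or the end of the text.
    prompt_match = re.search(
        r"regional prompts?\s*:\s*(.+?)(?:\n\s*\n|\Z)",
        response,
        re.IGNORECASE | re.DOTALL,
    )
    if not ratio_match or not prompt_match:
        return None  # let the caller retry or fall back instead of raising

    ratio = re.sub(r"\s+", "", ratio_match.group(1))  # "1, 1" becomes "1,1"
    prompt = prompt_match.group(1).strip()
    if not ratio or not prompt:
        return None
    return ratio, prompt
```

Returning `None` rather than raising leaves it to the caller to decide whether to query the LLM again or keep the user's original prompt.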

Acknowledgements

A huge thank you to Ling Yang for the foundational RPG-DiffusionMaster implementation, and to AUTOMATIC1111 and regional-prompter for their exceptional contributions and codebases.
