Yet another unofficial Diffuser support #32

Open · rootonchair opened this issue May 25, 2024 · 4 comments
@rootonchair

GitHub: https://github.com/rootonchair/diffuser_layerdiffuse

This project is a port of LayerDiffuse to Diffusers. It lets you generate transparent images with SD1.5 (transparent-only or joint generation) and SDXL (attention and conv injection) through a Diffusers-friendly API.

Don't hesitate to give it a try:

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch

from diffusers import StableDiffusionPipeline

from models import TransparentVAEDecoder
from loaders import load_lora_to_unet

# download the transparent VAE decoder weights from the LayerDiffusion repo
model_path = hf_hub_download(
    'LayerDiffusion/layerdiffusion-v1',
    'layer_sd15_vae_transparent_decoder.safetensors',
)

# load the base SD1.5 VAE and attach the transparent decoder weights to it
vae_transparent_decoder = TransparentVAEDecoder.from_pretrained(
    "digiplay/Juggernaut_final", subfolder="vae", torch_dtype=torch.float16
).to("cuda")
vae_transparent_decoder.set_transparent_decoder(load_file(model_path))

# build the SD1.5 pipeline around the transparent VAE
pipeline = StableDiffusionPipeline.from_pretrained(
    "digiplay/Juggernaut_final",
    vae=vae_transparent_decoder,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")

# download the transparent attention weights and inject them into the UNet
model_path = hf_hub_download(
    'LayerDiffusion/layerdiffusion-v1',
    'layer_sd15_transparent_attn.safetensors',
)
load_lora_to_unet(pipeline.unet, model_path, frames=1)

# generate a transparent image
image = pipeline(
    prompt="a dog sitting in room, high quality",
    width=512, height=512,
    num_images_per_prompt=1,
    return_dict=False,
)[0]
@WyattAutomation

You are an absolute champ for doing this -- I am soooo hoping this works without a hitch with OneDiff so I can compile it for realtime.

I have ControlNet running in realtime, integrated with Unity 3D via NDI, to AI-generate the entire game world in real time from just prompts, a WASD-controlled third-person OpenPose skeleton, and a stream of the depth image of randomly placed cubes.

I plan to migrate my app to a microservices architecture, with the two separate NDI streams (which will be migrated to WebRTC to make them usable over WAN) coming out of Unity and fed into two completely separate StableDiffusionImg2ImgControlnet pipelines that each run a single ControlNet for an assigned layer; the layers are then alpha-blended back together for the output.

I believe this has the potential to produce absolutely groundbreaking results -- I am posting this right before I begin work on it, but if your port here works there's a chance I am going to report back with a working demo of a viable framework for AI-generating an entire videogame frame-by-frame in realtime, as it is played. I will be setting up a multimodal LLM Agent (likely Pixtral) with a sandbox inside the runtime of the game for function calling to spawn enemies and objects using just the existing pose skeletons, but the last step is getting LayerDiffuse applied so that I can focus specialized pipelines onto separate render layers in the game.

The only thing I don't know is whether it will work -- I have to do

...
self.pipeline.vae.decoder = oneflow_compile(self.pipeline.vae.decoder) 
...

in my existing pipeline code to include the VAE in the pipeline compilation (to achieve the frame rate and responsiveness needed for the controls to feel like an actually playable game). If it works we are golden -- if not, I'll post an issue on OneDiff and report back here to reassess.
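For reference, here is roughly what I mean, assuming OneDiff's oneflow_compile and the pipeline object from the snippet above (this is just a sketch of my compilation step, not something I've confirmed works with the LayerDiffuse processors yet):

from onediff.infer_compiler import oneflow_compile

# compile the UNet and the VAE decoder separately so both go through OneFlow
pipeline.unet = oneflow_compile(pipeline.unet)
pipeline.vae.decoder = oneflow_compile(pipeline.vae.decoder)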

Fingers crossed it just works.

(video attachment: vlc-record-2024-09-15-15h52m16s-2024-09-12.23-14-47.mkv-.mp4)

@zhanpengxin

zhanpengxin commented Sep 15, 2024 via email

@rootonchair
Author

@WyattAutomation I am happy to hear that it works flawlessly with OneFlow

@WyattAutomation

WyattAutomation commented Sep 16, 2024

Actually, I was unfortunately not successful -- to clarify, the video I posted is just OneDiff/OneFlow without layer diffusion; I want to use layer diffusion in the app I am developing here.

My plan was to try to achieve much better quality by using multiple separate pipelines running in separate threads or containers, each with checkpoints, ControlNets, and LoRAs that are specialized for generating only specific, dedicated parts or features of the frames as they render in realtime.

In order to do this, I need transparent backgrounds in the output of each layer so I can composite the images into output frames. I could use YOLO and segmentation masking in a sort of realtime-video-generator version of what ADetailer does, but that's going to take time and isn't as ideal as having the diffusion model take care of that step already (and at higher quality).
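For what it's worth, the compositing step itself is simple once each pipeline emits RGBA frames; here is a rough sketch with Pillow (the layer filenames are hypothetical placeholders for the per-pipeline outputs):

from PIL import Image

# hypothetical RGBA outputs from two specialized pipelines (must be the same size)
background = Image.open("background_layer.png").convert("RGBA")
character = Image.open("character_layer.png").convert("RGBA")

# alpha-blend the character layer over the background to form the final frame
frame = Image.alpha_composite(background, character)
frame.save("composited_frame.png")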

The error that OneDiff gave me when using the OneFlow backend is, I think, related to the classes in attention_processors.py. I don't know if it's because they inherit nn.Module without declaring a forward() method or something else, but I upgraded everything to the latest versions and tried adding a stub forward method to those classes, among several other things, and couldn't resolve it.

The pipeline instantiates just fine, it's when you try to inference the pipeline that this error occurs:

...python3.10/site-packages/onediff/infer_compiler/backends/oneflow/transform/builtin_transform.py:221 - convert <class 'list'> failed: Transform failed of <class 'list'>: Transform failed of <class 'diffusers.models.attention_processor.Attention'>: Transform failed of <class 'layer_diffuse.models.attention_processors.AttentionSharingProcessor2_0'>: Unsupported type: <class 'list'>

There were a bunch of other lines with similar errors, all pointing at “Unsupported type”; some of them referenced “attention_processor” in my installation of diffusers and some referenced “attention_processors” from your Diffusers port.

I tested your repo without OneDiff/OneFlow and it works fine. It only throws the error when it tries to generate an image after using compile_pipe or infer_compiler.

In fact, it even worked when I compiled only the VAE and did not compile the UNet – compiling the UNet is specifically what triggers it to fail.

I read a similar issue about “Unsupported type” on the OneDiff repo that someone else had with OneFlow, but all they said was something along the lines of “I figured it out, a submodule of torch.nn.Module has to have a declaration of forward() in the class,” and then they closed the issue.

I added stubs that declared the forward() method to all your classes in attention_processors.py but it remained unchanged (same error).
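To be clear about what I mean by a stub, it was roughly this kind of no-op forward() added to each processor class (a hypothetical sketch, not the actual repo code; the real processing stays in __call__):

import torch.nn as nn

class AttentionSharingProcessorStub(nn.Module):
    # stub forward() added only so the nn.Module subclass declares one;
    # the actual attention logic still lives in the processor's __call__
    def forward(self, *args, **kwargs):
        raise NotImplementedError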

I did manage to get images generating using the nexfort backend for OneDiff; however, all the images are completely blank/transparent, and I get a CUDA warning saying the “graph is empty”.

I will try again tomorrow, but if you have any ideas on how to get your attention_processors to play nicely with OneDiff/OneFlow, let me know.

Thank you for your work on this; it remains the best option out there for what I am trying to do, if I can just get it working with OneDiff.
