Regarding Alternative Architectures for More Complicated Workflows #14

xiankgx · 2024-03-06T08:28:08Z

We also introduce several alternative architectures in Fig. 4 for more complicated workflows. We can add zero-initialized channels to the UNet and use VAE (with or without latent transparency) to encode foreground, or background, or layer combinations into conditions, and train the model to generate foreground or background (e.g., Fig. 4-(b, d)), or directly generate blended images (e.g., Fig. 4-(a, c)).

The base model is SDXL with LoRA layers. What exactly are these alternative architectures? Are they simply the base model (SDXL with LoRA) with the UNet's input convolution extended to accept more channels?
What format are the model weights in? Are they stored as value differences relative to the base model?
The input to the UNet is now the noised latents plus additional conditional image latents. In what order are the latents concatenated? Are the additional latents noised or un-noised?
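
To make the question concrete, here is a minimal PyTorch sketch of what "adding zero-initialized channels" could look like if it means widening the UNet's first convolution. This is only my guess at the mechanism, not the authors' confirmed implementation. The helper `extend_input_conv` is a hypothetical name, the 4-to-320-channel 3x3 layer is a stand-in for SDXL's input convolution, and both the concat order (noisy latents first) and the use of un-noised condition latents are assumptions, which is exactly what I am asking about.

```python
import torch
import torch.nn as nn


def extend_input_conv(conv: nn.Conv2d, extra_channels: int) -> nn.Conv2d:
    """Hypothetical helper: copy `conv` but accept `extra_channels` more inputs.

    The weights for the new input channels are zero, so at initialization
    the layer ignores the extra input and behaves like the original.
    """
    new_conv = nn.Conv2d(
        conv.in_channels + extra_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        # Copy the pretrained weights into the first (original) input channels.
        new_conv.weight[:, : conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


torch.manual_seed(0)

# Stand-in for the UNet's first convolution (4 latent channels in, 320 out).
conv_in = nn.Conv2d(4, 320, kernel_size=3, padding=1)
conv_in_ext = extend_input_conv(conv_in, extra_channels=4)

noisy_latents = torch.randn(1, 4, 16, 16)
cond_latents = torch.randn(1, 4, 16, 16)  # assumed: clean (un-noised) VAE latents

# Assumed order: noisy latents first, condition latents appended after them.
unet_input = torch.cat([noisy_latents, cond_latents], dim=1)

with torch.no_grad():
    out_base = conv_in(noisy_latents)
    out_ext = conv_in_ext(unet_input)

# With zero-initialized extra channels, both outputs should match at init.
same_at_init = torch.allclose(out_base, out_ext, atol=1e-5)
```

If the released weights are stored as differences from the base model, I would expect the loader to add them to the base weights after a widening step like this, but that is also something I would like confirmed.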
