Regarding Alternative Architectures for More Complicated Workflows #14

Open
xiankgx opened this issue Mar 6, 2024 · 0 comments

From the paper:

> We also introduce several alternative architectures in Fig. 4 for more complicated workflows. We can add zero-initialized channels to the UNet and use VAE (with or without latent transparency) to encode foreground, or background, or layer combinations into conditions, and train the model to generate foreground or background (e.g., Fig. 4-(b, d)), or directly generate blended images (e.g., Fig. 4-(a, c)).

  1. The base model is SDXL with LoRA layers. What exactly are these alternative architectures? Are they simply the base model (SDXL with LoRA) with the UNet's input convolution extended to accept more channels?
  2. What format are the model weights in? Are they stored as value differences relative to the base model?
  3. The input to the UNet is now the noised latents plus additional conditional image latents. In what order are the latents concatenated? Are the additional latents noised or unnoised?
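
To make questions 1 and 3 concrete, here is a minimal PyTorch sketch of what I am guessing "zero-initialized channels" means. Everything in it is my assumption, not something confirmed by the paper or the repo: the helper name `extend_conv_in` is hypothetical, the stand-in layer only mimics the shape of an SDXL-style input convolution (4 latent channels in, 320 out), and both the concat order (noised latents first) and the use of unnoised condition latents are guesses.

```python
import torch
import torch.nn as nn


def extend_conv_in(conv: nn.Conv2d, extra_channels: int) -> nn.Conv2d:
    """Hypothetical helper: widen a conv's input, zero-initializing the new channels."""
    new_conv = nn.Conv2d(
        conv.in_channels + extra_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # Start from all zeros, then copy the pretrained weights into the
        # first `conv.in_channels` input slots. The extra slots stay zero,
        # so the widened layer initially ignores the condition latents.
        new_conv.weight.zero_()
        new_conv.weight[:, : conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Stand-in for the UNet input convolution (assumed shape: 4 -> 320, 3x3).
conv_in = nn.Conv2d(4, 320, kernel_size=3, padding=1)
extended = extend_conv_in(conv_in, extra_channels=4)

noisy_latents = torch.randn(1, 4, 16, 16)
cond_latents = torch.randn(1, 4, 16, 16)  # assumed: clean (unnoised) VAE latents

# Assumed order: noised latents first, condition latents appended after.
unet_input = torch.cat([noisy_latents, cond_latents], dim=1)

out_extended = extended(unet_input)
out_original = conv_in(noisy_latents)
```

If this guess is right, `out_extended` matches `out_original` at initialization, so training would start from the base model's behaviour and learn to use the condition channels gradually. Is that what the alternative architectures do?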