We also introduce several alternative architectures in Fig. 4 for more complicated workflows. We
can add zero-initialized channels to the UNet and use VAE (with or without latent transparency) to
encode foreground, or background, or layer combinations into conditions, and train the model to generate foreground or background (e.g., Fig. 4-(b, d)), or directly generate blended images (e.g.,
Fig. 4-(a, c)).
The base model is SDXL with LoRA layers. What exactly are these alternative architectures? Are they simply the base model (SDXL with LoRA) with the UNet's input convolution extended to accept more channels?
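To make the question concrete, here is a minimal NumPy sketch of what I assume "zero-initialized channels" means. This is my guess, not the authors' code: the tensor sizes are toy values, and `conv2d` and `extend_input_channels` are hypothetical helpers written only for this illustration. The point is that appending zero weights for the new input channels leaves the pretrained output unchanged at the start of training.

```python
import numpy as np


def conv2d(x, weight, bias):
    """Naive same-padded 2D convolution (cross-correlation, as in deep learning).

    x: (C_in, H, W), weight: (C_out, C_in, k, k), bias: (C_out,)
    """
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # windows has shape (C_in, H, W, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, weight) + bias[:, None, None]


def extend_input_channels(weight, extra):
    """Append `extra` zero-initialized input channels to a conv weight."""
    c_out, _, kh, kw = weight.shape
    zeros = np.zeros((c_out, extra, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, zeros], axis=1)


rng = np.random.default_rng(0)

# Stand-in for the pretrained input conv: 4 latent channels in, 8 out (toy size).
base_w = rng.standard_normal((8, 4, 3, 3))
base_b = rng.standard_normal(8)

# Assumption: one extra condition image adds 4 VAE latent channels.
new_w = extend_input_channels(base_w, extra=4)

noisy = rng.standard_normal((4, 16, 16))  # noised latent
cond = rng.standard_normal((4, 16, 16))   # condition latent

out_base = conv2d(noisy, base_w, base_b)
out_ext = conv2d(np.concatenate([noisy, cond], axis=0), new_w, base_b)

# The zero weights make the condition invisible until training updates them.
print(new_w.shape, np.allclose(out_base, out_ext))
```

If this is the intended design, the extended model starts out behaving exactly like the base model, and the new channels only take effect once their weights move away from zero.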
What format are the released model weights in? Are they stored as value differences (offsets) relative to the base model?
The input to the UNet is now the noised latents plus the additional conditional image latents. In what order are the latents concatenated? Are the additional latents noised or unnoised?
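For reference, this is the layout I would guess at, written as a small NumPy sketch. Both choices in it are assumptions that I am asking to have confirmed or corrected: the noised target latent comes first on the channel axis, and the condition latents are appended clean (unnoised). `add_noise` and `build_unet_input` are hypothetical names, and the noising step is the standard DDPM forward process.

```python
import numpy as np


def add_noise(x0, eps, alpha_bar_t):
    """Standard DDPM forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def build_unet_input(target_latent, cond_latents, eps, alpha_bar_t):
    """Concatenate along the channel axis.

    ASSUMPTION 1: the noised target latent occupies the first channels.
    ASSUMPTION 2: the condition latents are left unnoised.
    """
    noisy = add_noise(target_latent, eps, alpha_bar_t)
    return np.concatenate([noisy, *cond_latents], axis=0)


rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 8))      # latent being denoised
foreground = rng.standard_normal((4, 8, 8))  # VAE-encoded condition
background = rng.standard_normal((4, 8, 8))  # VAE-encoded condition
eps = rng.standard_normal((4, 8, 8))

unet_in = build_unet_input(target, [foreground, background], eps, alpha_bar_t=0.5)
print(unet_in.shape)  # channels: 4 noised + 4 + 4 clean conditions
```

With one condition the input would have 8 channels, with two it would have 12, which is why the channel order matters for loading the released weights correctly.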