
Worse results? #7

Open
catboxanon opened this issue Mar 17, 2024 · 3 comments
@catboxanon
Hi, I'm a maintainer of the Stable Diffusion webui. I tried implementing the composite method outlined in the paper and this repo, but it seems to produce worse results in all cases I've tested. See AUTOMATIC1111/stable-diffusion-webui#15037 (comment) for reference, which includes sample images and the code I used.

This also seems to introduce some major performance issues, as mentioned in #6, but perhaps that's unavoidable?

@maszhongming (Owner)

Hi, I truly appreciate your input and the work you've put into integrating our methods. I'm not familiar with the Stable Diffusion webui codebase myself, but from what you've shared, the combine_denoised function appears to be implemented correctly.
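For concreteness, here is a minimal sketch of the Composite combine step as I understand it: each of the k LoRAs is activated alone for one denoiser pass, classifier-free guidance is applied per LoRA, and the guided predictions are averaged. Plain floats stand in for the noise tensors so the sketch stays self-contained; the function name and signature are illustrative, not the webui's actual API.

```python
# Sketch of the LoRA Composite combine step: each LoRA is activated alone
# for one denoiser pass, classifier-free guidance is applied per LoRA, and
# the guided predictions are averaged. Floats stand in for noise tensors;
# in practice these are UNet outputs.

def combine_denoised(cond_preds, uncond_preds, weights, cfg_scale=7.0):
    """Average per-LoRA guided noise predictions.

    cond_preds[i] / uncond_preds[i]: predictions with only LoRA i active.
    weights[i]: per-LoRA weight (typically 1.0 for all).
    """
    assert len(cond_preds) == len(uncond_preds) == len(weights)
    guided = [u + cfg_scale * (c - u)  # classifier-free guidance per LoRA
              for c, u in zip(cond_preds, uncond_preds)]
    return sum(w * g for w, g in zip(weights, guided)) / sum(weights)

# Toy check with k = 2 LoRAs:
print(combine_denoised([1.0, 3.0], [0.5, 2.5], [1.0, 1.0], cfg_scale=2.0))  # 2.5
```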

Regarding the shared samples, do you think LoRA Merge looks better mainly because of the style? From my perspective, LoRA Composite's characters and poses align more closely with the reference image in Concept, but its style isn't pronounced enough; LoRA Merge has a stronger style, but there's an odd vertical bar on the right side of the curtain.

We have observed in both automatic and human evaluations that LoRA Merge performs better when combining two LoRAs, especially when one of them is a "style" LoRA (so the generated image shows distinct "style" features). However, when combining two other types of LoRAs (e.g., character + clothing), or when the number of LoRAs grows (3-5 LoRAs), LoRA Merge is significantly worse than our approach. Would you mind testing these scenarios and sharing your findings? I'm also willing to run the same examples under the diffusers and peft codebases to pinpoint potential issues.

As for inference speed, it's true that LoRA Composite demands more processing time. The increase should be linear: combining k LoRAs takes roughly k times as long as using one. I haven't found a way to optimize this yet, but the slowdown shouldn't be as drastic as you describe (from a few seconds to over a minute).

By the way, have you had a chance to try the LoRA Switch method? Its inference time is on par with LoRA Merge, but our evaluations suggest it surpasses both LoRA Composite and Merge in terms of composition quality (Section 3.2 in our paper, observation 2).
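For reference, a minimal sketch of why Switch keeps inference cost on par with Merge: only one LoRA is active per denoising step, rotating on a fixed interval, so each step costs the same as a single-LoRA run. The exact schedule below is illustrative, not our exact implementation.

```python
# Sketch of a LoRA Switch schedule: one LoRA active per denoising step,
# rotating every `interval` steps, so each step costs the same as a
# single-LoRA (or merged) run. The schedule is illustrative.

def active_lora(step, num_loras, interval=1):
    """Index of the LoRA active at a given denoising step."""
    return (step // interval) % num_loras

print([active_lora(s, num_loras=3, interval=2) for s in range(8)])
# [0, 0, 1, 1, 2, 2, 0, 0]
```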

@catboxanon (Author) commented Mar 18, 2024

Thank you for the in-depth reply!

I will take more time to run tests with LoRAs other than style-focused ones (e.g., character + clothing, as you mentioned). The degraded inference speed may be caused by webui internals; I'll look into that as well.

As for the Switch method, this will actually be simpler to test, since an extension already exists that makes invoking it easy: https://github.com/cheald/sd-webui-loractl. I may even open a PR to extend its syntax so that invoking the Switch method is simpler.

@maszhongming (Owner)

Thank you for your efforts and for exploring further tests!

Additionally, I strongly recommend testing with a combination of 3-5 LoRAs, as highlighted in our paper. As the number of combined LoRAs grows, vanilla LoRA Merge increasingly destabilizes the generation process.

Regarding the degraded inference speed you've observed: while I'm not familiar with the webui's internals, comparing them with how diffusers manages active adapters might shed light on potential optimizations or differences.

Glad to hear the Switch method is simpler to test in your setup. Your work in extending the syntax for easier invocation is greatly appreciated.
