[Feature Request]: Add Facebook's Token Merging feature for faster inference time #4364
Comments
From my results and those of others, it seems to speed up inference by ~20-25% at 512x512, which is quite significant, and it allows generation of much bigger images.
I did an extremely quick and dirty patch of the webui and stable-diffusion repos with code from https://github.com/Birch-san/stable-diffusion to check ToMe. I also hard-coded doggettx's attention into the code because I'm not good enough to figure out how to add this properly, so for anyone using xformers, xformers will be faster than this patch. The patch is at https://gist.github.com/Yardanico/081e7e23ea1d51dd70f1a75a6df8b876 if you want to try it. I'm getting a 25% speed increase on my RX 6700 XT, from 6 it/s to 7.5 it/s, and I can also generate at bigger resolutions while still being faster. There is some accuracy loss, though how much largely depends on the prompt.
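For anyone comparing their own numbers, the 25% figure follows directly from the two iteration rates quoted above. A quick check in Python (the helper name `speedup_percent` is just for illustration, it is not from any of the linked repos):

```python
def speedup_percent(before_its: float, after_its: float) -> float:
    """Relative throughput gain in percent, given iterations/second before and after."""
    return (after_its - before_its) / before_its * 100.0


# Rates reported in the comment above (it/s without and with the ToMe patch).
gain = speedup_percent(6.0, 7.5)
print(f"throughput gain: {gain:.0f}%")

# The same change expressed as time saved per sampling step.
time_saved = (1.0 - 6.0 / 7.5) * 100.0
print(f"time per step reduced by: {time_saved:.0f}%")
```

Note that a 25% gain in it/s is a 20% cut in time per step, so the two ways of quoting the speedup are not interchangeable.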
Very interesting work, hope we can enjoy this feature soon (with xformers if possible). Note that ToMe is drafting Stable Diffusion support (with examples and code "coming soon"); see also facebookresearch/ToMe#4
Update: there is already a ToMe implementation for Stable Diffusion: https://github.com/dbolya/tomesd
import tomesd
# Patch a Stable Diffusion model with ToMe for SD using a 50% merging ratio.
# Using the default options are recommended for the highest quality, tune ratio to suit your needs.
tomesd.apply_patch(model, ratio=0.5)
# However, if you want to tinker around with the settings, we expose several options.
# See docstring and paper for details. Note: you can patch the same model multiple times.
tomesd.apply_patch(model, ratio=0.9, sx=4, sy=4, max_downsample=2) # Extreme merging, expect diminishing returns
Update: There is already a PR working on this, see below.
Update again: I implemented an extension to use ToMe (https://github.com/SLAPaper/a1111-sd-webui-tome), but it seems to give only a ~13% speedup when using batch size 8.
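To get a feel for what `ratio` in the snippet above means in practice, here is a rough back-of-the-envelope sketch. It assumes the usual 8x latent downscale of Stable Diffusion (so 512x512 gives a 64x64 latent) and reads `ratio` as the fraction of tokens merged away; the helper names are hypothetical and not part of tomesd.

```python
def latent_tokens(width: int, height: int, vae_downscale: int = 8) -> int:
    """Number of tokens at the full-resolution latent (assumes an 8x VAE downscale)."""
    return (width // vae_downscale) * (height // vae_downscale)


def tokens_after_merge(tokens: int, ratio: float) -> int:
    """Tokens left when a fraction `ratio` of them is merged away (illustrative rounding)."""
    return tokens - int(tokens * ratio)


n = latent_tokens(512, 512)
kept = tokens_after_merge(n, 0.5)
print(f"tokens before merging: {n}, after ratio=0.5: {kept}")

# Self-attention compares every token with every other token,
# so the score matrix shrinks with the square of the token count.
print(f"attention score entries: {n * n} -> {kept * kept}")
```

Only the patched attention layers see this reduction, which is one reason the end-to-end speedups reported in this thread are much smaller than the drop in attention cost.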
Working on this in #9256
Is there an existing issue for this?
What would your feature do?
Implement https://github.com/facebookresearch/ToMe, which allows for faster image inference
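For readers unfamiliar with the idea, the following is a toy pure-Python sketch of the bipartite matching step that token merging is built on: split the tokens into two sets, link each token in the first set to its most similar token in the second, and average the `r` most similar pairs together. It is a simplified illustration only, not the ToMe or tomesd code (those work on batched tensors, and as far as I understand the Stable Diffusion variant also unmerges tokens after each block); the function names are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_tokens(tokens, r):
    """Merge up to `r` tokens into their most similar partner by averaging."""
    if len(tokens) < 2 or r <= 0:
        return [list(t) for t in tokens]
    src = tokens[0::2]                       # set A: even positions
    dst = [list(t) for t in tokens[1::2]]    # set B: odd positions (merge targets)
    counts = [1] * len(dst)                  # how many tokens each target represents

    # One edge per source token, pointing at its most similar target.
    edges = []
    for i, s in enumerate(src):
        j = max(range(len(dst)), key=lambda k: cosine(s, dst[k]))
        edges.append((cosine(s, dst[j]), i, j))

    # Keep only the r strongest edges and average those pairs together.
    chosen = sorted(edges, reverse=True)[: min(r, len(src))]
    merged_sources = {i for _, i, _ in chosen}
    for _, i, j in chosen:
        counts[j] += 1
        dst[j] = [d + (s - d) / counts[j] for d, s in zip(dst[j], src[i])]

    unmerged = [list(s) for i, s in enumerate(src) if i not in merged_sources]
    return unmerged + dst


tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.5, 1.0]]
reduced = merge_tokens(tokens, r=1)
print(len(tokens), "->", len(reduced), "tokens")
```

With `r=1` only the closest pair (the first two tokens) is averaged, so four tokens become three. The accuracy loss mentioned in this thread comes from exactly this averaging: merged tokens no longer carry their individual detail.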
Proposed workflow
Maybe a CLI option? From what I read it decreases accuracy by a bit, so some people won't want to have it enabled.
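A minimal sketch of what such an opt-in switch could look like, assuming the tomesd package is used for the patching. The flag name `--token-merging-ratio` and the helper `maybe_apply_tome` are hypothetical and only illustrate the idea; they are not existing webui options. Only `tomesd.apply_patch(model, ratio=...)` is taken from the tomesd README.

```python
import argparse


def ratio_type(value: str) -> float:
    """Parse a merging ratio and reject values outside [0, 1)."""
    ratio = float(value)
    if not 0.0 <= ratio < 1.0:
        raise argparse.ArgumentTypeError("ratio must be in the range [0, 1)")
    return ratio


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag: 0 keeps token merging off, so the default output is unchanged.
    parser.add_argument("--token-merging-ratio", type=ratio_type, default=0.0,
                        help="fraction of tokens to merge (0 disables ToMe)")
    return parser


def maybe_apply_tome(model, ratio: float) -> bool:
    """Patch the model only when the user opted in; returns True if patched."""
    if ratio <= 0.0:
        return False
    import tomesd  # imported lazily so the dependency is needed only when enabled
    tomesd.apply_patch(model, ratio=ratio)
    return True


args = build_parser().parse_args(["--token-merging-ratio", "0.5"])
print("requested ratio:", args.token_merging_ratio)
```

Defaulting to 0 means people who care about exact reproducibility get the unpatched model unless they explicitly ask for the speedup.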
Additional information
See code in facebookresearch/ToMe#7 and https://github.com/Birch-san/stable-diffusion/