- Performance of Community Edition
- Installation Guide
- Extensions Usage
- Quantization
- Use OneDiff by API
- Contact
Updated on JAN 13, 2024. Device: RTX 3090. Resolution: 1024x1024
| | torch (Baseline) | TensorRT-v9.0.1 | onediff (Optimized) | Percentage improvement |
|---|---|---|---|---|
| SDXL w/o LoRA | 2.99 it/s | 6.40 it/s | 7.08 it/s | 236.79% |
| SDXL w/ LoRA | 2.95 it/s | N/A | 7.08 it/s | 240.00% |
End-to-end time (seconds) to generate a 1024x1024 image with SDXL (30 steps) on NVIDIA RTX 3090:
| | torch (Baseline) | TensorRT-v9.0.1 | onediff (Optimized) | Percentage improvement |
|---|---|---|---|---|
| SDXL w/o LoRA | 11.03 | 5.55 | 5.16 | 213.76% |
| SDXL w/ LoRA | 11.17 | N/A | 5.20 | 214.81% |
Note: The performance with LoRA doesn't include the time of LoRA fusing, and the slight slowdown of inference speed with LoRA falls within the normal fluctuation range.
It is recommended to create a Python virtual environment in advance, for example: conda create -n sd-webui python=3.10.
Run the commands below to install Stable Diffusion WebUI and the OneDiff extensions.
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
git clone https://github.com/siliconflow/onediff.git
ln -s "$(pwd)/onediff/onediff_sd_webui_extensions" "$(pwd)/stable-diffusion-webui/extensions/onediff_sd_webui_extensions"
cd stable-diffusion-webui
# Install all of stable-diffusion-webui's dependencies.
# If you install as root user, append `-f` to the end of the command line.
venv_dir=- bash webui.sh
cd stable-diffusion-webui
python webui.py --port 7860
Access http://server:7860/ from a web browser.
To activate OneDiff extension acceleration, follow these steps:
Select onediff_diffusion_model from the Script menu, enter a prompt in the text box (e.g., "a black dog"), and then click the "Generate" button.
When switching models, if the new model has the same structure as the old one, OneDiff reuses the previously compiled graph, so the new model doesn't need to be compiled again. This significantly reduces model-switching time.
Note: This feature is not supported for quantized models.
OneDiff supports saving the compiler cache to disk and loading it back. In scenarios where recompilation would otherwise be required after switching models, you can skip compilation by loading the compiler cache from disk, saving model-switching time.
The compiler cache will be saved at /path/to/your/stable-diffusion-webui/extensions/onediff_sd_webui_extensions/compiler_caches/ by default. If you want to specify a different path, you can modify it in the WebUI settings.
After selecting onediff, a text box named Saved cache name will appear at the bottom right. Enter the file name under which you want to save the compiler cache. After the image is generated, the compiler cache will be saved at stable-diffusion-webui/extensions/onediff_sd_webui_extensions/compiler_caches/your-compiler-cache-name.
Note: When the text box is empty or a file with the specified name already exists, the compiler cache will not be saved.
After selecting onediff, a dropdown menu named Compile cache will appear at the bottom left, where you can select the compiler cache to load. The dropdown lists all files located in stable-diffusion-webui/extensions/onediff_sd_webui_extensions/compiler_caches/. Click the button on its right side to refresh the Compile cache list.
Note: To properly use this feature, please ensure that you have added the --disable-safe-unpickle parameter when launching sd-webui.
OneDiff supports the complete functionality related to LoRA. You can use OneDiff-based LoRA just like the native LoRA in sd-webui.
FAQ:
- After switching LoRA, should I recompile the model?
  No. OneDiff supports dynamically switching LoRA without recompiling the model, because the model with LoRA and the one without it share the same parameter pointers, which have already been captured by the static graph.
- What's the time cost of LoRA fusing?
  The first few LoRA fusings may take a bit of time (1~2 s), but once stabilized, the cost is about 700 ms.
- Will LoRA fusing affect the inference efficiency of the model?
  No, the model's inference efficiency remains the same after fusing LoRA as it was before.
Note: The quantization feature is only supported by OneDiff Enterprise.
OneDiff Enterprise offers a quantization method that reduces memory usage and increases speed while maintaining quality without any loss.
You can use the quantization feature by simply checking the Model Quantization (int8) Speed Up option, which enables better inference performance, as shown in the image below.
If you possess a OneDiff Enterprise license key, you can access instructions on OneDiff quantization and related models by visiting the siliconflow organization on Hugging Face. Alternatively, you can contact us to inquire about purchasing a OneDiff Enterprise license.
Note: Quantized models don't currently support LoRA loading; the related features are under active development.
Run the commands below to use Stable Diffusion WebUI with the OneDiff extensions via the API (assuming you are in the stable-diffusion-webui/ directory).
# start the sd-webui server with the API enabled
python3 webui.py --api
# send request
python3 extensions/onediff_sd_webui_extensions/api_examples/txt2img.py
python3 extensions/onediff_sd_webui_extensions/api_examples/img2img.py
Then you can get the images returned by sd-webui at ./api_out/txt2img/ and ./api_out/img2img/.
To use the OneDiff-based sd-webui API, you only need to add a "script": "onediff_diffusion_model" field to the request to speed up model inference.
For OneDiff Enterprise, add "script_args": [{"0": True}] to the request to use the quantization feature.
Check the files txt2img.py and img2img.py for more details.
For users of OneDiff Community, please visit GitHub Issues for bug reports and feature requests.
For users of OneDiff Enterprise, you can contact [email protected] for commercial support.
Feel free to join our Discord community for discussions and to receive the latest updates.