How to convert Stable Diffusion models to Core ML
Mochi Diffusion works with MLMODELC files, which are native to Apple's Core ML. To obtain an MLMODELC file, you need to first convert the original Stable Diffusion model (CKPT or SafeTensors) to Diffusers, and then convert the Diffusers to MLMODELC.
-
Install Homebrew and remember to follow the instructions under "Next steps"
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
-
Install Wget
brew install wget
-
Download and install Xcode
-
Select Xcode as the active Command Line Tools provider.
There are two ways to achieve this.
-
In Terminal, run the following command:
sudo xcode-select -s /Applications/Xcode.app
-
Or open Xcode, go to the Xcode menu / Settings... / Locations and select your Xcode version in the "Command Line Tools" picker.
-
-
Download and install Miniconda
-
Once done, run the commands below in the order shown
git clone https://github.com/apple/ml-stable-diffusion.git
conda create -n coreml_stable_diffusion python=3.8 -y
conda activate coreml_stable_diffusion
cd ml-stable-diffusion
pip install -e .
pip install omegaconf
pip install safetensors
-
Download this Python script and place it in the same folder as the model
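Before moving on, it can help to confirm that the tools installed above are reachable from the shell. The following is a minimal sketch rather than part of the official instructions: the helper name `missing_tools` is hypothetical, and the list of executable names is an assumption based on the steps above.

```python
import shutil

# Executables the steps above rely on (assumed command names).
REQUIRED_TOOLS = ["brew", "wget", "git", "conda", "xcrun"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If anything is reported as missing, revisit the matching installation step and open a new Terminal window before trying again.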
Converting a CKPT or SafeTensors model to Diffusers takes ~1 min to complete.
-
Activate the Conda environment
conda activate coreml_stable_diffusion
-
Navigate to the folder where the script is located via
cd /<YOUR-PATH>
(you can also type cd and then drag the folder into the Terminal app)
-
Now you have two options:
-
If your model is in CKPT format, run
python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path <MODEL-NAME>.ckpt --device cpu --extract_ema --dump_path <MODEL-NAME>_diffusers
-
If your model is in SafeTensors format, run
python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path <MODEL-NAME>.safetensors --from_safetensors --device cpu --extract_ema --dump_path <MODEL-NAME>_diffusers
-
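The two commands above differ only in the `--from_safetensors` flag, so the right one can be derived from the file extension. The sketch below is one way to assemble it; the function name `build_diffusers_command` and its `sdxl` and `half` switches are hypothetical, and the optional flags they add are the ones described in the notes that follow.

```python
import shlex
from pathlib import Path

SCRIPT = "convert_original_stable_diffusion_to_diffusers.py"


def build_diffusers_command(model_file, sdxl=False, half=False):
    """Build the conversion command for a .ckpt or .safetensors model file."""
    model = Path(model_file)
    suffix = model.suffix.lower()
    if suffix not in (".ckpt", ".safetensors"):
        raise ValueError(f"Unsupported model format: {model.suffix!r}")

    cmd = ["python", SCRIPT, "--checkpoint_path", str(model)]
    if suffix == ".safetensors":
        cmd.append("--from_safetensors")
    # The output folder is named after the model, as in the commands above.
    cmd += ["--device", "cpu", "--extract_ema",
            "--dump_path", f"{model.stem}_diffusers"]
    if sdxl:
        cmd += ["--pipeline_class_name", "StableDiffusionXLPipeline"]
    if half:
        cmd.append("--half")
    return cmd


# Print the command for a placeholder file name so it can be pasted into Terminal.
print(shlex.join(build_diffusers_command("my-model.safetensors")))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` even when the model path contains spaces.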
- Only when converting SDXL 1.0 models, be sure to include the following flag:
--pipeline_class_name StableDiffusionXLPipeline
- Starting with diffusers 0.29.0, there is a default max_shard_size of 10GB. If your model is large (SDXL, Pony, etc.), the UNet files will exceed this limit and be split into shards. The next conversion step cannot handle sharded files and will error out. To get around this, you can either:
  - use diffusers 0.28.2 (pip install diffusers==0.28.2) if you don't need the functions/features of the newer versions,
  - add the --half flag to the commands above if you can accept the loss in precision,
  - or edit line 188 of the conversion script to increase the max_shard_size:
    - ORIGINAL:
    pipe.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors)
    - UPDATED:
    pipe.save_pretrained(args.dump_path, safe_serialization=args.to_safetensors, max_shard_size="15GB")
    - (15GB is just an example; use whatever size is appropriate for your task and hardware capabilities)
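Since the next step fails on sharded weights, it may be worth checking the Diffusers output folder before continuing. The sketch below rests on two assumptions about the diffusers output layout that are not stated above: that the UNet weights are written to a `unet` subfolder, and that a sharded save leaves an `*.index.json` file next to the shards. The helper name `unet_is_sharded` is hypothetical.

```python
from pathlib import Path


def unet_is_sharded(dump_path):
    """Return True if the unet folder appears to contain sharded weights."""
    unet_dir = Path(dump_path) / "unet"  # assumed location of the UNet weights
    if not unet_dir.is_dir():
        return False
    # Assumption: a sharded save writes an index file such as
    # diffusion_pytorch_model.safetensors.index.json alongside the shards.
    return any(unet_dir.glob("*.index.json"))


# Example (replace the placeholder with your own output folder):
# print(unet_is_sharded("<MODEL-NAME>_diffusers"))
```

If this returns True, apply one of the three workarounds listed above and rerun the conversion to Diffusers.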
Converting Diffusers to MLMODELC takes ~25 minutes to complete.
Each conversion command runs the script twice to produce two variants of the UNet: one with ControlNet support and one without. This enables the converted models to work with and without the ControlNet feature.
If you're doing this right after the previous step, skip steps 1 and 2.
-
Activate the Conda environment
conda activate coreml_stable_diffusion
-
Navigate to the folder where the script is located via
cd /<YOUR-PATH>
(you can also type cd and then drag the folder into the Terminal app)
-
Now you have two options:
-
SPLIT_EINSUM, which is compatible with all compute units
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-decoder --convert-vae-encoder --convert-unet --unet-support-controlnet --convert-text-encoder --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation SPLIT_EINSUM -o <MODEL-NAME>_split-einsum && python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation SPLIT_EINSUM -o <MODEL-NAME>_split-einsum
-
ORIGINAL, which is only compatible with CPU & GPU
python -m python_coreml_stable_diffusion.torch2coreml --compute-unit CPU_AND_GPU --convert-vae-decoder --convert-vae-encoder --convert-unet --unet-support-controlnet --convert-text-encoder --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation ORIGINAL -o <MODEL-NAME>_original && python -m python_coreml_stable_diffusion.torch2coreml --compute-unit CPU_AND_GPU --convert-unet --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation ORIGINAL -o <MODEL-NAME>_original
-
Only when using the ORIGINAL implementation, it's possible to modify the output image size by adding the --latent-w <SIZE> and --latent-h <SIZE> flags. For example:
python -m python_coreml_stable_diffusion.torch2coreml --latent-w 64 --latent-h 96 --compute-unit CPU_AND_GPU --convert-vae-decoder --convert-vae-encoder --convert-unet --unet-support-controlnet --convert-text-encoder --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation ORIGINAL -o <MODEL-NAME>_original_512x768 && python -m python_coreml_stable_diffusion.torch2coreml --latent-w 64 --latent-h 96 --compute-unit CPU_AND_GPU --convert-unet --model-version <MODEL-NAME>_diffusers --bundle-resources-for-swift-cli --attention-implementation ORIGINAL -o <MODEL-NAME>_original_512x768
The chosen image size must be divisible by 64. Also, you have to specify it divided by 8 (e.g. 768/8=96).
In the example above, the model will always output images at a resolution of 512x768.
-
-
-
The needed files will be created under the <MODEL-NAME>/Resources folder. Everything else can be discarded.
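The long two-pass commands above follow a regular pattern, and the latent sizes are simply the pixel sizes divided by 8. The sketch below shows one way to generate them; the names `latent_size` and `build_coreml_commands` are hypothetical, and the flags are only those that appear in the commands above.

```python
import shlex


def latent_size(pixels):
    """Convert a pixel size to the value expected by --latent-w / --latent-h."""
    if pixels <= 0 or pixels % 64 != 0:
        raise ValueError(f"Image size must be a positive multiple of 64, got {pixels}")
    return pixels // 8


def build_coreml_commands(model_name, attention="SPLIT_EINSUM", width=None, height=None):
    """Return the two torch2coreml passes (with and without ControlNet support)."""
    if attention not in ("SPLIT_EINSUM", "ORIGINAL"):
        raise ValueError(f"Unknown attention implementation: {attention}")
    custom_size = width is not None or height is not None
    if custom_size and (attention != "ORIGINAL" or width is None or height is None):
        raise ValueError("A custom size needs ORIGINAL and both width and height")

    base = ["python", "-m", "python_coreml_stable_diffusion.torch2coreml"]
    out_dir = f"{model_name}_{attention.lower().replace('_', '-')}"
    if custom_size:
        base += ["--latent-w", str(latent_size(width)),
                 "--latent-h", str(latent_size(height))]
        out_dir += f"_{width}x{height}"
    if attention == "ORIGINAL":
        base += ["--compute-unit", "CPU_AND_GPU"]

    common = ["--model-version", f"{model_name}_diffusers",
              "--bundle-resources-for-swift-cli",
              "--attention-implementation", attention, "-o", out_dir]
    # First pass converts every component, with a ControlNet-capable UNet.
    first = base + ["--convert-vae-decoder", "--convert-vae-encoder", "--convert-unet",
                    "--unet-support-controlnet", "--convert-text-encoder"] + common
    # Second pass adds the plain UNet to the same output folder.
    second = base + ["--convert-unet"] + common
    return first, second


first, second = build_coreml_commands("my-model", "ORIGINAL", width=512, height=768)
print(shlex.join(first) + " && " + shlex.join(second))
```

Rejecting sizes that are not multiples of 64 up front avoids discovering the problem only after a long conversion run.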
- Only when converting SDXL 1.0 models, be sure to include the following flag:
--xl-version
- As of today, ORIGINAL implementations with output sizes greater than 512x768 or 768x512 work slowly on lower-performance machines or do not work at all. 768x768 models have been tested at ~1min/step on an M1 (with some kernel panics) and ~1s/step on an M1 Max with 32 GPU cores, while 1024x1024 models just can't be run (MPSNDArray error: product of dimension sizes > 2**31).
  - The issue with 1024x1024 models is reported as resolved by an API update released with macOS 15.0: (MPSNDArray error: product of dimension sizes > 2**31)
-
This package is incompatible with this version of macOS: after the "Software Licence Agreement" step, click on "Change Install Location..." and select "Install for me only"
-
xcrun: error: unable to find utility "coremlcompiler", not a developer tool or in PATH: open Xcode and go to "Settings..." → "Locations", then click on the "Command Line Tools" drop-down menu and reselect the Command Line Tools version
-
ModuleNotFoundError: No module named 'pytorch_lightning': while the conda coreml_stable_diffusion environment is active, run pip install pytorch_lightning
Every time you see a similar message, you can solve it by installing what is requested via pip install <NAME>
-
zsh: killed python: your Mac has run out of memory. Close some memory-hungry applications you may have open and run the process again. Still not working? Reboot. Still not working? Use nice -n 10 before the command. Still not working? Well, SPLIT_EINSUM conversions tend to be the more demanding ones, so while converting, close all the other apps and leave your Mac melting alone
-
If you get any of these warnings:
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
WARNING:__main__:Casted the `beta`(value=0.0) argument of `baddbmm` op from int32 to float32 dtype for conversion!
WARNING:coremltools:Tuple detected at graph output. This will be flattened in the converted model.
WARNING:coremltools:Saving value type of int64 into a builtin type of int32, might lose precision!
You're fine; they can be ignored.
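For the ModuleNotFoundError case above, missing modules can also be found ahead of time instead of one failed run at a time. The sketch below is an illustration only: the helper name `missing_modules` is hypothetical, the module list is an example taken from this page, and the pip package name is assumed to match the module name, which is not always the case.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules mentioned on this page; extend the list as needed.
    for name in missing_modules(["pytorch_lightning", "omegaconf", "safetensors"]):
        # Assumption: the pip package is named like the module.
        print(f"pip install {name}")
```

Run it while the conda coreml_stable_diffusion environment is active, so that it inspects the same Python installation the conversion scripts use.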
- SD to Core ML by Zabriskije
- CoreML model conversion script(s) by MDMAchine
- Core ML VAEs (also on Hugging Face) by Zabriskije