error load model SDXL #552

Open
kashimAstro opened this issue Jan 6, 2025 · 2 comments

@kashimAstro

kashimAstro commented Jan 6, 2025

Hi guys, thanks for the great work.

I recently pulled the latest master from git to try out the new features, such as inpainting.

I noticed that when I build with the CUDA or Vulkan backend and try to load SDXL models, I get a segmentation fault.

I think it all happens somewhere in model.cpp. When I build with the CPU backend, this error does not appear.

Verbose output with the CUDA backend:

./build.cuda/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu -v

Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:163 - Using CUDA backend
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:280 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4900.07 MiB on device 0: cudaMalloc failed: out of memory
[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed, num_tensors = 1680
[INFO ] stable-diffusion.cpp:354 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1107 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:417 - loading weights
[DEBUG] model.cpp:1698 - loading tensors from /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
|=============> | 713/2641 - 11.36it/s
Segmentation fault (core dumped)

Verbose output with the Vulkan backend:

./build.vulkan/bin/sd -m /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors --vae /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors -p 'a lovely cat' --vae-on-cpu -v
Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:172 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce GTX 1070 Ti (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
ggml_vulkan: Compiling shaders..............................Done!
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '/media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:280 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1107 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
ggml_vulkan: Device memory allocation of size 847096320 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed, num_tensors = 1680
[INFO ] stable-diffusion.cpp:354 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1107 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:417 - loading weights
[DEBUG] model.cpp:1698 - loading tensors from /media/dati003/MODEL_DIFFUSION/MODEL_DIFFUSION_SD3/sd_xl_base_1.0.safetensors
|=============> | 713/2641 - 71.43it/s
Segmentation fault (core dumped)

I ran sd.cpp under gdb to try to trace the error, but I'm not sure the backtrace points to the right place.

gdb, CUDA build:

Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x0000555555940af2 in ggml_fp16_to_fp32_row ()
(gdb) where
#0 0x0000555555940af2 in ggml_fp16_to_fp32_row ()
#1 0x00005555555eb689 in ModelLoader::load_tensors(std::function<bool (TensorStorage const&, ggml_tensor**)>, ggml_backend*) ()
#2 0x00005555555ec8b9 in ModelLoader::load_tensors(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, ggml_tensor*> > >&, ggml_backend*, std::set<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >) ()
#3 0x00005555556bf935 in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, ggml_type, schedule_t, bool, bool, bool, bool) ()
#4 0x0000555555627ccc in new_sd_ctx ()
#5 0x00005555555736bc in main ()

gdb, Vulkan build:

Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x00005555557a81d5 in ggml_backend_tensor_set ()
(gdb) where
#0 0x00005555557a81d5 in ggml_backend_tensor_set ()
#1 0x00005555555f4593 in ModelLoader::load_tensors(std::function<bool (TensorStorage const&, ggml_tensor**)>, ggml_backend*) ()
#2 0x00005555555f4d59 in ModelLoader::load_tensors(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, ggml_tensor*> > >&, ggml_backend*, std::set<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >) ()
#3 0x00005555556c7d35 in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, ggml_type, schedule_t, bool, bool, bool, bool) ()
#4 0x000055555563015c in new_sd_ctx ()
#5 0x00005555555991cc in main ()

In the CUDA case, if I follow the gdb backtrace, the crash seems to be in convert_tensor in model.cpp:

https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L735

while for Vulkan I end up here:
https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1822
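In both backtraces the crash is inside `ModelLoader::load_tensors` (in `ggml_fp16_to_fp32_row` for CUDA, `ggml_backend_tensor_set` for Vulkan), and both logs print `[ERROR] ggml_extend.hpp:1101 - unet alloc params backend buffer failed` just before `loading weights`. One plausible reading, not verified against the sd.cpp source, is that loading carries on into UNet tensors that never received device memory. The sketch below is a hypothetical, self-contained model of that pattern, not sd.cpp or ggml code: `FakeTensor`, `FakeParamsBuffer`, `alloc_params`, `load_weights` and `init_component` are invented names, and a null `data` pointer stands in for a tensor with no backend buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a model tensor (not ggml_tensor).
struct FakeTensor {
    std::string name;
    std::size_t nbytes = 0;
    unsigned char* data = nullptr;  // stays null if allocation failed
};

// Hypothetical stand-in for a backend params buffer with fixed capacity.
struct FakeParamsBuffer {
    std::size_t capacity = 0;
    std::unique_ptr<unsigned char[]> storage;
};

// Place all tensors in one buffer, or leave every data pointer null and
// return false when they do not fit (the analogue of cudaMalloc failing).
bool alloc_params(std::vector<FakeTensor>& tensors, FakeParamsBuffer& buf) {
    std::size_t total = 0;
    for (const auto& t : tensors) total += t.nbytes;
    if (total > buf.capacity) return false;
    buf.storage = std::make_unique<unsigned char[]>(total);
    std::size_t offset = 0;
    for (auto& t : tensors) {
        t.data = buf.storage.get() + offset;
        offset += t.nbytes;
    }
    return true;
}

// Copy weights from a contiguous source blob. Without the null check,
// memcpy into a tensor that never got memory would crash.
bool load_weights(std::vector<FakeTensor>& tensors, const unsigned char* src) {
    assert(src != nullptr);
    std::size_t offset = 0;
    for (auto& t : tensors) {
        if (t.data == nullptr) return false;  // report instead of crashing
        std::memcpy(t.data, src + offset, t.nbytes);
        offset += t.nbytes;
    }
    return true;
}

// Stop before loading when allocation fails.
bool init_component(std::vector<FakeTensor>& tensors, FakeParamsBuffer& buf,
                    const unsigned char* src) {
    if (!alloc_params(tensors, buf)) return false;
    return load_weights(tensors, src);
}
```

In this model, a component that does not fit its buffer makes `init_component` return `false` instead of dereferencing a null pointer. Whether sd.cpp should abort at that point or fall back to another backend is a separate design choice.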

I'm not sure how to debug this further. The verbose output makes me think the problem starts with a failed memory allocation while loading:

CUDA:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4900.07 MiB on device 0: cudaMalloc failed: out of memory

Vulkan:
ggml_vulkan: Device memory allocation of size 847096320 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
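Adding up the params buffers from the CUDA log supports the out-of-memory reading. The two clip buffers already take 469.44 + 2649.92 = 3119.36 MB of VRAM, and the UNet then requests another 4900.07 MiB, about 8019 MB in total, on a card with 8 GB (8192 MiB) that also has to hold the CUDA context and whatever else is using the GPU. The same 4900.07 figure appears as "MB" in sd.cpp's log and "MiB" in the CUDA message, which suggests both are really MiB. The helper names below (`sum_vram_params_mb`, `failed_request_mib`) are hypothetical, and the parsing assumes the exact log wording shown in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sum the "params backend buffer size = X MB(VRAM)" entries of an sd.cpp -v log.
// Entries tagged MB(RAM) are skipped because they live in system memory.
double sum_vram_params_mb(const std::vector<std::string>& lines) {
    const std::string key = "params backend buffer size = ";
    double total = 0.0;
    for (const std::string& line : lines) {
        const std::size_t pos = line.find(key);
        if (pos == std::string::npos) continue;                          // not a buffer line
        if (line.find("MB(VRAM)", pos) == std::string::npos) continue;  // RAM buffer
        total += std::stod(line.substr(pos + key.size()));              // stod stops at " MB"
    }
    return total;
}

// Extract X from the CUDA failure line "... allocating X MiB on device 0: ...".
double failed_request_mib(const std::string& line) {
    const std::string key = "allocating ";
    const std::size_t pos = line.find(key);
    assert(pos != std::string::npos);
    return std::stod(line.substr(pos + key.size()));
}
```

Fed the CUDA log, the resident clip buffers plus the failed UNet request land just under 8192 MiB, which leaves very little headroom for anything else on the card.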

Any ideas on how to investigate this?

Thanks, Dario

@stduhpf
Contributor

stduhpf commented Jan 6, 2025

Can you try running it with q8_0 quantization to see how it goes? (Add --type q8_0 to your command.)
The SDXL model in fp16 is almost 7 GB, and depending on what's running in the background on your computer, it might not fit in the 8 GB of VRAM on your 1070 Ti. In theory, it should "swap" to system memory when it runs out of VRAM, but maybe that's not an option on your machine (depending on OS/drivers).
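As a rough check on the q8_0 suggestion: ggml's q8_0 format stores weights in blocks of 32, each block holding 32 int8 values plus one fp16 scale, so 34 bytes per 32 weights (8.5 bits per weight) versus 64 bytes for f16. Applied to the 4900.07 MB f16 UNet buffer from the log, that comes to about 2603 MB. The sketch assumes every tensor gets quantized; in practice some small tensors may be kept at higher precision, so treat the result as an estimate. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// ggml q8_0 block: 32 int8 weights plus one fp16 scale.
constexpr std::int64_t kQ8_0BlockWeights = 32;
constexpr std::int64_t kQ8_0BlockBytes = 32 * 1 + 2;  // 34 bytes per block

// Bytes for n_weights stored as f16 (2 bytes each).
std::int64_t f16_bytes(std::int64_t n_weights) { return n_weights * 2; }

// Bytes for n_weights stored as q8_0; the count must be a whole number of blocks.
std::int64_t q8_0_bytes(std::int64_t n_weights) {
    assert(n_weights % kQ8_0BlockWeights == 0);
    return n_weights / kQ8_0BlockWeights * kQ8_0BlockBytes;
}

// Estimate a buffer's size after converting f16 weights to q8_0,
// assuming every tensor in the buffer is quantized (ratio 34/64).
double estimate_q8_0_mb_from_f16_mb(double f16_mb) {
    return f16_mb * static_cast<double>(kQ8_0BlockBytes) / (kQ8_0BlockWeights * 2);
}
```

For example, `estimate_q8_0_mb_from_f16_mb(4900.07)` is roughly 2603.16, so the UNet alone would shrink by about 2.3 GB, which is consistent with the quantized model fitting where f16 did not.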

@kashimAstro
Author

Hi, and thanks. With q8_0 quantization it does work, but I forgot to mention that with a previous version of sd.cpp I was able to load SDXL models without quantization, using only --vae-on-cpu.

An example with the old version:

/build.cuda/bin/sd -m /media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors --vae /media/dati003/MODEL_DIFFUSION/sdxl_vae.safetensors -p 'a lovely cat' -v
Option:
n_threads: 6
mode: txt2img
model_path: /media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path: /media/dati003/MODEL_DIFFUSION/vae.fix.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
diffusion flash attention:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:163 - Using CUDA backend
[INFO ] stable-diffusion.cpp:195 - loading model from '/media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors using safetensors format
[DEBUG] model.cpp:958 - init from '/media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors'
[INFO ] stable-diffusion.cpp:230 - loading vae from '/media/dati003/MODEL_DIFFUSION/vae.fix.safetensors'
[INFO ] model.cpp:888 - load /media/dati003/MODEL_DIFFUSION/vae.fix.safetensors using safetensors format
[DEBUG] model.cpp:958 - init from '/media/dati003/MODEL_DIFFUSION/vae.fix.safetensors'
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:273 - Weight type: f16
[INFO ] stable-diffusion.cpp:274 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:275 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:276 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:278 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1070 - clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1070 - clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1070 - unet params backend buffer size = 4900.07 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1070 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:414 - loading weights
[DEBUG] model.cpp:1654 - loading tensors from /media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors
[DEBUG] model.cpp:1654 - loading tensors from /media/dati003/MODEL_DIFFUSION/vae.fix.safetensors
[INFO ] stable-diffusion.cpp:498 - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:517 - loading model from '/media/dati003/MODEL_DIFFUSION/sdxlUnstableDiffusers_v11.safetensors' completed, taking 2.63s
[INFO ] stable-diffusion.cpp:544 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:588 - finished loaded file
[DEBUG] stable-diffusion.cpp:1472 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1201 - prompt after extract and remove lora: "a lovely cat"
[INFO ] stable-diffusion.cpp:671 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1206 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:330 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1022 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1022 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1022 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:458 - computing condition graph completed, taking 227 ms
[DEBUG] conditioner.hpp:330 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1022 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1022 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1022 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:458 - computing condition graph completed, taking 208 ms
[INFO ] stable-diffusion.cpp:1339 - get_learned_condition completed, taking 436 ms
[INFO ] stable-diffusion.cpp:1362 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1366 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1022 - unet compute buffer size: 132.05 MB(VRAM)
|==================================================| 20/20 - 1.20s/it
[INFO ] stable-diffusion.cpp:1402 - sampling completed, taking 24.19s
[INFO ] stable-diffusion.cpp:1410 - generating 1 latent images completed, taking 24.20s
[INFO ] stable-diffusion.cpp:1413 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1022 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1043 - computing vae [mode: DECODE] graph completed, taking 1.47s
[INFO ] stable-diffusion.cpp:1424 - latent 1 decoded, taking 1.47s
[INFO ] stable-diffusion.cpp:1428 - decode_first_stage completed, taking 1.47s
[INFO ] stable-diffusion.cpp:1547 - txt2img completed in 26.11s
save result image to 'output.png'
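Comparing this log with the failing one: the UNet buffer is the same 4900.07 MB in both builds (this is a different SDXL checkpoint, but the identical size suggests the same tensor shapes), while the two clip buffers roughly doubled, from 235.06 to 469.44 MB and from 1329.29 to 2649.92 MB. A 2x growth is what you would expect if the text encoders went from 2 to 4 bytes per weight (for example f16 to f32), even though both logs print `Conditioner weight type: f16`; the numbers alone do not prove that. The sketch only does the arithmetic on values copied from the two logs; `Component`, `growth_ratio` and `extra_mb` are illustrative names, and the components are labelled by tensor count.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One params buffer as reported by the old and the new build's -v logs.
struct Component {
    std::string name;
    double old_mb;  // size in the old build's log
    double new_mb;  // size in the new build's log
};

// new/old size ratio; about 2.0 means twice the bytes per weight.
double growth_ratio(const Component& c) {
    assert(c.old_mb > 0.0);
    return c.new_mb / c.old_mb;
}

// Extra VRAM the new build asks for compared with the old one.
double extra_mb(const std::vector<Component>& parts) {
    double extra = 0.0;
    for (const auto& c : parts) extra += c.new_mb - c.old_mb;
    return extra;
}
```

With the clip buffers at their old sizes, the new build would need 235.06 + 1329.29 + 4900.07 = 6464.42 MB of VRAM with the VAE on the CPU, about 1555 MB less than the roughly 8019 MB it requests now.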
