First impressions info dump #1
Thanks for the feedback.
Very nice.
I see.
Oh, I overlooked that one run with verbose.
The tokenizer really looks like it needs some work; I'm really surprised the image came out that good.
Good to hear.
Can't wait 😄
I am using
I'm using
Ah yes, a fellow Ubuntu 20.04 user stuck on LTS 🤣
Cool stuff! Here is a sample run on M2 Ultra:
$ ./sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -t 12
[INFO] stable-diffusion.cpp:2191 - loading model from '../models/sd-v1-4-ggml-model-f16.bin'
[INFO] stable-diffusion.cpp:2216 - ftype: f16
[INFO] stable-diffusion.cpp:2261 - params ctx size = 1970.08 MB
[INFO] stable-diffusion.cpp:2401 - loading model from '../models/sd-v1-4-ggml-model-f16.bin' completed, taking 0.72s
[INFO] stable-diffusion.cpp:2482 - condition graph use 13.11MB of memory: static 10.17MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2824 - get_learned_condition completed, taking 0.12s
[INFO] stable-diffusion.cpp:2832 - start sampling
[INFO] stable-diffusion.cpp:2676 - step 1 sampling completed, taking 5.42s
[INFO] stable-diffusion.cpp:2676 - step 2 sampling completed, taking 5.35s
[INFO] stable-diffusion.cpp:2676 - step 3 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 4 sampling completed, taking 5.35s
[INFO] stable-diffusion.cpp:2676 - step 5 sampling completed, taking 5.30s
[INFO] stable-diffusion.cpp:2676 - step 6 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 7 sampling completed, taking 5.36s
[INFO] stable-diffusion.cpp:2676 - step 8 sampling completed, taking 5.47s
[INFO] stable-diffusion.cpp:2676 - step 9 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 10 sampling completed, taking 5.37s
[INFO] stable-diffusion.cpp:2676 - step 11 sampling completed, taking 5.33s
[INFO] stable-diffusion.cpp:2676 - step 12 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 13 sampling completed, taking 5.33s
[INFO] stable-diffusion.cpp:2676 - step 14 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 15 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 16 sampling completed, taking 5.33s
[INFO] stable-diffusion.cpp:2676 - step 17 sampling completed, taking 5.39s
[INFO] stable-diffusion.cpp:2676 - step 18 sampling completed, taking 5.36s
[INFO] stable-diffusion.cpp:2676 - step 19 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 20 sampling completed, taking 5.38s
[INFO] stable-diffusion.cpp:2691 - diffusion graph use 623.74MB of memory: static 69.53MB, dynamic = 554.21MB
[INFO] stable-diffusion.cpp:2837 - sampling completed, taking 107.12s
[INFO] stable-diffusion.cpp:2771 - vae graph use 2177.12MB of memory: static 1153.12MB, dynamic = 1024.00MB
[INFO] stable-diffusion.cpp:2844 - decode_first_stage completed, taking 17.86s
[INFO] stable-diffusion.cpp:2850 - txt2img completed in 125.10s, with a runtime memory usage of 2177.12MB and parameter memory usage of 1969.94MB
save result image to 'output.png'
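As a quick sanity check on the numbers in that log: the 1970.08 MB reported for the f16 parameter context implies roughly a billion weights at 2 bytes each, which is the right ballpark for SD 1.x (UNet + VAE + CLIP text encoder combined). A minimal sketch of the arithmetic:

```python
# Back-of-the-envelope check on the logged parameter memory.
# At 2 bytes per f16 weight, ~1970 MB implies ~1.03 billion parameters.
PARAM_CTX_MB = 1970.08   # "params ctx size" from the log above
BYTES_PER_F16 = 2

n_params = PARAM_CTX_MB * 1024 ** 2 / BYTES_PER_F16
print(f"approx. parameters: {n_params / 1e9:.2f}B")  # ~1.03B
```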
Looks like
Thank you for the feedback, and thank you for creating the amazing ggml.
OK, I will sort out the code of the new operators and upstream it later. I'm also considering whether to upstream the "dynamic mode".
I've tried it before, but it seems that combining
Any plans for SDXL?
I'm willing to implement SDXL once I've improved the support for SD 1.x and added support for SD 2.x.
Took a stab at a larger resolution, 768x768.
Unsurprisingly, it takes way (way) longer:
Wow, this is so cool. Easy to convert existing models, quantization... very nice. https://github.com/bes-dev/stable_diffusion.openvino <- this is way faster though, probably because it uses OpenVINO.
I've implemented a memory optimization, and now when using txt2img with fp16 precision to generate a 512x512 image, it only requires 2.3GB.
Oh, yeah. Now I'm working hard to make it run faster.
Is this already on master? Because I reran my diffusion above with similar timings and memory usage (though the memory reporting seems to have changed).
Since you are generating 768x768 images, the runtime memory will grow; there is still room for optimization.
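The growth is easy to estimate: activation and latent memory scale with pixel count, so 768x768 needs about 2.25x the dynamic memory of 512x512. A rough sketch, assuming the standard SD 1.x latent layout (4 channels at 1/8 the image resolution); the exact runtime numbers depend on the compute graph:

```python
# Rough estimate of how latent/activation memory scales with resolution.
# SD 1.x operates on a latent that is 1/8 the image size with 4 channels.
def latent_elems(width, height, channels=4, downscale=8):
    return (width // downscale) * (height // downscale) * channels

scale = latent_elems(768, 768) / latent_elems(512, 512)
print(f"768x768 needs about {scale:.2f}x the latent memory of 512x512")  # 2.25x
```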
@leejet I don't think that is how that label is supposed to be used 😄
You're right, I made a mistake. I accidentally clicked on it while browsing; it wasn't my intention.
Any chance we could get OpenVINO support? It would help a lot!
Hey, finally stable diffusion for ggml 😄
Did a test run
Pain point: the extra Python libs for conversion. I got a pip install error because I already have an incompatible version of something installed; convert.py worked anyway, though. :)
Timings: I used the q8_0 quantization and ran with different thread counts:
I have a 12-core (24-thread) CPU.
I took the timing of a sampling step.
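For context on what q8_0 buys you size-wise: in ggml's q8_0 layout (an assumption about this repo's format, but it matches upstream ggml), each block of 32 weights is stored as 32 int8 quants plus one f16 scale, i.e. 8.5 bits per weight versus 16 for f16. A sketch of the expected ratio; actual files also contain some unquantized tensors:

```python
# Estimated bits per weight for ggml's q8_0 vs plain f16.
# Assumed q8_0 layout: blocks of 32 int8 quants + one f16 scale per block.
Q8_0_BLOCK = 32
q8_0_bits = (Q8_0_BLOCK * 8 + 16) / Q8_0_BLOCK   # 8.5 bits/weight
f16_bits = 16.0

print(f"q8_0 is about {q8_0_bits / f16_bits:.0%} the size of f16")  # ~53%
```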
Additional questions:
(cinematic:1.3)
edit: added f16 timings
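Regarding the pip pain point above: one way around clashing versions is to run convert.py from an isolated virtual environment instead of the system Python. A minimal sketch using the standard library; the exact packages to install are whatever the repo's conversion script requires (not listed here):

```python
# Create an isolated environment so convert.py's dependencies don't
# clash with whatever is already installed system-wide.
import venv

# Pass with_pip=True to also bootstrap pip inside the environment.
venv.create("sd-convert-env", with_pip=False)

# Then, in a shell (hypothetical package list):
#   ./sd-convert-env/bin/pip install <conversion dependencies>
#   ./sd-convert-env/bin/python convert.py ...
print("created sd-convert-env")
```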