Vulkan: memory management issue with ggml update #481

Open
stduhpf opened this issue Nov 25, 2024 · 4 comments

stduhpf commented Nov 25, 2024

When available VRAM runs low, the Vulkan backend now appears to allocate the compute buffer in shared memory, which causes a severe slowdown even though enough VRAM is actually available. The older version of GGML used before c3eeb66 didn't have this issue.
I've had no luck finding the commit that introduced this behavior in ggml so far.

Example: generating an 896 x 896 image with Flux Schnell Q3_k, with an idle VRAM usage of 1.2 GB (Chrome and VS Code are open in the background):

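|                    | current                           | reverting c3eeb66                 |
|--------------------|-----------------------------------|-----------------------------------|
| Taskmgr screenshot | 6.1/8GB VRAM, 1.8GB Shared Memory | 7.7/8GB VRAM, 0.2GB Shared Memory |
| s/it               | 147.07                            | 23.34                             |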

Relevant logs (identical between the two runs):

[INFO ] stable-diffusion.cpp:514  - total params memory size = 7658.71MB (VRAM 4978.14MB, RAM 2680.56MB): clip 2680.56MB(RAM), unet 4883.57MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:518  - loading model from '' completed, taking 5.92s
[INFO ] stable-diffusion.cpp:535  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:589  - finished loaded file
[DEBUG] stable-diffusion.cpp:1463 - txt2img 896x896
[DEBUG] stable-diffusion.cpp:1193 - prompt after extract and remove lora: "a lovely cat holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:672  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1198 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1027 - parse 'a lovely cat holding a sign says 'flux.cpp'' to [['a lovely cat holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1142 - computing condition graph completed, taking 3158 ms
[INFO ] stable-diffusion.cpp:1331 - get_learned_condition completed, taking 3162 ms
[INFO ] stable-diffusion.cpp:1354 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1358 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1026 - flux compute buffer size: 1715.46 MB(VRAM)
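
As a rough sanity check on the claim that enough VRAM is available, the VRAM figures from the log can be added up and compared with the card size. This is only a sketch: the two log lines are copied from above, the 1.2 GB idle usage and the 8 GB total come from the issue text, and treating the log's "MB" as MiB and Task Manager's "GB" as GiB is an assumption.

```python
import re

# Two lines copied verbatim from the log above.
LOG = """\
[INFO ] stable-diffusion.cpp:514  - total params memory size = 7658.71MB (VRAM 4978.14MB, RAM 2680.56MB): clip 2680.56MB(RAM), unet 4883.57MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - flux compute buffer size: 1715.46 MB(VRAM)
"""

IDLE_VRAM_GB = 1.2   # idle usage reported in the issue
TOTAL_VRAM_GB = 8.0  # card size from the Task Manager figures


def vram_needed_mb(log: str) -> float:
    """Sum the weights placed in VRAM and every compute buffer marked (VRAM)."""
    params = re.search(
        r"total params memory size = [\d.]+MB \(VRAM ([\d.]+)MB", log
    )
    buffers = re.findall(r"compute buffer size: ([\d.]+) MB\(VRAM\)", log)
    return float(params.group(1)) + sum(float(b) for b in buffers)


needed_mb = vram_needed_mb(LOG)
# Assumption: the log's "MB" are MiB and Task Manager's "GB" are GiB.
needed_gb = needed_mb / 1024
print(f"weights + compute buffer: {needed_mb:.2f} MB ({needed_gb:.2f} GB)")
print(f"with idle usage: {needed_gb + IDLE_VRAM_GB:.2f} GB of {TOTAL_VRAM_GB:.0f} GB")
```

Under these assumptions the total comes to roughly 7.7 GB, which is in line with the 7.7/8GB VRAM figure from the run with c3eeb66 reverted, so the workload should fit without spilling into shared memory.
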
@Green-Sky

(dumb question if you already know this, but are you using git bisect?)


stduhpf commented Nov 25, 2024

> (dumb question if you already know this, but are you using git bisect?)

I tried, but with the API changes it was annoying to fix things at every bisect step. I also tried reverting Vulkan-related commits one by one, but I couldn't easily identify the culprit that way either.

@Green-Sky

> > (dumb question if you already know this, but are you using git bisect?)
>
> I tried, but with the API changes it was annoying to fix things at every bisect step. I also tried reverting Vulkan-related commits one by one, but I couldn't easily identify the culprit that way either.

Good. Yeah, it's annoying to also change the sd.cpp code. But it still works. :)

  • update the commit range if you know it.
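
One way to reduce the manual work discussed here is `git bisect run`, which classifies each commit from a script's exit code: 0 means good, 125 means "skip this commit", and any other code from 1 to 127 means bad. The sketch below shows only the classification step, so that commits which fail to build because of API changes are skipped rather than fixed by hand. The function name, the script name in the note below, and the 60 s/it cut-off are hypothetical; the cut-off is simply a value between the two timings reported in the issue.

```python
# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Assumption: any cut-off between the two reported timings
# (23.34 s/it and 147.07 s/it) separates good from bad commits.
THRESHOLD_S_PER_IT = 60.0


def bisect_exit_code(build_ok, seconds_per_it, threshold=THRESHOLD_S_PER_IT):
    """Map one test run to an exit code for `git bisect run`.

    build_ok       -- False if this commit did not compile (e.g. API changes)
    seconds_per_it -- measured sampling speed, or None if no timing was obtained
    """
    if not build_ok or seconds_per_it is None:
        return SKIP  # cannot judge this commit; let bisect try a neighbour
    return GOOD if seconds_per_it < threshold else BAD


# The two timings from the issue, plus a commit that fails to build.
print(bisect_exit_code(True, 23.34))
print(bisect_exit_code(True, 147.07))
print(bisect_exit_code(False, None))
```

A wrapper script (for example a hypothetical `bisect_check.py`) would build the project, run one generation, measure s/it, and end with `sys.exit(bisect_exit_code(...))`; it could then be driven with `git bisect run python bisect_check.py`. Skipped commits can leave bisect with a range of candidates rather than a single commit, which may still narrow the search considerably.
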


stduhpf commented Dec 31, 2024

I gave up trying to figure out which commit introduced this behavior (the root cause is probably in the AMD drivers anyway).
But I found a workaround for when it happens: I launch restart64.exe (it comes with CRU), which completely restarts the graphics drivers, and memory allocation works again, at least for one run.
