
vulkan: Make Vulkan optional at runtime (#11493). #11494

Open · daym wants to merge 1 commit into master from issue-11493
Conversation


@daym daym commented Jan 29, 2025

Currently, if the Vulkan backend is enabled at build time but Vulkan is not actually available at runtime, llama.cpp crashes:

terminate called after throwing an instance of 'vk::IncompatibleDriverError'
  what():  vk::createInstance: ErrorIncompatibleDriver

Thread 1 "test-tokenizer-" received signal SIGABRT, Aborted.
0x00007ffff6eaa3fc in __pthread_kill_implementation () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007ffff6eaa3fc in __pthread_kill_implementation () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
#1  0x00007ffff6e604c2 in raise () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
#2  0x00007ffff6e4a4a3 in abort () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
#3  0x00007ffff70a586a in ?? () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#4  0x00007ffff70b0e6a in ?? () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#5  0x00007ffff70b0ed5 in std::terminate() () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#6  0x00007ffff70b1128 in __cxa_throw () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#7  0x00007ffff743b5b7 in vk::detail::throwResultException (message=0x7ffff74c5966 "vk::createInstance", result=vk::Result::eErrorIncompatibleDriver)
    at /gnu/store/14lzxwg5kbq01rnd7r7ir5k43083275j-vulkan-headers-1.3.280.0/include/vulkan/vulkan.hpp:6566
#8  vk::resultCheck (message=0x7ffff74c5966 "vk::createInstance", result=vk::Result::eErrorIncompatibleDriver)
    at /gnu/store/14lzxwg5kbq01rnd7r7ir5k43083275j-vulkan-headers-1.3.280.0/include/vulkan/vulkan.hpp:6757
#9  vk::createInstance<vk::DispatchLoaderStatic> (d=..., allocator=..., createInfo=...) at /gnu/store/14lzxwg5kbq01rnd7r7ir5k43083275j-vulkan-headers-1.3.280.0/include/vulkan/vulkan_funcs.hpp:47
#10 ggml_vk_instance_init () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:2713
#11 0x00007ffff74772e9 in ggml_vk_get_device_count () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:7305
#12 ggml_backend_vk_get_device_count () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:7768
#13 0x00007ffff7477309 in ggml_backend_vk_reg_get_device_count (reg=<optimized out>) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8113
#14 0x00007ffff7e54dfa in ggml_backend_registry::register_backend (handle=..., reg=0x7ffff74e19a0 <ggml_backend_vk_reg::reg>, this=0x7ffff7e5d300 <get_reg()::reg>)
    at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:208
#15 ggml_backend_registry::register_backend (handle=..., reg=0x7ffff74e19a0 <ggml_backend_vk_reg::reg>, this=0x7ffff7e5d300 <get_reg()::reg>)
    at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:198
#16 ggml_backend_registry::ggml_backend_registry (this=0x7ffff7e5d300 <get_reg()::reg>) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:166
#17 get_reg () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:292
#18 0x00007ffff7e551e9 in ggml_backend_dev_count () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:336
#19 0x00007ffff7eb1a19 in llama_model_load_from_file_impl (path_model=..., splits=..., params=...) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/src/llama.cpp:9409
#20 0x00007ffff7eb1c3b in llama_model_load_from_file (path_model=<optimized out>, params=...) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/src/llama.cpp:9469
#21 0x0000000000410bdb in main (argc=2, argv=0x7fffffff5fe8) at /gnu/store/86fc8bi3mciljxz7c79jx8zr4wsx7xw8-gcc-11.4.0/include/c++/bits/basic_string.h

Better to just fall back to CPU. This is what this PR does.

@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 29, 2025
        ggml_vk_instance_init();
        return vk_instance.device_indices.size();
    } catch (const vk::SystemError& e) {
        std::cerr << "ggml_vulkan: Error: System error " << e.what() << std::endl;
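The approach can be sketched in a self-contained form. In the sketch below the ggml-vulkan internals are replaced by trivial stubs; only the try/catch pattern mirrors the PR:

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

// Stubs standing in for the real ggml-vulkan internals (illustration only).
namespace vk { using SystemError = std::runtime_error; }
struct { std::vector<int> device_indices; } vk_instance;

static void ggml_vk_instance_init() {
    // In the real backend this calls vk::createInstance(), which throws
    // vk::IncompatibleDriverError when no Vulkan driver is available.
    throw vk::SystemError("vk::createInstance: ErrorIncompatibleDriver");
}

// Sketch of the patched device-count path: on failure, report zero devices
// instead of letting the exception terminate the process.
static size_t ggml_backend_vk_get_device_count() {
    try {
        ggml_vk_instance_init();
        return vk_instance.device_indices.size();
    } catch (const vk::SystemError &e) {
        std::cerr << "ggml_vulkan: Error: System error " << e.what() << std::endl;
        return 0; // the caller treats the backend as having no devices
    }
}
```

With zero devices reported, the registry simply never hands work to the Vulkan backend, so execution stays on the remaining backends (typically CPU).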
Collaborator

Would be good for this to explicitly say something like "will fallback to CPU".

@daym daym (Author) commented Jan 29, 2025

Would be a good idea, but a single backend doesn't know what an application using 4 different backends (at the same time) will do.

@daym daym (Author)

We could adapt ggml/src/ggml-backend-reg.cpp

    void register_backend(ggml_backend_reg_t reg, dl_handle_ptr handle = nullptr) {
        if (!reg) {
            return;
        }

#ifndef NDEBUG
        GGML_LOG_DEBUG("%s: registered backend %s (%zu devices)\n",
            __func__, ggml_backend_reg_name(reg), ggml_backend_reg_dev_count(reg));
#endif
        backends.push_back({ reg, std::move(handle) });
        for (size_t i = 0; i < ggml_backend_reg_dev_count(reg); i++) {
            register_device(ggml_backend_reg_dev_get(reg, i));
        }
    }

to interpret ggml_backend_reg_dev_count(reg) returning 0 as "oops, don't use me". Eventually, if we registered no devices at all even though we tried, we could report that we are now using the CPU only.
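A minimal sketch of that idea, with the ggml types stubbed as plain structs (names here are hypothetical, not the real ggml API):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for ggml_backend_reg_t and the registry,
// just to illustrate the suggested "zero devices means skip" rule.
struct backend_reg {
    const char *name;
    size_t      dev_count; // what ggml_backend_reg_dev_count(reg) would return
};

struct backend_registry {
    std::vector<backend_reg *> backends;

    void register_backend(backend_reg *reg) {
        if (!reg) {
            return;
        }
        if (reg->dev_count == 0) {
            // "oops, don't use me": drop the backend instead of registering it
            std::fprintf(stderr, "%s: skipping backend %s (0 devices)\n",
                         __func__, reg->name);
            return;
        }
        backends.push_back(reg);
        // per-device registration would follow here, as in ggml-backend-reg.cpp
    }
};
```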

Collaborator

Some backends may intentionally have zero devices; for example, the RPC backend does not have a device list by itself, since its devices need to be created by the user. However, returning NULL for backends where this is not possible can be more efficient, since it will cause the backend to be unloaded completely when using GGML_BACKEND_DL. So that would be the preferred option.
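In code terms, the preferred pattern might look roughly like this (all names and types here are stubs for illustration, not the real ggml interfaces):

```cpp
#include <stdexcept>

// Stand-in for the real ggml_backend_reg struct (hypothetical).
struct backend_reg_stub { int unused; };

// Sketch of a backend's reg entry point: return NULL when the backend
// cannot work on this system, so that with GGML_BACKEND_DL the shared
// object can be unloaded completely instead of lingering with 0 devices.
static backend_reg_stub *backend_reg_entry(bool driver_available) {
    try {
        if (!driver_available) {
            // stands in for vk::createInstance() throwing on a missing driver
            throw std::runtime_error("incompatible driver");
        }
    } catch (const std::exception &) {
        return nullptr; // "don't register me at all"
    }
    static backend_reg_stub reg{};
    return &reg;
}
```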

@daym daym (Author) commented Jan 29, 2025

For what it's worth, when I enable GGML_BACKEND_DL (in addition to GGML_VULKAN), the Vulkan backend file disappears entirely from the installation.

I think libggml-vulkan.so moves from lib to bin (p.s. shouldn't it be lib instead?) and cmake install doesn't know that, or something.

@daym daym (Author) commented Jan 29, 2025

Reading through ggml-vulkan.cpp, it seems the intention is to late-bind exactly which Vulkan instance to use (defer the decision as long as possible, which right now is not long at all). There's a mysterious comment:

// Should be changed to return device-specific host buffer type
// but that probably requires changes in llama.cpp

Collaborator

I think libggml-vulkan.so moves from lib to bin and cmake install doesn't know that or something.

When GGML_BACKEND_DL is enabled, backends are built as MODULE targets instead of library, and one of the consequences is they go into the RUNTIME directory instead. It's not very clear where they should be installed, currently ggml only looks for backends in the same directory as the executable, so for it to even work, they would need to be installed in the bin directory, which is not great. So at the moment this is only useful for applications that handle backend loading themselves, but not as installable libraries.

@0cc4m 0cc4m (Collaborator) commented Jan 29, 2025

That comment is just about host buffers, not immediately relevant to this.

The intention was not to defer the decision, but at the time of writing it was unclear which function would get called first, so there's a number of options that trigger initializing the instance. Not sure if that has changed.

I think it's a good idea to leave a message about not having found any Vulkan devices or failing to initialize the instance, but you should probably use the GGML debug macro for that instead of piping to std::cerr.

Collaborator

That was probably written before the device/reg interfaces were added. Now the device interface has a function to obtain a host buffer for that device, so ideally it should be implemented so that each device returns the correct host buffer. llama.cpp at the moment only uses the host buffer of the first device in the list of devices (which may not be the default device if the user uses the -dev argument).


return vk_instance.device_indices.size();
try {
ggml_vk_instance_init();
Collaborator

Seems like a failure here will leave things in a weird partially-initialized state.

@daym daym (Author) commented Jan 29, 2025

We could special-case just the one exception, vk::IncompatibleDriverError, that happens here, under the assumption that that one won't leave things in a partially-initialized state. What do you think?

The idea is that if the returned device count is 0, nobody will bother that backend again. So, partially initialized or not, it won't be used.

Now, if it left the GPU in a partially-initialized state and other backends then failed to use that GPU because of it, that would in my opinion be a Vulkan bug.

Collaborator

There are two expected cases: failure to initialize the Vulkan instance (an issue with the loader), or no devices found (but an instance was created). It would probably be good to handle both inside ggml_vk_instance_init(), to be able to clean up the instance if one was created.
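Handled inside the init function, the two cases could look roughly like this (the Vulkan API is replaced by trivial stubs, and all names are made up for illustration):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Trivial stubs standing in for the Vulkan API (illustration only).
struct fake_instance {
    bool alive = false;
    void destroy() { alive = false; }
};

static bool   g_loader_ok    = true; // flip to simulate a loader failure
static size_t g_device_count = 0;    // number of devices enumeration would find

static fake_instance create_instance() {
    if (!g_loader_ok) {
        throw std::runtime_error("vk::createInstance: ErrorIncompatibleDriver");
    }
    fake_instance inst;
    inst.alive = true;
    return inst;
}

static std::vector<int> device_indices; // stands in for vk_instance.device_indices

// Sketch: both expected failure modes handled inside instance init.
static void instance_init_sketch() {
    fake_instance instance;
    try {
        instance = create_instance();  // case 1: loader failure throws here
    } catch (const std::exception &) {
        return;                        // no instance created, nothing to clean up
    }
    if (g_device_count == 0) {         // case 2: instance exists, but no devices
        instance.destroy();            // clean up the instance before giving up
        return;
    }
    for (size_t i = 0; i < g_device_count; i++) {
        device_indices.push_back((int) i);
    }
}
```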


0cc4m commented Jan 29, 2025

Better to just fall back to CPU.

Is it better? What's your use case? I'm not opposed to this in principle, but it also isn't immediately problematic that the Vulkan backend requires Vulkan and a Vulkan-compatible device.

Did you check how other backends handle this case?


daym commented Jan 29, 2025

Is it better?

Than crashing before even reading the configuration? I think so.

What's your use case? I'm not opposed to this in principle, but it also isn't immediately problematic that the Vulkan backend requires Vulkan and a Vulkan-compatible device.

The use case is that distributions can package llama.cpp once, and not have to create 2^6 different packages for the different enable/disable backend combinations.

Did you check how other backends handle this case?

I did not check that yet.

Could someone with the respective backend already compiled in please try running llama-cli -dev none in a container without GPU access?


slaren commented Jan 29, 2025

The intention is to allow builds with multiple backends and let the application determine which ones to use at runtime. If a backend cannot work on the current system it must return null to the reg function, or return zero devices, but it must absolutely not crash the application.


0cc4m commented Jan 29, 2025

The intention is to allow builds with multiple backends and let the application determine which ones to use at runtime. If a backend cannot work on the current system it must return null to the reg function, or return zero devices, but it must absolutely not crash the application.

That makes sense, I'm still used to the separated builds. Does running multiple backends together already work?

That leads to another question, too: Which backend takes priority? How do you avoid using the same device twice with two backends?

@slaren
Copy link
Collaborator

slaren commented Jan 29, 2025

It does work. Especially with GGML_BACKEND_DL enabled, it allows including backends even if they require driver libraries (e.g. the CUDA backend requires an NVIDIA driver to even load). So it is already possible to include any number of backends in a build.

That leads to another question, too: Which backend takes priority? How do you avoid using the same device twice with two backends?

This is not solved yet, and that's one of the reasons we still aren't distributing unified builds with multiple backends. However, the user can manually specify which backends/devices to use with the -dev argument.

@daym daym force-pushed the issue-11493 branch 3 times, most recently from a11559e to 78610e7 Compare January 29, 2025 21:51

daym commented Jan 29, 2025

I changed it to initialize on reg. Tested it, and it still works.

@daym daym requested review from slaren, jeffbolznv and 0cc4m January 29, 2025 22:41
@slaren slaren (Collaborator) left a comment

I have not tested the changes, but the logic looks correct.


return &reg;
try {
ggml_vk_instance_init();
Collaborator

(This took me way too long to figure out)

I tested this by deleting the Vulkan driver on my system. I found that this change was working fine on Windows in debug builds, but crashing in release builds.

Turns out the problem is that the default exception settings are /EHsc (see https://learn.microsoft.com/en-us/cpp/build/reference/eh-exception-handling-model?view=msvc-170) and the "c" means "the compiler assumes that functions declared as extern "C" never throw a C++ exception." ggml_vk_instance_init is extern "C", so this whole try/catch is optimized away.

I don't think ggml_vk_instance_init is actually used outside of ggml-vulkan anymore, so the easiest fix may just be to remove the forward declaration with GGML_BACKEND_API.
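A minimal reproduction of the hazard (names are hypothetical; the elision described in the comments applies to MSVC release builds, while GCC/Clang run the handler as written):

```cpp
#include <iostream>
#include <stdexcept>

static void risky_init() {
    // stands in for ggml_vk_instance_init()'s vk::createInstance() call
    throw std::runtime_error("ErrorIncompatibleDriver");
}

// Under MSVC's default /EHsc, functions declared extern "C" are assumed
// never to throw C++ exceptions, so in release builds this try/catch can
// be optimized away and the exception escapes, aborting the process.
// GCC/Clang (and MSVC debug builds) run the handler as expected.
extern "C" int device_count_sketch() {
    try {
        risky_init();
        return 1;
    } catch (const std::exception &e) {
        std::cerr << "ggml_vulkan: " << e.what() << std::endl;
        return 0;
    }
}
// The fix suggested above: drop the extern "C" (GGML_BACKEND_API) forward
// declaration so the function keeps ordinary C++ exception semantics.
```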
