
vulkan: Make Vulkan optional at runtime (#11493). #11494

Open · daym wants to merge 1 commit into master from issue-11493
Conversation


@daym daym commented Jan 29, 2025

Currently, if the Vulkan backend is enabled at build time but Vulkan is not actually available at runtime, llama.cpp crashes:

terminate called after throwing an instance of 'vk::IncompatibleDriverError'
  what():  vk::createInstance: ErrorIncompatibleDriver

Thread 1 "test-tokenizer-" received signal SIGABRT, Aborted.
0x00007ffff6eaa3fc in __pthread_kill_implementation () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007ffff6eaa3fc in __pthread_kill_implementation () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
#1  0x00007ffff6e604c2 in raise () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
#2  0x00007ffff6e4a4a3 in abort () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/libc.so.6
#3  0x00007ffff70a586a in ?? () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#4  0x00007ffff70b0e6a in ?? () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#5  0x00007ffff70b0ed5 in std::terminate() () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#6  0x00007ffff70b1128 in __cxa_throw () from /gnu/store/zzpbp6rr43smwxzvzd4qd317z5j7qblj-gcc-11.4.0-lib/lib/libstdc++.so.6
#7  0x00007ffff743b5b7 in vk::detail::throwResultException (message=0x7ffff74c5966 "vk::createInstance", result=vk::Result::eErrorIncompatibleDriver)
    at /gnu/store/14lzxwg5kbq01rnd7r7ir5k43083275j-vulkan-headers-1.3.280.0/include/vulkan/vulkan.hpp:6566
#8  vk::resultCheck (message=0x7ffff74c5966 "vk::createInstance", result=vk::Result::eErrorIncompatibleDriver)
    at /gnu/store/14lzxwg5kbq01rnd7r7ir5k43083275j-vulkan-headers-1.3.280.0/include/vulkan/vulkan.hpp:6757
#9  vk::createInstance<vk::DispatchLoaderStatic> (d=..., allocator=..., createInfo=...) at /gnu/store/14lzxwg5kbq01rnd7r7ir5k43083275j-vulkan-headers-1.3.280.0/include/vulkan/vulkan_funcs.hpp:47
#10 ggml_vk_instance_init () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:2713
#11 0x00007ffff74772e9 in ggml_vk_get_device_count () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:7305
#12 ggml_backend_vk_get_device_count () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:7768
#13 0x00007ffff7477309 in ggml_backend_vk_reg_get_device_count (reg=<optimized out>) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8113
#14 0x00007ffff7e54dfa in ggml_backend_registry::register_backend (handle=..., reg=0x7ffff74e19a0 <ggml_backend_vk_reg::reg>, this=0x7ffff7e5d300 <get_reg()::reg>)
    at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:208
#15 ggml_backend_registry::register_backend (handle=..., reg=0x7ffff74e19a0 <ggml_backend_vk_reg::reg>, this=0x7ffff7e5d300 <get_reg()::reg>)
    at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:198
#16 ggml_backend_registry::ggml_backend_registry (this=0x7ffff7e5d300 <get_reg()::reg>) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:166
#17 get_reg () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:292
#18 0x00007ffff7e551e9 in ggml_backend_dev_count () at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/ggml/src/ggml-backend-reg.cpp:336
#19 0x00007ffff7eb1a19 in llama_model_load_from_file_impl (path_model=..., splits=..., params=...) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/src/llama.cpp:9409
#20 0x00007ffff7eb1c3b in llama_model_load_from_file (path_model=<optimized out>, params=...) at /tmp/guix-build-llama-cpp-0.0.0-b4549.drv-0/source/src/llama.cpp:9469
#21 0x0000000000410bdb in main (argc=2, argv=0x7fffffff5fe8) at /gnu/store/86fc8bi3mciljxz7c79jx8zr4wsx7xw8-gcc-11.4.0/include/c++/bits/basic_string.h

Better to just fall back to CPU. This is what this PR does.

@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 29, 2025
        ggml_vk_instance_init();
        return vk_instance.device_indices.size();
    } catch (const vk::SystemError& e) {
        std::cerr << "ggml_vulkan: Error: System error " << e.what() << std::endl;
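The approach can be sketched in a self-contained form. In the sketch below the ggml-vulkan internals are replaced by trivial stubs; only the try/catch pattern mirrors the PR:

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

// Stubs standing in for the real ggml-vulkan internals (illustration only).
namespace vk { using SystemError = std::runtime_error; }
struct { std::vector<int> device_indices; } vk_instance;

static void ggml_vk_instance_init() {
    // In the real backend this calls vk::createInstance(), which throws
    // vk::IncompatibleDriverError when no Vulkan driver is available.
    throw vk::SystemError("vk::createInstance: ErrorIncompatibleDriver");
}

// Sketch of the patched device-count path: on failure, report zero devices
// instead of letting the exception terminate the process.
static size_t ggml_backend_vk_get_device_count() {
    try {
        ggml_vk_instance_init();
        return vk_instance.device_indices.size();
    } catch (const vk::SystemError &e) {
        std::cerr << "ggml_vulkan: Error: System error " << e.what() << std::endl;
        return 0; // the caller treats the backend as having no devices
    }
}
```

With zero devices reported, the registry simply never hands work to the Vulkan backend, so execution stays on the remaining backends (typically CPU).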
Collaborator

Would be good for this to explicitly say something like "will fallback to CPU".

@daym daym (Author) commented Jan 29, 2025

Would be a good idea, but a single backend doesn't know what an application using 4 different backends (at the same time) will do.

@daym daym (Author)

We could adapt ggml/src/ggml-backend-reg.cpp

    void register_backend(ggml_backend_reg_t reg, dl_handle_ptr handle = nullptr) {
        if (!reg) {
            return;
        }

#ifndef NDEBUG
        GGML_LOG_DEBUG("%s: registered backend %s (%zu devices)\n",
            __func__, ggml_backend_reg_name(reg), ggml_backend_reg_dev_count(reg));
#endif
        backends.push_back({ reg, std::move(handle) });
        for (size_t i = 0; i < ggml_backend_reg_dev_count(reg); i++) {
            register_device(ggml_backend_reg_dev_get(reg, i));
        }
    }

to interpret ggml_backend_reg_dev_count(reg) returning 0 as "oops, don't use me". Eventually, if we registered no devices at all even though we tried, we could report that we are now using the CPU only.
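A minimal sketch of that idea, with the ggml types stubbed as plain structs (names here are hypothetical, not the real ggml API):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for ggml_backend_reg_t and the registry,
// just to illustrate the suggested "zero devices means skip" rule.
struct backend_reg {
    const char *name;
    size_t      dev_count; // what ggml_backend_reg_dev_count(reg) would return
};

struct backend_registry {
    std::vector<backend_reg *> backends;

    void register_backend(backend_reg *reg) {
        if (!reg) {
            return;
        }
        if (reg->dev_count == 0) {
            // "oops, don't use me": drop the backend instead of registering it
            std::fprintf(stderr, "%s: skipping backend %s (0 devices)\n",
                         __func__, reg->name);
            return;
        }
        backends.push_back(reg);
        // per-device registration would follow here, as in ggml-backend-reg.cpp
    }
};
```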

Collaborator

Some backends may intentionally have zero devices; for example, the RPC backend does not have a device list by itself, since its devices need to be created by the user. However, returning NULL for backends where this is not possible can be more efficient, since it will cause the backend to be unloaded completely when using GGML_BACKEND_DL. So that would be the preferred option.
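In code terms, the preferred pattern might look roughly like this (all names and types here are stubs for illustration, not the real ggml interfaces):

```cpp
#include <stdexcept>

// Stand-in for the real ggml_backend_reg struct (hypothetical).
struct backend_reg_stub { int unused; };

// Sketch of a backend's reg entry point: return NULL when the backend
// cannot work on this system, so that with GGML_BACKEND_DL the shared
// object can be unloaded completely instead of lingering with 0 devices.
static backend_reg_stub *backend_reg_entry(bool driver_available) {
    try {
        if (!driver_available) {
            // stands in for vk::createInstance() throwing on a missing driver
            throw std::runtime_error("incompatible driver");
        }
    } catch (const std::exception &) {
        return nullptr; // "don't register me at all"
    }
    static backend_reg_stub reg{};
    return &reg;
}
```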

@daym daym (Author) commented Jan 29, 2025

For what it's worth, when I enable GGML_BACKEND_DL (in addition to GGML_VULKAN), the Vulkan backend file disappears entirely from the installation.

I think libggml-vulkan.so moves from lib to bin (p.s. shouldn't it be lib instead?) and cmake install doesn't know that, or something.

@daym daym (Author) commented Jan 29, 2025

Reading through ggml-vulkan.cpp, it seems the intention is to late-bind exactly which Vulkan instance to use (defer the decision as long as possible, which right now is not long at all). There's a mysterious comment:

// Should be changed to return device-specific host buffer type
// but that probably requires changes in llama.cpp

Collaborator

I think libggml-vulkan.so moves from lib to bin and cmake install doesn't know that or something.

When GGML_BACKEND_DL is enabled, backends are built as MODULE targets instead of library, and one of the consequences is they go into the RUNTIME directory instead. It's not very clear where they should be installed, currently ggml only looks for backends in the same directory as the executable, so for it to even work, they would need to be installed in the bin directory, which is not great. So at the moment this is only useful for applications that handle backend loading themselves, but not as installable libraries.

@0cc4m 0cc4m (Collaborator) commented Jan 29, 2025

That comment is just about host buffers, not immediately relevant to this.

The intention was not to defer the decision, but at the time of writing it was unclear which function would get called first, so there's a number of options that trigger initializing the instance. Not sure if that has changed.

I think it's a good idea to leave a message about not having found any Vulkan devices or failing to initialize the instance, but you should probably use the GGML debug macro for that instead of piping to std::cerr.

Collaborator

That was probably written before the device/reg interfaces were added. Now the device interface has a function to obtain a host buffer for that device, so ideally it should be implemented so that each device returns the correct host buffer. llama.cpp at the moment only uses the host buffer of the first device in the list of devices (which may not be the default device if the user uses the -dev argument).


return vk_instance.device_indices.size();
try {
ggml_vk_instance_init();
Collaborator

Seems like a failure here will leave things in a weird partially-initialized state.

@daym daym (Author) commented Jan 29, 2025

We could special-case just the one exception, vk::IncompatibleDriverError, that happens here, under the assumption that that one won't leave things in a partially-initialized state. What do you think?

The idea is that if the returned device count is 0, nobody will bother that backend again. So, partially initialized or not, it won't be used.

Now, if it left the GPU in a partially-initialized state and other backends then failed to use that GPU because of it, that would in my opinion be a Vulkan bug.

Collaborator

There are two expected cases: failure to initialize the Vulkan instance (an issue with the loader), or no devices found (but an instance was created). It would probably be good to handle both inside ggml_vk_instance_init(), to be able to clean up the instance if one was created.
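Handled inside the init function, the two cases could look roughly like this (the Vulkan API is replaced by trivial stubs, and all names are made up for illustration):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Trivial stubs standing in for the Vulkan API (illustration only).
struct fake_instance {
    bool alive = false;
    void destroy() { alive = false; }
};

static bool   g_loader_ok    = true; // flip to simulate a loader failure
static size_t g_device_count = 0;    // number of devices enumeration would find

static fake_instance create_instance() {
    if (!g_loader_ok) {
        throw std::runtime_error("vk::createInstance: ErrorIncompatibleDriver");
    }
    fake_instance inst;
    inst.alive = true;
    return inst;
}

static std::vector<int> device_indices; // stands in for vk_instance.device_indices

// Sketch: both expected failure modes handled inside instance init.
static void instance_init_sketch() {
    fake_instance instance;
    try {
        instance = create_instance();  // case 1: loader failure throws here
    } catch (const std::exception &) {
        return;                        // no instance created, nothing to clean up
    }
    if (g_device_count == 0) {         // case 2: instance exists, but no devices
        instance.destroy();            // clean up the instance before giving up
        return;
    }
    for (size_t i = 0; i < g_device_count; i++) {
        device_indices.push_back((int) i);
    }
}
```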


0cc4m commented Jan 29, 2025

Better to just fall back to CPU.

Is it better? What's your use case? I'm not opposed to this in principle, but it also isn't immediately problematic that the Vulkan backend requires Vulkan and a Vulkan-compatible device.

Did you check how other backends handle this case?


daym commented Jan 29, 2025

Is it better?

Than crashing before even reading the configuration? I think so.

What's your use case? I'm not opposed to this in principle, but it also isn't immediately problematic that the Vulkan backend requires Vulkan and a Vulkan-compatible device.

The use case is that distributions can package llama.cpp once, and not have to create 2^6 different packages for the different enable/disable backend combinations.

Did you check how other backends handle this case?

I did not check that yet.

Could someone with the respective backend already compiled in please try running llama-cli -dev none in a container without GPU access?


slaren commented Jan 29, 2025

The intention is to allow builds with multiple backends and let the application determine which ones to use at runtime. If a backend cannot work on the current system it must return null to the reg function, or return zero devices, but it must absolutely not crash the application.


0cc4m commented Jan 29, 2025

The intention is to allow builds with multiple backends and let the application determine which ones to use at runtime. If a backend cannot work on the current system it must return null to the reg function, or return zero devices, but it must absolutely not crash the application.

That makes sense, I'm still used to the separated builds. Does running multiple backends together already work?

That leads to another question, too: Which backend takes priority? How do you avoid using the same device twice with two backends?

@slaren
Copy link
Collaborator

slaren commented Jan 29, 2025

It does work. Especially with GGML_BACKEND_DL enabled, it allows including backends even if they require driver libraries (e.g. the CUDA backend requires an NVIDIA driver to even load). So it is already possible to include any number of backends in a build.

That leads to another question, too: Which backend takes priority? How do you avoid using the same device twice with two backends?

This is not solved yet, and that's one of the reasons we still aren't distributing unified builds with multiple backends. However, the user can manually specify which backends/devices to use with the -dev argument.

@daym daym force-pushed the issue-11493 branch 3 times, most recently from a11559e to 78610e7 Compare January 29, 2025 21:51

daym commented Jan 29, 2025

I changed it to initialize on reg. Tested it, and it still works.

@daym daym requested review from slaren, jeffbolznv and 0cc4m January 29, 2025 22:41
@slaren slaren (Collaborator) left a comment

I have not tested the changes, but the logic looks correct.


return &reg;
try {
ggml_vk_instance_init();
Collaborator

(This took me way too long to figure out)

I tested this by deleting the Vulkan driver on my system. I found that this change was working fine on Windows in debug builds, but crashing in release builds.

Turns out the problem is that the default exception settings are /EHsc (see https://learn.microsoft.com/en-us/cpp/build/reference/eh-exception-handling-model?view=msvc-170) and the "c" means "the compiler assumes that functions declared as extern "C" never throw a C++ exception." ggml_vk_instance_init is extern "C", so this whole try/catch is optimized away.

I don't think ggml_vk_instance_init is actually used outside of ggml-vulkan anymore, so the easiest fix may just be to remove the forward declaration with GGML_BACKEND_API.
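A minimal reproduction of the hazard (names are hypothetical; the elision described in the comments applies to MSVC release builds, while GCC/Clang run the handler as written):

```cpp
#include <iostream>
#include <stdexcept>

static void risky_init() {
    // stands in for ggml_vk_instance_init()'s vk::createInstance() call
    throw std::runtime_error("ErrorIncompatibleDriver");
}

// Under MSVC's default /EHsc, functions declared extern "C" are assumed
// never to throw C++ exceptions, so in release builds this try/catch can
// be optimized away and the exception escapes, aborting the process.
// GCC/Clang (and MSVC debug builds) run the handler as expected.
extern "C" int device_count_sketch() {
    try {
        risky_init();
        return 1;
    } catch (const std::exception &e) {
        std::cerr << "ggml_vulkan: " << e.what() << std::endl;
        return 0;
    }
}
// The fix suggested above: drop the extern "C" (GGML_BACKEND_API) forward
// declaration so the function keeps ordinary C++ exception semantics.
```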
