-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vulkan: Make Vulkan optional at runtime (#11493). #11494
base: master
Are you sure you want to change the base?
Conversation
ggml/src/ggml-vulkan/ggml-vulkan.cpp
Outdated
ggml_vk_instance_init(); | ||
return vk_instance.device_indices.size(); | ||
} catch (const vk::SystemError& e) { | ||
std::cerr << "ggml_vulkan: Error: System error " << e.what() << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good for this to explicitly say something like "will fallback to CPU".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be a good idea, but one backend doesn't know what the user of 4 different backends (at the same time) will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could adapt ggml/src/ggml-backend-reg.cpp
void register_backend(ggml_backend_reg_t reg, dl_handle_ptr handle = nullptr) {
if (!reg) {
return;
}
#ifndef NDEBUG
GGML_LOG_DEBUG("%s: registered backend %s (%zu devices)\n",
__func__, ggml_backend_reg_name(reg), ggml_backend_reg_dev_count(reg));
#endif
backends.push_back({ reg, std::move(handle) });
for (size_t i = 0; i < ggml_backend_reg_dev_count(reg); i++) {
register_device(ggml_backend_reg_dev_get(reg, i));
}
}
to interpret ggml_backend_reg_dev_count(reg) returning 0 as "oops, don't use me". Eventually, as we have registered no device at all even though we tried we could say we are now using CPU only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some backends intentionally may have zero devices, for example the RPC backend does not have a device list by itself, they need to be created by the user. However returning NULL
for backends where this is not possible can be more efficient, since it will cause the backend to be unloaded completely when using GGML_BACKEND_DL
. So that would be the preferred option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, when I enable GGML_BACKEND_DL (in addition to GGML_VULKAN), the vulkan backend file disappears entirely from the installation.
I think libggml-vulkan.so moves from lib to bin (p.s. should be lib instead, no?) and cmake install doesn't know that or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through ggml-vulkan.cpp, it seems the intention is to late bind which vulkan instance to use exactly (defer decision as long as possible--which right now is not long at all). There's a mysterious comment
// Should be changed to return device-specific host buffer type
// but that probably requires changes in llama.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think libggml-vulkan.so moves from lib to bin and cmake install doesn't know that or something.
When GGML_BACKEND_DL
is enabled, backends are built as MODULE
targets instead of library, and one of the consequences is they go into the RUNTIME
directory instead. It's not very clear where they should be installed, currently ggml only looks for backends in the same directory as the executable, so for it to even work, they would need to be installed in the bin
directory, which is not great. So at the moment this is only useful for applications that handle backend loading themselves, but not as installable libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment is just about host buffers, not immediately relevant to this.
The intention was not to defer the decision, but at the time of writing it was unclear which function would get called first, so there's a number of options that trigger initializing the instance. Not sure if that has changed.
I think it's a good idea to leave a message about not having found any Vulkan devices or failing to initialize the instance, but you should probably use the GGML debug macro for that instead of piping to std::cerr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was probably written before the device/reg interfaces were added. Now the device interface has a function to obtain a host buffer for that device, so ideally it should be implemented so that each device returns the correct host buffer. llama.cpp at the moment only uses the host buffer of the first device in the list of devices (which may not be the default device if the user uses the -dev
argument).
ggml/src/ggml-vulkan/ggml-vulkan.cpp
Outdated
|
||
return vk_instance.device_indices.size(); | ||
try { | ||
ggml_vk_instance_init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a failure here will leave things in a weird partially-initialized state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could special-case just the one exception vk::IncompatibleDriverError
that happens here--under the assumption that that one won't leave it in a partially-initialized state. What do you think?
The idea is that if the returned device count is 0 nobody will bother that backend again. So partially initialized or not--it won't be used.
Now if it left the GPU in a partially initialized state and other backends would fail using that GPU because of it, in my opinion that would be a Vulkan bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two expected cases, failure to initialize the Vulkan instance (issue with the loader) or no devices found (but an instance was created). It would probably be good to handle those inside of ggml_vk_instance_init()
, to be able to clean up the instance if one was created.
Is it better? What's your use case? I'm not opposed to this in princible, but it also isn't immediately problematic that the Vulkan backend requires Vulkan and a Vulkan-compatible device. Did you check how other backends handle this case? |
Than crashing before even reading the configuration? I think so.
The use case is that distributions can package llama.cpp once--and not have to create 2^6 different packages for the different enable/disable backend combinations.
I did not check that yet. Could someone with the respective backend already compiled in please try running |
The intention is to allow builds with multiple backends and let the application determine which ones to use at runtime. If a backend cannot work on the current system it must return null to the reg function, or return zero devices, but it must absolutely not crash the application. |
That makes sense, I'm still used to the separated builds. Does running multiple backends together already work? That leads to another question, too: Which backend takes priority? How do you avoid using the same device twice with two backends? |
It does work, especially with
This is not solved yet, and that's one of reasons we still aren't distributing unified builds with multiple backends. However, the user can manually specify which backend/devices to use with the |
a11559e
to
78610e7
Compare
I changed it to initialize on reg on. Tested it and it still works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tested the changes, but the logic looks correct.
|
||
return ® | ||
try { | ||
ggml_vk_instance_init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This took me way too long to figure out)
I tested this by deleting the Vulkan driver on my system. I found that this change was working fine on windows in debug builds, but crashing in release builds. Turns out the problem is that the default exception settings are /EHsc
(see https://learn.microsoft.com/en-us/cpp/build/reference/eh-exception-handling-model?view=msvc-170) and the c
means "the compiler assumes that functions declared as extern "C" never throw a C++ exception." ggml_vk_instance_init is extern "C", so this whole try/catch is optimized away. I don't think ggml_vk_instance_init is actually used outside of ggml-vulkan anymore, so the easiest fix may just be to remove the forward declaration with GGML_BACKEND_API.
Currently, if the Vulkan backend is enabled but Vulkan is not actually available at runtime, it will crash:
Better to just fall back to CPU. This is what this PR does.