Some very old ICDs in the wild crash when the loader tries to look up vkEnumerateInstanceVersion #182
Comments
Do you have any details on the ICDs that are crashing? It would be helpful to try to reproduce this on our end and inform options. |
So I initially reproduced this with an install of AMD driver version 17.1.1 (a rather grey-haired Vulkan 1.0.36 API), available from https://www.amd.com/en/support/kb/release-notes/rn-rad-win-17-1-1 I've been walking forward through driver versions today, and it seems to have been resolved by 18.1.1, if not sooner. One telltale seems to be the driver JSON having the string "abi_version" or "abi_versions" for the ICD rather than "api_version". But who knows, there might be other drivers out there with that string. |
We went as far as to detect Vulkan support in a separate process to avoid this: Would be very interested in detailed instructions for a better workaround, such as how to find the file to look in for "abi_version". Is the Windows registry location of the driver JSON mentioned in https://www.khronos.org/assets/uploads/developers/library/2017-vulkan-loader-webinar/VulkanLoaderDeepDive_Khronos_Mar17.pdf the one? |
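For readers unfamiliar with the separate-process approach mentioned in this comment, here is a minimal sketch of the idea in Python. It is not the code any project in this thread actually ships: `probe_succeeds` is a hypothetical helper, and the two probe commands are stand-ins for a real child program that would try to create a Vulkan instance. The point is only that a crash inside a buggy ICD takes down the child rather than the application.

```python
import subprocess
import sys


def probe_succeeds(probe_argv, timeout_s=10.0):
    """Run a probe command in a child process; return True only on a clean exit."""
    try:
        result = subprocess.run(
            probe_argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The probe could not be started, or it hung: treat as "not usable".
        return False
    # A crash in the child shows up as a nonzero return code
    # (a negative signal number on POSIX, an exception code on Windows).
    return result.returncode == 0


# Stand-ins for a real probe executable (assumption: the real probe would
# attempt instance creation and exit with 0 on success).
HEALTHY_PROBE = [sys.executable, "-c", "import sys; sys.exit(0)"]
CRASHING_PROBE = [sys.executable, "-c", "import os; os.abort()"]

if __name__ == "__main__":
    print("healthy probe ok:", probe_succeeds(HEALTHY_PROBE))
    print("crashing probe ok:", probe_succeeds(CRASHING_PROBE))
```

Note that this only tells the application whether instance creation survives as a whole; it cannot tell which ICD on the machine was responsible for a crash.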
Back when Vulkan 1.1 was being developed, we had some discussions about how the loader should check if a driver supports Vulkan 1.1. The loader needs to know this because in Vulkan 1.0, passing a version other than 1.0 into the The process we settled on is what the loader does now:
This method should be perfectly safe if all drivers are following proper Vulkan standards, but obviously we have to live with the drivers we have. As far as fixing the issue goes, I see three ways of avoiding this crash:
The first option sounds best to me (assuming it works), but I want to try to find why we settled on our current mechanism. I'm going to have to go through the old meeting notes to try and remember why. If there's a compelling reason to keep checking through
That was correct when it was made. If the driver that's crashing is more than about 2 years old, it will be there. I'm guessing most drivers that crash are old and will be in that location. But if the driver is newer, it could be located in any location described here. I'll also mention that I'm working on some changes that are going to make it even harder to track down exactly where a driver is located for future Windows drivers, so it may be tricky to find drivers in the future. I'm going to look at old meeting notes and also check if anyone else remembers why we use |
Incidentally we are running in 64-bit Windows; I don't know if it makes a difference, but I noticed PPSSPP was hitting the same crash as us and also on the 64-bit driver. |
We're looking at a possible workaround for now, which is a weaker version of always getting the driver version from the json file - we've made a modified loader that assumes the driver is 1.0 if it has no api_version field in the json. This definitely catches the problem ICDs that I personally know about, but it obviously might net other drivers I don't know about as well. In our case, our application uses very few 1.1 features and has guards around those it does, so if it gets an erroneous 1.0 instance it'll trundle along happily enough. So we're hoping this will stop the bleeding for us while discussion of a better workaround can continue. (We can also keep an eye out for any more crashy ICDs in the wild once we have a dodge for this specific case.) The commit is here in case this is useful as a temporary workaround for anyone else: |
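As a rough illustration of the rule described in this comment (treat the driver as 1.0 when its JSON has no api_version field), here is a small Python sketch. It is not the linked commit, which lives in the loader's C code; `declared_api_version` is a hypothetical helper, and the sample manifests are illustrative, modelled on the "abi_versions" telltale mentioned earlier in the thread.

```python
import json


def declared_api_version(manifest_text):
    """Return (major, minor) declared by an ICD manifest, or (1, 0) if absent."""
    try:
        manifest = json.loads(manifest_text)
    except ValueError:
        # Unparseable manifest: fall back to the conservative assumption.
        return (1, 0)
    icd = manifest.get("ICD") if isinstance(manifest, dict) else None
    if not isinstance(icd, dict):
        return (1, 0)
    version = icd.get("api_version")
    if not isinstance(version, str):
        # No api_version field (e.g. an old manifest using "abi_versions").
        return (1, 0)
    parts = version.split(".")
    try:
        return (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        return (1, 0)


# Illustrative manifests only; the field values are assumptions.
OLD_STYLE = '{"file_format_version": "1.0.0", "ICD": {"library_path": "amdvlk64.dll", "abi_versions": "0.9.0"}}'
NEW_STYLE = '{"file_format_version": "1.0.0", "ICD": {"library_path": "amdvlk64.dll", "api_version": "1.1.70"}}'

if __name__ == "__main__":
    print(declared_api_version(OLD_STYLE))
    print(declared_api_version(NEW_STYLE))
```

The trade-off is the one noted above: any driver that omits the field is treated as 1.0, whether or not it would have crashed.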
Hello, I'm a frustrated user who found this issue by looking at the Vulkan license in my copy of the game No Man's Sky. The game just had a huge update and went from OpenGL to Vulkan. Some users, like myself, cannot load the game at all; it crashes. I noticed in this issue that switching GPU vendors might be a factor. I used to use an AMD Radeon R7 260X and I switched to an NVIDIA GeForce 1050 Ti. Could that be why my game crashes on loading up? I'm just grasping for answers at this point. |
@Xeticus, if you installed the Vulkan SDK, run VIA (the Vulkan Installation Analyzer) and it will generate an HTML file which might hint at what problems you have. There are many reasons you could be seeing a crash (driver, app, etc) and this will at least help you know if Vulkan in general works on your system. |
Thanks for responding so quickly. Someone else posted a fix that worked for my problem.
Run Regedit
Navigate to : HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\Drivers
Find the DWORD entry "C:\Windows\System32\amd-vulkan64.json"
It looks like switching from AMD to Nvidia was the problem, but deleting that entry fixed it.
Thank you very much.
|
Has there been any progress on this? We recently switched to Vulkan and are experiencing the same issue on some of our players' machines. |
I solved my problem last night. Someone posted a fix involving a registry entry. I deleted it and my game started up right away. It looks like it was using an AMD entry after I had switched to an Nvidia card. These are the instructions I followed.
Run Regedit
Navigate to: HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\Drivers
Find the DWORD entry "C:\Windows\System32\amd-vulkan64.json"
DELETE that entry, no others, just that one.
Close Regedit, attempt to launch the game again... now it works.
I followed them and it worked like a charm.
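For anyone who would rather inspect that key before deleting anything, here is a read-only Python sketch that lists what is registered there. It assumes the layout described in the steps above (each value name is the path to a driver JSON file, stored as a DWORD); `list_registered_icd_manifests` is a hypothetical helper name, and the script changes nothing in the registry.

```python
import sys

# Registry key named in the instructions above (under HKEY_LOCAL_MACHINE).
DRIVERS_KEY = r"SOFTWARE\Khronos\Vulkan\Drivers"


def list_registered_icd_manifests():
    """Return [(manifest_path, dword_value), ...]; an empty list off Windows."""
    if sys.platform != "win32":
        return []
    import winreg  # Only available on Windows.

    entries = []
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DRIVERS_KEY) as key:
            value_count = winreg.QueryInfoKey(key)[1]
            for index in range(value_count):
                name, data, _value_type = winreg.EnumValue(key, index)
                entries.append((name, data))
    except OSError:
        # Key missing or not readable: report nothing rather than fail.
        return []
    return entries


if __name__ == "__main__":
    for manifest_path, value in list_registered_icd_manifests():
        print(f"{manifest_path} (DWORD = {value})")
```

A 32-bit Python on 64-bit Windows is redirected to the 32-bit view of the registry, so run a 64-bit interpreter to see the same entries Regedit shows at that path.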
|
Manually deleting the old driver entry certainly works, but it's hardly reasonable to ask players to dig through their registries before they can run any Vulkan application. @Xeticus Could you possibly figure out what version of the AMD driver you had installed? Perhaps from your amd-vulkan64.json file (though as I recall there's not much there) or some other old files? It seems like there may be more than one way that old drivers are crashing during enumeration. @SephiRok We ended up shipping our own vulkan-1.dll (with the patch above) to be able to include workarounds for this sort of thing. If it would help you, I can provide you the binary. |
@hg-ctangora Thanks, for now I've compiled your patch from source and will report back if it eliminates the issues once it's been deployed. This will also happen when the user has an integrated card with outdated drivers and discrete card with up to date drivers, which does not appear to be very uncommon. Here are two drivers on which we've seen creating the vulkan instance crash:
The Intel one is an outlier; it's been predominantly the AMD occurrence. If it's critical, I can likely dig up more instances from support reports. |
@SephiRok The AMD one is the same one we've been seeing in our dumps, so I'm afraid it's unlikely my patch will work on this. :( Looks like the old driver is spinning up a thread for some reason, which then crashes, so it may be necessary to blacklist that DLL entirely. It may have that weirdly-formatted json as well, which I previously used for workaround logic. Thanks for the driver version; I'll see if I can't get my PC into this state. |
@hg-ctangora Ah, that's unfortunate. I'll post any further info that comes my way. I'd also like to point out that using a separate process is not a great workaround, since there could be an ICD with working Vulkan support while an erroneous ICD crashes the detection process. It may be a viable workaround for programs with multiple renderers, but not for Vulkan-only apps. |
Looks like this driver (version 1.6.0) doesn't have a weirdly-formed JSON anymore like 1.5.0 did, but it still crashes in the same place trying to retrieve vkEnumerateInstanceVersion. We're going to try a further hack, that skips those calls for any "amdvlk" driver that does not declare API 1.1 in its json -- new commit is linked below. The main concern I can think of with this sort of workaround is that some ICD might exist that was previously reporting 1.1 API, but would now be seen as 1.0 with this change. (That would be some ICD that had vkEnumerateInstanceVersion implemented, but reported a 1.0 API or no API in its JSON.) So a Vulkan app that requires 1.1 might see new issues somewhere. |
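The decision rule described in this comment can be sketched in a few lines of Python. This is an illustration of the heuristic only, not the linked commit; `should_look_up_instance_version` is a hypothetical name, the declared version is assumed to arrive as a `(major, minor)` tuple or `None` when the JSON declares nothing, and the file names in the example are illustrative.

```python
import ntpath


def should_look_up_instance_version(library_path, declared_version):
    """Return False when the lookup should be skipped and the ICD treated as 1.0.

    declared_version is a (major, minor) tuple from the ICD's JSON,
    or None when the JSON declares no API version at all.
    """
    # ntpath handles both backslash and forward-slash separators.
    file_name = ntpath.basename(library_path).lower()
    declares_1_1 = declared_version is not None and declared_version >= (1, 1)
    if "amdvlk" in file_name and not declares_1_1:
        # Old AMD ICD that may crash on the vkEnumerateInstanceVersion lookup.
        return False
    return True


if __name__ == "__main__":
    print(should_look_up_instance_version(r"C:\Windows\System32\amdvlk64.dll", (1, 0)))
    print(should_look_up_instance_version(r"C:\Windows\System32\amdvlk64.dll", (1, 1)))
```

Drivers whose file name does not contain "amdvlk" are unaffected, which keeps the blast radius of the hack small at the cost of only covering the one known family of crashing ICDs.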
Looking at my old files this appears to be the name of the last AMD driver I had downloaded.
amd-catalyst-14-9-win7-win8.1-64bit-dd-ccc-whql.exe
|
I think I've seen two distinct crashes (both during instance creation) in the comments from this issue and I'd like to avoid confusing the two. But before I describe that I'd like to make a couple of points:
So the distinct issues I see are:
That being said, trying to fix issue 2 through a driver update may be a problem. The issue is that an AMD driver was left behind, and that driver crashes without an AMD GPU. Solving this through a driver update probably isn't practical because if we tell people to update their drivers, they'll update the drivers for the card they actually have. If we need to create a fix for that issue, the only way I see is roughly what @hrydgard described, except doing it when trying to create an instance. Splitting off another process adds complexity, on top of which it sounds like something that people would suspect of being a virus (which we've had people fearful of in the past). And it wouldn't even solve anything without a driver update. |
I think it's easy for us to agree that it would be preferable to update or remove the buggy drivers. The problem is that is not under an application's reasonable control, while distributing an updated loader is trivial enough -- unless I'm missing something. If the fix is done in the loader, would updating any graphics card driver also install a new version of the loader, essentially working around all the other crashing drivers? |
We've been investigating further and I think these are largely still the issue of vkGetInstanceProcAddr crashing. We had crash dumps in our reporting that matched the callstack that @SephiRok reported in the AMD driver. On installing that driver version, I did see a crash, but it was the familiar vkGetInstanceProcAddr issue. Yesterday we pushed a new vulkan-1.dll with the more aggressive workaround (string-comparing the driver filename against "amdvlk" and trusting the api_version field in the JSON if it matches) and we are no longer seeing that crash in the automated reports. So I don't know why the minidumps were showing that odd crash with no client code in the stack, but I think it was a red herring. Given that this is our most common driver-related issue at the moment I would guess it was the issue @Xeticus encountered as well. The exception is the callstack in the Intel driver that @SephiRok posted, which does seem to be a different issue. And yeah, I personally will be making sure to cleanly uninstall any old drivers from now on whenever I change graphics cards 😃 but it's an awfully hard thing to diagnose if you're just trying to play your new game. Getting some kind of workaround in the loader would make "update your drivers" work to fix it, and eventually propagate the workaround out via Windows Update. |
I don't disagree that a vkGetInstanceProcAddr crash might be common, but I have gotten crash logs from users that definitely indicate a crash within vkCreateInstance, in this particular case of old AMD drivers lingering on a machine that no longer has an AMD card (this is when creating a 1.0 instance, I'm not attempting to create a 1.1 instance). |
Apologies, the first call to vkGetInstanceProcAddr("vkEnumerateInstanceVersion") does actually happen in vkCreateInstance. So a crash in that call matches the vkGetInstanceProcAddr issue I've been seeing. (The mystery on my side is why the crash dumps I had for this issue don't show that.) Do you have any more information about the crashes you've been seeing that might differentiate them? |
Here's one report: hrydgard/ppsspp#11719 (comment) Seems to be the same one yeah. |
This is easy enough to do, but there's one thing you have to be careful of. If you do distribute a loader, you should really make sure it gets installed system-wide (generally through the runtime installer). The problem with not installing system-wide is that driver vendors don't generally test their drivers against older loader versions. Instead, they rely on the fact that they're shipping a loader with the driver to establish a minimum version. This means that as future driver versions come out, your loader that used to work may stop working as a result of a driver update. As a result, I'd strongly recommend against just putting a |
Based on this issue and some outside discussion, I think I'm going to get two changes ready that should address these issues:
I'll try to post PRs that implement these fixes shortly, and then we can evaluate the exact changes. |
I'm a little concerned with your first bullet point. I think it's a good idea, but my concern is how are you determining the ICD is 1.0 specific. Via the JSON file, via exposed entrypoints? I think that some of the ICD JSON files for some of the drivers that do support Vulkan 1.1 incorrectly list the driver version at 1.0.X. So we need to be very careful about this change. We should first make sure it's not going to break them by running this by people from the various ICD companies. |
From my point of view as an application developer I'd only be concerned if there were some 1.1 ICDs that actually crashed if the loader attempted to init them as 1.0. Remember, right now we live in a world where a small but significant number of Vulkan installs just crash without warning! This was our number 4 crash on PC at launch, and if you google "amd-vulkan64.json" you will find many other reports of other games seeing this same issue. If we instead have some installs that erroneously report only 1.0 support when they could in fact support 1.1, that is a much less serious problem. Especially if upgrading to latest drivers for your current card fixes it -- the natural thing for an app to do if it wants 1.1 but the loader reports 1.0 is nag the user to upgrade. |
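To make the "nag the user to upgrade" path concrete, here is a small Python sketch of the check an application might perform on the version it gets back. The bit layout follows the standard Vulkan packed version encoding (major above bit 22, minor in bits 12 to 21, patch in the low 12 bits); the function names are hypothetical, and a real application would perform this check in its own language on the value returned by vkEnumerateInstanceVersion.

```python
def make_vk_version(major, minor, patch):
    """Pack a version the way Vulkan does."""
    return (major << 22) | (minor << 12) | patch


def decode_vk_version(packed):
    """Unpack a Vulkan version number into (major, minor, patch)."""
    major = (packed >> 22) & 0x7F  # Mask so any higher bits are ignored.
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return (major, minor, patch)


def needs_driver_upgrade_prompt(reported_packed, required=(1, 1)):
    """True when the reported instance version is below what the app wants."""
    major, minor, _patch = decode_vk_version(reported_packed)
    return (major, minor) < required


if __name__ == "__main__":
    reported = make_vk_version(1, 0, 36)
    if needs_driver_upgrade_prompt(reported):
        print("Vulkan 1.1 not available; please update your graphics drivers.")
```

An application with guards around its 1.1 features could instead log the result and continue on the 1.0 path.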
I've been looking at a crash on startup we're seeing among end users, and I've traced it to some very old ICDs that crash when vkGetInstanceProcAddr( NULL, "vkEnumerateInstanceVersion" ) is called.
Since the 1.1 loader tries to do version enumeration in vkCreateInstance, if this ICD happens to be on a user's machine, it crashes without warning.
We've seen some cases where the user switched GPU vendors long ago, and are now seeing a crash due to a long-forgotten old ICD on their machine -- updating the driver for their new GPU doesn't help, and it's quite difficult for these end users to diagnose the problem.
A quick search of Steam forums reveals this issue is hitting more than one Vulkan game, so we'd like to try and work around it and unfortunately the loader seems like the least bad place to do it.
What would be a reasonable way to get the loader to avoid attempting the crashing vkEnumerateInstanceVersion lookup on certain ICDs?