content_shell crashes with Vulkan error on startup with nvidia driver #863
Comments
I've made some changes, let me know if the latest master works for you. |
Also, how do I see this issue in the future? Can I just pull down Angle and force the loader commit for it? Then how do I run a good test? I'd appreciate knowing so I can try to run future changes that touch Environment variables through this. |
The latest roll succeeded so it looks like your changes resolved the problem, thanks! https://autoroll.skia.org/r/vulkan-deps-angle-autoroll Can you send an email via internal channels to ask about a process for testing? We had discussed it before and haven't yet come up with a good solution. |
Actually, my concern is this change: https://github.com/MarkY-LunarG/Vulkan-Loader/tree/add_new_env_vars It's also modifying the environment variable code and adding a new additive environment variable. That's what I wanted to verify works for you all before I push it up. |
Sorry for the late reply. I just tested the latest vulkan loader, and the problem still happens. |
I found the crash is at Line 261 in ab207b0
The crash is in the NVidia driver (we see the crash with the Intel driver as well),
|
@MarkY-LunarG can you look carefully at 0a19663 ? We're starting to get blocked on multiple because we aren't able to update the Khronos DEPS in Chromium for several months. |
So, I'm running with Nvidia and Intel for non-Angle runs and don't see any issues. And this function call is performed by VkCube and works for me. What in the above callstack is invalid: the physical device, the surface, or both? It honestly looks to me like your call chain got messed up. There should be a call to Also, your source lines aren't matching mine for the loader. What branch/tag are you building from? |
Also, have you tried disabling the Nvidia layers? I think you mentioned there was at least one running. If you run with VK_LOADER_DEBUG=driver,layer you should see output and know how to turn any implicit layers off. |
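For anyone scripting this kind of run, here is a minimal sketch of setting VK_LOADER_DEBUG for a child process. The variable name and the `driver` and `layer` categories come from the comment above; the helper names `loader_debug_env` and `run_with_loader_debug` are hypothetical and only for illustration.

```python
import os
import subprocess
from typing import Dict, List, Mapping


def loader_debug_env(base: Mapping[str, str], categories: List[str]) -> Dict[str, str]:
    """Return a copy of `base` with VK_LOADER_DEBUG set to the given categories."""
    env = dict(base)  # copy so the caller's environment is left untouched
    env["VK_LOADER_DEBUG"] = ",".join(categories)
    return env


def run_with_loader_debug(command: List[str], categories: List[str]) -> int:
    """Run `command` with loader debugging enabled and return its exit code."""
    env = loader_debug_env(os.environ, categories)
    return subprocess.run(command, env=env).returncode
```

Calling `run_with_loader_debug([...], ["driver", "layer"])` with the failing test's command line would then produce the loader's driver and layer messages alongside the test output.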
I am on commit 6b3cb37; however, I added many printf() calls for debugging, so the line numbers are not meaningful. I uploaded a new stack with all fprintf() calls removed. And please also check the log with
https://gist.github.com/phuang/ffaa0695fba55aedb736de6682398185 |
I just built Angle. Is there a simple repro case I can try? |
Try
|
I get this when I run that test:
How do I force it to run? |
Here's my full output for the one test:
|
Could you please go to And are you using the open source nvidia driver? I am using the driver from https://www.nvidia.com/download/index.aspx. |
I saw you have an Intel GPU. Could you please try forcing the nvidia GPU (by removing the Intel Vulkan driver .so file)? |
If I force Nvidia ( I do see this warning message a few times when I turn on loader debugging:
Here's the loader view of the vkCreateInstance and vkCreateDevice callstack from the attempts:
And for master, I am using commit |
I also tried forcing
Is there a way to turn on the extension legitimately? |
The crash should happen before the |
I got loader log as below. There is a
|
I've tried it both ways. I was just disabling things to try to make the scenario different to see if it triggered Here's my command line: No mesa layer, Nvidia forced:
Mesa layer no driver forced:
I've tried multiple combinations as well. |
I've tried it both ways (enabled/disabled). You can disable it by defining the "Disable Env Var" for that implicit layer. So in my run I did it in a single line:
Has anyone else on your end reproduced this separately? |
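As a rough illustration of the "Disable Env Var" mechanism mentioned above: an implicit layer's JSON manifest carries a `disable_environment` entry naming the variable that turns the layer off. The sketch below reads that entry from a manifest. The manifest contents, the layer name, and `DISABLE_EXAMPLE_LAYER` are made up for the example; real layers define their own variable names.

```python
import json
from typing import Dict

# Hypothetical manifest, shaped like an implicit layer's JSON file.
EXAMPLE_MANIFEST = """
{
  "file_format_version": "1.0.0",
  "layer": {
    "name": "VK_LAYER_EXAMPLE_implicit",
    "type": "GLOBAL",
    "library_path": "libVkLayer_example.so",
    "api_version": "1.2.0",
    "implementation_version": "1",
    "description": "Example implicit layer",
    "disable_environment": {"DISABLE_EXAMPLE_LAYER": "1"}
  }
}
"""


def disable_vars(manifest_text: str) -> Dict[str, str]:
    """Return the environment variables that disable the layer in this manifest."""
    layer = json.loads(manifest_text)["layer"]
    # Layers without the entry simply yield an empty mapping.
    return dict(layer.get("disable_environment", {}))


print(disable_vars(EXAMPLE_MANIFEST))
```

Merging the returned mapping into the environment of the test process is one way to disable the layer for a single run.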
I tried disabling the device_select and validation layers, but neither helps.
|
Weird, I'm on FC 34... Are the failures grouped with a particular GPU vendor? |
FYI, I just reproduced the crash with
|
I added some printf to print arguments for calling
|
That all looks fine. I've got a more drastic idea. I've been working at generating more output for driver info. What if you modified your "vulkan-loader" branch to use the 'gen_all_tramp_term' branch off of my fork: https://github.com/MarkY-LunarG/Vulkan-Loader/tree/gen_all_tramp_term
Then rebuild and before you run set the environment variable to output loader debugging to When done, you can always Do you know of anyone whose system we might be able to access locally here (Colorado) that can reproduce this issue? Or perhaps one we could log into remotely? |
FYI, I also tried modifying icd_index to 2 ( BTW, are you using chat.google.com or another IM? Then maybe I can share my workstation access with you. |
The log with gen_all_tramp_term branch |
BTW, I also tested a standalone Vulkan-Loader checkout of your branch. It works fine. So the problem could be related to some build configuration.
|
@phuang I think the default loader build won't load SwiftShader the way Chromium uses it. |
I found that the change below fixes the problem for me. What is
|
Good, that's a good clue. It creates a consistent, sorted device order across all runs. It sorts discrete physical devices first by PCI bus ID, then integrated devices (also by PCI bus ID if there is more than one), then software implementations. Previously, the order would vary based on the order in which entries were read from the directory using readdir (which returns entries in no guaranteed order). |
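The ordering described above can be sketched in a few lines. This is not the loader's code, just a toy model of the rule as stated: discrete devices first, then integrated, then software, with PCI bus ID as the tie-breaker. The `PhysicalDevice` type, its fields, and the device names are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class DeviceKind(IntEnum):
    # Lower value sorts earlier, matching the order described in the thread.
    DISCRETE = 0
    INTEGRATED = 1
    SOFTWARE = 2


@dataclass(frozen=True)
class PhysicalDevice:
    name: str
    kind: DeviceKind
    pci_bus: int  # hypothetical field; software devices just use 0 here


def sort_devices(devices: List[PhysicalDevice]) -> List[PhysicalDevice]:
    """Order by device kind first, then by PCI bus ID within each kind."""
    return sorted(devices, key=lambda d: (d.kind, d.pci_bus))


devices = [
    PhysicalDevice("software-rasterizer", DeviceKind.SOFTWARE, 0),
    PhysicalDevice("integrated-gpu", DeviceKind.INTEGRATED, 0),
    PhysicalDevice("discrete-gpu-b", DeviceKind.DISCRETE, 3),
    PhysicalDevice("discrete-gpu-a", DeviceKind.DISCRETE, 1),
]
for dev in sort_devices(devices):
    print(dev.name)
```

Because the key depends only on the devices themselves, the result is the same no matter what order the driver files were discovered in.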
Probably |
No, the change is correct. You should add it if you want to create a PR. But the failure when sorting is disabled is still an issue that I'm looking into. |
I see. Please review #889 |
I've reproduced the issue on Linux with sorting disabled. I'll keep this issue open until I solve that problem. |
Great work Mark and Peng for pinpointing & fixing the issue! |
The physical device terminator was missing the ICD index in the non-sorted path. This caused crashes in Angle before it was realized that the sorting code was unintentionally disabled in that build path. Also, add tests to catch this case in the future in the WSI code, but this required converting all the TEST_F tests to TEST since Gtest didn't like mixing the 2 on my system. Finally, fix a few WSI error messages in the loader which were missing spaces. Fixes #863 for non-sorting paths
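To make the failure mode concrete, here is a toy model (not the loader's actual code) of what a missing ICD index can do. It assumes each terminator record starts with an index of 0, standing in for zero-initialized memory, and that later calls are routed to the driver at that index. The names `TermPhysDev`, `build_unsorted`, and `misrouted` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermPhysDev:
    handle: str
    icd_index: int = 0  # assumed default when the index is never filled in


def build_unsorted(icds: List[List[str]], set_index: bool) -> List[TermPhysDev]:
    """Collect devices from every ICD, optionally recording the owning ICD."""
    devs = []
    for idx, icd_handles in enumerate(icds):
        for handle in icd_handles:
            dev = TermPhysDev(handle)
            if set_index:
                dev.icd_index = idx  # the step the non-sorted path was missing
            devs.append(dev)
    return devs


def misrouted(icds: List[List[str]], devs: List[TermPhysDev]) -> List[str]:
    """Return handles whose recorded ICD does not actually own them."""
    return [d.handle for d in devs if d.handle not in icds[d.icd_index]]


icds = [["intel-0"], ["nvidia-0"]]  # two drivers, one device each
print(misrouted(icds, build_unsorted(icds, set_index=False)))
print(misrouted(icds, build_unsorted(icds, set_index=True)))
```

In this model the device owned by the second driver is handed to the first driver when the index is left unset, which is the kind of mismatch that can crash inside a driver.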
content_shell crashes with the latest vulkan loader + nvidia driver. Seems it is related to 0a19663
See https://bugs.chromium.org/p/chromium/issues/detail?id=1299378 for detail