shader_debugprintf: support new VVL-DEBUG-PRINTF message and fix VVL version check for API selection #1187
…version check for API selection
Thank you very much for this PR. I do have some remarks though, mostly related to comment and code structure. I think it's important that people can easily follow and understand the changes ;)
Thanks @SaschaWillems for the feedback. I am away on vacation this week, but will make the requested changes when I am back. UPDATE: Back now and changes submitted in 0dc4963. |
No idea why, but with this PR and the latest SDK (1.3.296) on Windows, this sample is now again running at less than 1 fps. Forcing it to use VK 1.2 is somehow even slower (0 or inf fps). If I force VK 1.0, performance is fine, but I don't get any debug output. Not sure what is happening here and why this sample is so problematic. The debug printf sample from my own samples repo works just fine no matter the API version :/ |
Very strange. Can I ask you to recheck before and after this PR, but being careful with your SDK version selection and project gen/build? I did a lot of testing with old and new SDKs on Windows 10, Linux and macOS before submitting originally. I will go back and test again to see if I can somehow duplicate what you are seeing.
Debug PrintF requires Vulkan 1.1 or later. So no surprise that you are not getting debug output with API 1.0.
I suspect your repo's sample relies on the intrinsic Debug PrintF capability at the shader level on Windows. However, this is not cross-platform portable, whereas the Vulkan-Samples one uses the VVL version of the feature all the time. Perhaps that is why you are seeing a difference, at least on Windows. Again, I will go back and see if I can verify this. |
It also happens with the old code (before this PR). I only have SDK 1.3.296 installed. So probably a regression in the validation layers? |
Ok, I have rechecked this PR on Windows 10, and even fast-forwarded my local branch to current main HEAD just to make sure. I am using Vulkan SDK 1.3.296.0 with my Radeon RX6600XT GPU. My Vulkan Configurator has been reset to default settings. Is it possible that your Vulkan Configurator has a custom setting that is interfering with the sample? Or possibly a difference between AMD and nVidia GPUs? Just grasping at straws since I cannot duplicate your issue and the 1.3.296 VVL seems to be working correctly using API 1.1 for debug printf. |
With this change, I have to distinguish two cases:
1. VulkanConfigurator is running
- VK_EXT_LAYER_SETTINGS_EXTENSION_NAME is available
- instance creation is done by VulkanSample::create_instance (line 469)
- render speed is high
- debug_utils_message_callback is never called, thus no debugprintf output
2. VulkanConfigurator is not running
- VK_EXT_LAYER_SETTINGS_EXTENSION_NAME is not available
- instance creation is done locally (line 523)
- render speed is extremely low
- debug_utils_message_callback is called, at a higher rate than the frame rate
Note, in case 2, you're using VkValidationFeaturesEXT, which is part of VK_EXT_VALIDATION_FEATURES_EXTENSION_NAME. But you don't ask for it in the ShaderDebugPrintf constructor (or anywhere else). And in fact, that extension is not supported on my machine. Strange that the VVL doesn't cry there.
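To make the two cases concrete, here is a minimal sketch of the decision they imply: prefer layer settings when that extension is advertised, otherwise fall back to validation features, and only when that extension is actually advertised too. The enum and the function `choose_printf_enable_path` are hypothetical names for illustration and are not taken from the sample; the two string literals are the values of `VK_EXT_LAYER_SETTINGS_EXTENSION_NAME` and `VK_EXT_VALIDATION_FEATURES_EXTENSION_NAME`, written out so the sketch builds without the Vulkan headers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical enum: which mechanism would be used to switch on debugPrintfEXT.
enum class PrintfEnablePath
{
	LayerSettings,             // case 1: VK_EXT_layer_settings is advertised
	ValidationFeatures,        // case 2: fall back to VkValidationFeaturesEXT
	Unavailable                // neither extension is advertised
};

// Hypothetical helper (not part of the sample): pick the enablement path from
// the list of extension names reported as available.
inline PrintfEnablePath choose_printf_enable_path(const std::vector<std::string> &available_extensions)
{
	auto has = [&available_extensions](const char *name) {
		return std::find(available_extensions.begin(), available_extensions.end(), name) != available_extensions.end();
	};

	if (has("VK_EXT_layer_settings"))
	{
		return PrintfEnablePath::LayerSettings;
	}
	// Only use the fallback when the extension it depends on is advertised,
	// so that it can also be requested explicitly at instance creation.
	if (has("VK_EXT_validation_features"))
	{
		return PrintfEnablePath::ValidationFeatures;
	}
	return PrintfEnablePath::Unavailable;
}
```

In a real application the input list would come from enumerating instance and layer extensions; that part is omitted here.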
That would explain why it's so slow for me. I never ran that sample with the VulkanConfigurator running. That's case 2. |
…ons to ShaderDebugPrintf::create_instance()
Thanks @asuessenbach for pointing out the missing
These changes may not be the final solution as I have observed the following when testing:
In summary:
Lastly, I thought |
…ing instance creation
…e::[HPP]Instance()
Ok, I think I have finally figured it out. It appears that you don't need to actually enable the
|
AFAIK, those two extensions ( Besides that, just to make sure it has been noted: As |
Welp, still sub 1 fps for me with latest SDK and vkconfig NOT running. Just let me know when it's in a state where I should test. If we can't get this to work, we may simply go back to the initial version and maybe remove the debug output and tell people to attach a graphics debugger. |
Thanks @asuessenbach for the info re nVidia GPUs. I have an AMD card and I guess this is the difference here. @SaschaWillems would you please test using this PR with vkconfig running and let me know the result? I presume you are using an nVidia GPU - please confirm. If this works, and as @asuessenbach suggests, I will try to detect this condition and offer a message to nVidia users. |
If we get to a point where we have to show a message under certain conditions to users of a certain vendor we're not heading where I'd like our samples to head. I'd rather remove the output debug stuff then. |
@SaschaWillems I understand. However I’d still like to track this down if possible and you testing on Nvidia with vkconfig active would give more information. I can’t do this test myself. Thx. |
Windows 11 23H2, nvidia RTX 4070, latest Vulkan developer driver, SDK 1.3.296. And I get <1 fps even with vkconfig up and running: I'm pretty sure that the sample ran fine when I initially wrote it, but not sure why it no longer does. Can't rule out a configuration issue on my side 100%, but not sure where to start looking. |
I just added a minor hygiene change to use More importantly, I was able to find an nVidia GPU to test this. I have narrowed down what causes the slowdown and am now convinced it is a VVL debugPrintfEXT defect on that GPU platform. Simply by disabling the following debugPrintfEXT feature enablement lines I can restore FPS performance on nVidia machines for both vkconfig running and not running cases. Unfortunately this drops the debug info, but hopefully this is a temporary thing until this issue can be addressed.
I will respond on the other thread to @spencer-lunarg to see if he can help. |
@SRSaunders before, we had the slowdown for Vulkan 1.1, and 1.2/1.3 were good... is that still the case, or is it now for all versions? |
When using an nVidia GPU with SDK 1.3.296, it slows down for all API versions. When using SDK 1.3.290 with the same setup (nVidia GPU), the sample works properly when using API 1.2 - as expected per previous discussion. For AMD GPUs (and Apple Silicon on macOS) with SDK 1.3.296 everything works properly when using API 1.1 |
ok, so the problem has been isolated down to an NVIDIA GPU (I was testing on Intel and found no issues)... Later tonight I will be back at my desk and can try again on my NVIDIA machine |
@SRSaunders I saw
So DebugPrintf came out in 2020 and Layer Settings came out in 2023, so it was designed to be used the "old" way. In 2023 we made it possible to use it with Layer Settings (which is what vkconfig uses as well). In 2024 we have been slowly pouring more effort into GPU-AV, and DebugPrintf was there, so we merged it in, added features, and fixed bugs, so a lot has recently been touching it. I tried to update the VVL docs for it https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/debug_printf.md Not in the 1.3.296 SDK (didn't want to rush it) but the next future SDK will have to go |
@spencer-lunarg thanks for updating the docs. This is the same doc I referred back to today when digging into the issue we are facing here.
Good to know there will be an even simpler way to enable debugPrintfEXT going forward. While we are on the topic of |
That is correct... something is wrong here (and of course it involves 3 different things) ... do me a favor, for this thing, create a new VVL issue and I can work with Christophe (who maintains both VkConfig and wrote the The layer has to define its settings in a way that can be used, for example you can set printf_to_stdout by setting What gets ugly is that we have legacy ways to do settings sadly...until the commit right after we branched for 1.3.296, DebugPrintf and GPU-AV both did Shader Instrumentation, but would step over each other, so we needed a way to prevent both being used at the same time. This was done with this very ugly |
Also want to clarify that So tested, I have a laptop with AMD Radeon 780M integrated and NVIDIA 4060 (using env variables to turn on, not vkconfig, shouldn't really matter in theory)
With NVIDIA
so basically this PR is not fixing/breaking things from my end... I agree this seems like something on the VVL side (we reopened the issue and I will look more into it this weekend) but everything in this PR seems sane ... it should not take "hacks" to get DebugPrintf to work, if it does, we failed in VVL badly ... from a quick glance this seems to be a sync issue within how GPU-AV is working (we had to start using timeline semaphores under the hood to get things to be faster in GPU-AV, before we did a big vkQueueWaitIdle, ... but we share this code with Sync Validation, which has also been active and I feel we regressed here) |
@spencer-lunarg I have raised issue KhronosGroup/Vulkan-ValidationLayers#8760 as requested.
Thanks for reviewing and good to know this PR appears correct from your side. I look forward to any discoveries you make regarding slowdowns we are seeing with nVidia GPUs. |
…gPrintfEXT (cherry picked from commit 3365c7d974ae1cb7222cf35fdbe82accfa3fd926)
7de0de5 to e23c4e5
…teInfo in [HPP]Instance::[HPP]Instance()
…ne for VVL layer name
Following interaction with the VVL team, I think this is now ready for review. A couple of learnings:
|
Is this still true? I tried to add it explicitly in this PR and didn't see it slow down (the issue was the old pre-1.3.290 SDK and should be patched now) |
I observed this performance impact when using SDK 1.3.290 with API 1.2 and timeline semaphores explicitly enabled. With SDK 1.3.296 using API 1.1 with timeline semaphores enabled, performance was fine. So I removed the explicit enablement of timeline semaphores and now I get consistent performance using: a) API 1.2 with SDKs <= 1.3.290, and b) API 1.1 with SDKs >= 1.3.296 (aside from the nVidia issue mentioned above). |
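The version split described in this comment can be sketched as a small selection function. This is an illustration only, not the code in the PR: `make_version` is a local stand-in that packs versions the same way as `VK_MAKE_API_VERSION` with variant 0, `select_debug_printf_api_version` is a hypothetical name, and treating 1.3.296 as the cut-over point assumes the validation layer's reported version tracks the SDK version.

```cpp
#include <cstdint>

// Local stand-in for VK_MAKE_API_VERSION(0, major, minor, patch), so that the
// sketch compiles without the Vulkan headers.
constexpr std::uint32_t make_version(std::uint32_t major, std::uint32_t minor, std::uint32_t patch)
{
	return (major << 22U) | (minor << 12U) | patch;
}

// Hypothetical helper: choose the API version to request for the debug printf
// sample from the version reported by the validation layer.
constexpr std::uint32_t select_debug_printf_api_version(std::uint32_t vvl_version)
{
	// Assumption: layers from SDK 1.3.296 onwards get API 1.1, and anything
	// older (up to and including 1.3.290) gets API 1.2, per the comment above.
	return vvl_version >= make_version(1, 3, 296) ? make_version(1, 1, 0) : make_version(1, 2, 0);
}
```

Because versions are packed with the major number in the highest bits, a plain integer comparison orders them correctly, which is why no field-by-field comparison is needed.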
Given the long discussion and in case it wasn't clear, this PR is now ready to go. Note for nVidia users (Windows) using SDK 1.3.296: A fix is also required from the VVL which will ship in the next SDK. In the interim, nVidia users should run using this PR combined with SDK 1.3.290 for the shader_debugprintf sample. Note: This also removes the framework check in [hpp_]instance.cpp for |
Quick update: We discussed this in a recent call and will probably wait with the merge until the next SDK is available to make sure we can simply point people to update their SDK in case of any problems with this sample. Hope that's okay with you. |
@SaschaWillems that’s fine re the debugprintf sample. My only concern is the fix for the |
Is that fix important (sorry, I got kinda lost with all the discussion in this issue)? If so, splitting it into a separate PR is fine, if not it's fine if we merge all of this once the new SDK is out. |
I guess we can wait given the |
Description
Fixes two issues that arose with Vulkan SDK 1.3.296:
VVL-DEBUG-PRINTF callback message. Previous SDKs used WARNING-DEBUG-PRINTF or UNKNOWN-DEBUG-PRINTF. Without this fix the debug data is not available in the UI Overlay.
Fixes #1184.
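As an illustration of the message-ID handling described above, a callback could accept all three identifiers so that the same build works across SDK versions. The sketch below is an assumption about how such a check might look, not the sample's actual code; `is_debug_printf_message` is a hypothetical helper, and its argument stands for the message ID name string that a debug-utils callback receives.

```cpp
#include <array>
#include <string_view>

// Hypothetical helper (not taken from the PR): returns true when a debug-utils
// message ID name identifies debugPrintfEXT output.
inline bool is_debug_printf_message(const char *message_id_name)
{
	// The message ID name may be absent, so guard against a null pointer.
	if (message_id_name == nullptr)
	{
		return false;
	}

	// VVL-DEBUG-PRINTF is used by SDK 1.3.296; the other two were used by
	// previous SDKs.
	constexpr std::array<std::string_view, 3> known_ids = {
	    "VVL-DEBUG-PRINTF", "WARNING-DEBUG-PRINTF", "UNKNOWN-DEBUG-PRINTF"};

	const std::string_view name{message_id_name};
	for (std::string_view id : known_ids)
	{
		if (name == id)
		{
			return true;
		}
	}
	return false;
}
```

Matching on the full name rather than on a common suffix keeps unrelated messages out of the UI overlay, at the cost of needing an update if the identifier changes again.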
Tested on Windows 10, Manjaro Linux, and macOS Ventura using Vulkan SDKs 1.3.290 and 1.3.296.
I hope this is the last time I have to fix this. It seems that VVL changes can easily break this sample.
General Checklist:
Please ensure the following points are checked:
Note: The Samples CI runs a number of checks including:
If this PR contains framework changes:
batch command line argument to make sure all samples still work properly
Sample Checklist
If your PR contains a new or modified sample, these further checks must be carried out in addition to the General Checklist: