Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression and incorrect handling of VK_EXT_debug_utils instance extension #116

Closed
pdaniell-nv opened this issue Dec 10, 2018 · 15 comments · Fixed by #318
Closed

Regression and incorrect handling of VK_EXT_debug_utils instance extension #116

pdaniell-nv opened this issue Dec 10, 2018 · 15 comments · Fixed by #318
Assignees
Labels
bug Something isn't working
Milestone

Comments

@pdaniell-nv
Copy link
Contributor

VK_EXT_debug_utils is an instance extension, but has some device-level functions like vkSetDebugUtilsObjectNameEXT, where the first parameter is a VkDevice. Regardless, since this is an instance extension, all functions must be queryable with vkGetInstanceProcAddr.

The 1.1.85 loader had a bug where the loader itself queried the device-level functions from the ICD using vkGetDeviceProcAddr. An attempt was made to fix this with: a3d5361. However, that fix was not complete and has caused a regression such that no device-level functions from instance extensions call down to the ICD. We're now in a situation with the release 1.1.92 loader that functions like vkSetDebugUtilsObjectTagEXT no longer call down to the ICD, which is a nasty regression of functionality for ICDs that implement VK_EXT_debug_utils.

The problem is that the internal loader functions loader_init_instance_extension_dispatch_table() or loader_icd_init_entries() do not initialize the VkLayerInstanceDispatchTable for the instance completely. Functions like vkSetDebugUtilsObjectTagEXT are left as NULL after the instance is created. This causes the internal trampoline function SetDebugUtilsObjectNameEXT to always see disp->SetDebugUtilsObjectNameEXT as NULL and not call into the ICD.

Additionally, nothing sets up the terminator_SetDebugUtilsObjectNameEXT and other terminator functions properly for the device-level instance extension functions. This is because these terminator functions are initialized in get_extension_device_proc_terminator(), which is only called as part of loader_gpa_device_internal() or device-level vkGetDeviceProcAddr. But these functions expect vkGetInstanceProcAddr to be used.

One might think that these device-level functions should be queried with vkGetDeviceProcAddr instead of vkGetInstanceProcAddr. Since these functions are part of a instance-level extension, which allows them to be implemented by the loader for all devices, we have to define it such that the application uses vkGetInstanceProcAddr to query them.

Both 1.1.85 and 1.1.92 loaders are broken for these functions and it's affecting use internally. I really hope this bug can be fixed before the next loader is released.

Here is some example code to reproduce the issue:

int main(int argc, char const * argv[])
{
    HMODULE vulkan = LoadLibrary("vulkan-1.dll");
    PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr = (PFN_vkGetInstanceProcAddr)GetProcAddress(vulkan, "vkGetInstanceProcAddr");
    PFN_vkCreateInstance vkCreateInstance = (PFN_vkCreateInstance)vkGetInstanceProcAddr(0, "vkCreateInstance");

    VkApplicationInfo applicationInfo = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    applicationInfo.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo instanceInfo = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    instanceInfo.pApplicationInfo = &applicationInfo;
    static const char * instanceExtensionNames[] = { "VK_EXT_debug_utils" };
    instanceInfo.ppEnabledExtensionNames = instanceExtensionNames;
    instanceInfo.enabledExtensionCount = sizeof instanceExtensionNames / sizeof instanceExtensionNames[0];
    VkInstance instance;
    vkCreateInstance(&instanceInfo, 0, &instance);

    PFN_vkEnumeratePhysicalDevices vkEnumeratePhysicalDevices = (PFN_vkEnumeratePhysicalDevices)vkGetInstanceProcAddr(instance, "vkEnumeratePhysicalDevices");
    PFN_vkCreateDevice vkCreateDevice = (PFN_vkCreateDevice)vkGetInstanceProcAddr(instance, "vkCreateDevice");
    PFN_vkSetDebugUtilsObjectNameEXT vkSetDebugUtilsObjectNameEXT = (PFN_vkSetDebugUtilsObjectNameEXT)vkGetInstanceProcAddr(instance, "vkSetDebugUtilsObjectNameEXT");

    uint32_t physicalDevices = 1;
    VkPhysicalDevice physicalDevice;
    vkEnumeratePhysicalDevices(instance, &physicalDevices, &physicalDevice);

    VkDeviceQueueCreateInfo queueInfo = { VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO };
    queueInfo.queueFamilyIndex = 0;
    queueInfo.queueCount = 1;
    float queuePriority = 0.5f;
    queueInfo.pQueuePriorities = &queuePriority;
    VkDeviceCreateInfo deviceCreateInfo = { VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO };
    deviceCreateInfo.queueCreateInfoCount = 1;
    deviceCreateInfo.pQueueCreateInfos = &queueInfo;
    VkDevice device;
    vkCreateDevice(physicalDevice, &deviceCreateInfo, 0, &device);

    VkDebugUtilsObjectNameInfoEXT objectName = { VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT };
    objectName.objectType = VK_OBJECT_TYPE_DEVICE;
    objectName.objectHandle = (uint64_t)device;
    objectName.pObjectName = "device";
    vkSetDebugUtilsObjectNameEXT(device, &objectName); // BUG: This doesn't call down into the driver and does not call the terminator.

    return EXIT_SUCCESS;
}
@lenny-lunarg lenny-lunarg self-assigned this Dec 10, 2018
@pdaniell-nv pdaniell-nv added bug Something isn't working loader labels Dec 11, 2018
@pdaniell-nv
Copy link
Contributor Author

Any progress on this bug? Thanks.

@mdinkov
Copy link

mdinkov commented Jan 18, 2019

We recently ran into this issue as well.

@pdaniell-nv
Copy link
Contributor Author

Any progress on this bug? It's been a few months and is still affecting us. Thanks.

@lenny-lunarg
Copy link
Contributor

I'm looking into this bug now. It's a tricky issue since the loader wasn't really designed to support what this extension ended up doing (that is, the fact that this is an instance extension that includes device-level functionality). I'm trying to get a fix for this together, but I'm not sure how long it will take.

@pdaniell-nv
Copy link
Contributor Author

@lenny-lunarg were you able to make progress?

@lenny-lunarg
Copy link
Contributor

I've been working on a fix for this issue. I have a fix that resolves the immediate issue, but it breaks the behavior for a couple of other functions. I need to work out why my code is messing these things up. So I'm still working on this, but I don't have a fix ready yet.

How important of an issue is this for you? And specifically do you have a workaround? I ask because I'm tempted to make some larger changes to rework this code in the loader so that things like this should be easier to fix in the future. But if you don't have a workaround I may need to get a quick, dirty fix out sooner.

@pdaniell-nv
Copy link
Contributor Author

We have a work-around of sorts, but it's pretty ugly. We don't need a quick and dirty fix, and happy to wait for the proper fix. Thanks.

@KarenGhavam-lunarG
Copy link
Contributor

@pdaniell-nv After some discussion internally about next priorities for the loader it has become evident that the best way to fix this issue is to do some loader refactoring. The loader refactoring will take some time and we need to be sure we have a good thorough way of testing the refactored loader to be sure we haven't introduced new regressions.

Can you live with your workaround "indefinitely"? I say indefinitely because we are unsure of when the refactoring will be complete.

@pdaniell-nv
Copy link
Contributor Author

@KarenGhavam-lunarG I describe our workaround as "ugly" because it requires a hack in both the driver and the applications that use device-level VK_EXT_debug_utils functions. The hack is for the driver to expose these functions through vkGetDeviceProcAddr instead of just vkGetInstanceProcAddr, and for the application to use vkGetDeviceProcAddr instead of vkGetInstanceProcAddr for these functions. It would be nice to not have to keep propagating this going forward, so I'm reluctant to approve it "indefinitely". Hopefully your loader refactor has a timeline.

@KarenGhavam-lunarG
Copy link
Contributor

@pdaniell-nv I understand your concern about not having a timeline and waiting indefinitely. We don't have a clear timeline on the loader refactoring. So @lenny-lunarg will continue his attempts to resolve this issue before we do the loader refactoring.

@lenny-lunarg
Copy link
Contributor

So I just wanted to update this issue. I've been working on the issue lately, but I do not have a fix yet. I thought I had one before Thanksgiving, but my change failed CTS. So I guess the short summary is that I am actively working on this, but not much else to report.

@pdaniell-nv
Copy link
Contributor Author

pdaniell-nv commented Jan 15, 2020

Note this loader bug might affect the Vulkan VK_EXT_debug_utils extension spec example code here: KhronosGroup/Vulkan-Docs#1164

@gfxstrand
Copy link
Contributor

We talked about this on the WG call today and we think the right behaviour of the loader, to avoid breaking anyone, is to expose the entrypoints for VK_EXT_debug_utils through both vkGetDeviceProcAddr and vkGetInstanceProcAddr.

We would also like to see this bug get fixed sooner rather than later because the current behavior is such that you have to use vkGetDeviceProcAddr on Nvidia because it forwards through to the driver but you have to use vkGetInstanceProcAddr everywhere else to use the loader implementation. This means there's no one way that an app developer can write their code such that it works everywhere.

@lenny-lunarg
Copy link
Contributor

I just opened a PR (linked above) to address this issue. I'm sorry that this issue has been talking this long to resolve. I probably should have had it done a while ago.

I do want to make one big point, which is that we should absolutely avoid adding any more instance extensions that include device functions. The loader was simply never designed to handle this case and it wound up being far more complicated that supporting an extension should be. On top of the loader concerns, it seems completely opposed to the way Vulkan is designed and the very notion of instance and device functionality. So I think we need to be agree that in the future we should never be adding extensions that mix instance and device functions.

@pdaniell-nv
Copy link
Contributor Author

That PR seemed to work. Thanks for taking care of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants