Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional invalid free() since 1d443867 when launching capture #1708

Closed
tambry opened this issue Feb 4, 2020 · 8 comments
Closed

Occasional invalid free() since 1d443867 when launching capture #1708

tambry opened this issue Feb 4, 2020 · 8 comments
Labels
Bug A crash, misbehaviour, or other problem Unresolved Waiting for a fix or implementation

Comments

@tambry
Copy link
Contributor

tambry commented Feb 4, 2020

Description

After launching the capture the capture stops immediately most of the time and the application exits with:

munmap_chunk(): invalid pointer

I've also seen other memory management errors, but they are rarer.
The capture launches successfully roughly 20% of the time for me.

I've bisected the issue to be caused by 1d44386.

There's a single validation error regarding vkUpdateDescriptorSets(), which I believe might be erroneous, although I'm not sure. I'll be opening an issue against validation layers for it.

Steps to reproduce

  1. Start capturing attached sample in RenderDoc.
  2. The capture will terminate almost immediately.

Sample

rd #1708.zip
Includes the source code and a compiled executable for Linux. Requires a X server, Vulkan 1.1 and Y'Cb'Cr sampling support from the driver.
Make sure that test.yuv is in the working directory of the executable.

The sample loads test.yuv, uploads it to a buffer, copies it into an image with the format VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM and renders it onto a fullscreen quad using Y'Cb'Cr sampling from VK_KHR_sampler_ycbcr_conversion.

Environment

  • RenderDoc version: da3bd8d
  • Operating System: Debian Unstable
  • Kernel: Linux 5.4.0-3-amd64 #1 SMP Debian 5.4.13-1 (2020-01-19)
  • Graphics API: Vulkan
  • Driver: Mesa 19.3.3 RADV
@baldurk
Copy link
Owner

baldurk commented Feb 4, 2020

This actually crashes even before trying to capture for me, it crashes as soon as I launch the program. Is that what you're seeing, or does it only crash on capture for you?

Ping @bjoeris who worked on those changes you bisected to (thank you for that!). I think this is an issue with the subresource state handling not properly considering an aspect mask of VK_IMAGE_ASPECT_COLOR_BIT being used for planar formats as an alias for 'all planes'.

This patch seems to fix the issue, but I don't know if it is the only problem or if the aspect mask should get caught somewhere else, so I'd like Benson to weigh in on whether this is the right fix. It might work for you in the meantime.

diff --git a/renderdoc/driver/vulkan/vk_resources.h b/renderdoc/driver/vulkan/vk_resources.h
index d7d94cec8..4f4f498ca 100644
--- a/renderdoc/driver/vulkan/vk_resources.h
+++ b/renderdoc/driver/vulkan/vk_resources.h
@@ -1252,7 +1252,13 @@ struct ImageSubresourceRange
   }
   void Sanitise(const ImageInfo &info)
   {
-    if(aspectMask & ~info.Aspects())
+    if(aspectMask == VK_IMAGE_ASPECT_COLOR_BIT &&
+       (info.Aspects() &
+        (VK_IMAGE_ASPECT_PLANE_0_BIT | VK_IMAGE_ASPECT_PLANE_1_BIT | VK_IMAGE_ASPECT_PLANE_2_BIT)))
+    {
+      aspectMask = info.Aspects();
+    }
+    else if(aspectMask & ~info.Aspects())
     {
       if(aspectMask != VK_IMAGE_ASPECT_FLAG_BITS_MAX_ENUM)
       {

@baldurk baldurk added Bug A crash, misbehaviour, or other problem Unresolved Waiting for a fix or implementation labels Feb 4, 2020
@baldurk
Copy link
Owner

baldurk commented Feb 4, 2020

This also seems to reproduce on the VK_Video_Textures autotest project demo, which is unfortunately currently not tested.

@tambry
Copy link
Contributor Author

tambry commented Feb 4, 2020

This actually crashes even before trying to capture for me, it crashes as soon as I launch the program. Is that what you're seeing, or does it only crash on capture for you?

Yes, it crashes as soon as I launch. Apologies for the wrong terminology. 🙂

I'll test your patch tomorrow.

@tambry
Copy link
Contributor Author

tambry commented Feb 5, 2020

I can confirm the patch fixes the crash.

@bjoeris
Copy link
Contributor

bjoeris commented Feb 5, 2020

I am also worried that there are likely other places in my image tracking change that likely handle multi-planar images similarly incorrectly. I'll take a look at this manually, but I don't think I have a test application that actually uses multi-planar images--VK_Video_Textures fails on my machine because VK_KHR_sampler_ycbcr_conversion is missing (Vega, latest AMD drivers)

@baldurk
Copy link
Owner

baldurk commented Feb 5, 2020

I did find one other place that needs an update, though it's not from your commit (just your commit relies on this more I think):

diff --git a/renderdoc/driver/vulkan/vk_resources.h b/renderdoc/driver/vulkan/vk_resources.h
index d7d94cec8..ef5b51441 100644
--- a/renderdoc/driver/vulkan/vk_resources.h
+++ b/renderdoc/driver/vulkan/vk_resources.h
@@ -2294,7 +2294,7 @@ public:
     uint32_t packedViewType : 3;

-    // only need 4 bits for the aspects
-    uint32_t aspectMask : 4;
+    // need 7 bits for the aspects including planes
+    uint32_t aspectMask : 7;

     // 6 bits = refer to up to 62 mips = bloody huge textures.
     // note we also need to pack in VK_REMAINING_MIPS etc so we can't

If you have an NV GPU available to you it has good support for ycbcr conversion and planar/subsampled formats. That said, VK_Video_Textures could also be refactored to not rely on VK_KHR_sampler_ycbcr_conversion - technically that's only required for the actual 'auto conversion' path, it looks like latest AMD drivers do still support a couple of planar formats at least for manual sampling which is better than nothing:

Common Format Group[19]:
Formats:
        FORMAT_G8B8G8R8_422_UNORM
        FORMAT_B8G8R8G8_422_UNORM
        FORMAT_G8_B8_R8_3PLANE_420_UNORM
        FORMAT_G8_B8R8_2PLANE_420_UNORM
        FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16
        FORMAT_G16_B16R16_2PLANE_420_UNORM
Properties:
        linearTiling:
                FORMAT_FEATURE_SAMPLED_IMAGE_BIT
                FORMAT_FEATURE_TRANSFER_SRC_BIT
                FORMAT_FEATURE_TRANSFER_DST_BIT
        optimalTiling:
                FORMAT_FEATURE_SAMPLED_IMAGE_BIT
                FORMAT_FEATURE_TRANSFER_SRC_BIT
                FORMAT_FEATURE_TRANSFER_DST_BIT
        bufferFeatures:
                None

How would you like to approach this, do you think you'll be able to get a system that can run VK_Video_Textures to test further?

@bjoeris
Copy link
Contributor

bjoeris commented Feb 5, 2020

I was worried that the image state tracking might be implicitly assuming the image aspect was color or depth/stencil, but this actually seems to be generally handled correctly. FormatImageAspects() returns separate plane aspects for planar formats, so ImageInfo::Aspects() will also return the separate plane aspects. Sanitise should be called on every image subresource range used in the image state code, so that patch in Sanitize should suffice.

Also, I think Sanitise might have another issue: aspectMask &= ~info.Aspects(); looks wrong--I'm not sure this code path can actually be hit, but if it is, I think aspectMask = info.Aspects(); is probably correct.

I did spot one other spot that might have issues with multi-planar images: in VulkanReplay::GetTextureData, it looks like this will attempt to copy the data from a multi-planar image using a single VkBufferImageCopy with COLOR aspect (rather than separate copies for each plane)

I don't have easy access to an nvidia card for testing this week. I might be able to next week, if there are still issues here.

@baldurk
Copy link
Owner

baldurk commented Feb 5, 2020

Yeh I wasn't sure what was intended for aspectMask &= ~info.Aspects();. It looked like maybe it was to handle the case of VK_IMAGE_ASPECT_FLAG_BITS_MAX_ENUM coming through given the check on the error message, but I wasn't sure where that would come from. Just to be safe I made that an else in this color/plane aliasing case.

Good point about GetTextureData, I'll need to look into that and do per-plane copies there too. That is a pre-existing bug though so not too urgent to fix.

I'll commit the patches above for now. If you don't think on the surface that there should be any other issues then we can leave it until other bugs come up. However I think this raises the priority of getting VK_Video_Textures expanded and auto-tested.

@baldurk baldurk closed this as completed in 4d205eb Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug A crash, misbehaviour, or other problem Unresolved Waiting for a fix or implementation
Projects
None yet
Development

No branches or pull requests

3 participants