Result buffer getting destroyed while required to be alive by the command buffer in long running compute shader #5000

BLaZeKiLL · 2024-01-05T18:07:35Z

Description
Hi, I implemented a raytracer using wgpu and compute shaders. It works fine, but when I try to render a larger scene, or anything that takes more than 8 seconds, I get an empty buffer as output, without any errors logged and without a panic.

If I run with METAL_DEVICE_WRAPPER_TYPE=1, I get the following error:

-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:3190: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x142835a00 (label: (wgpu internal) Signal):
<MTLToolsObject: 0x142712e80> -> <AGXG13XFamilyBuffer: 0x142712c90>
    label = Result buffer 
    length = 8294400 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModeShared 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModeShared MTLResourceHazardTrackingModeTracked  
    purgeableState = MTLPurgeableStateNonVolatile'
zsh: abort      cargo run --bin vexray

I am assuming this has something to do with how I am waiting for the shader to finish execution:

pub async fn finish(
    &self,
    gpu: &Gpu,
    config: &KernelConfig,
    buffers: &KernelBuffers,
    submission_index: wgpu::SubmissionIndex
) -> Result<Vec<u8>, ()> {
    let mut output = vec![0u8; (config.result_size()) as usize];

    let result_slice = buffers.result.slice(..);

    let (sender, receiver) = flume::bounded(1);

    result_slice.map_async(wgpu::MapMode::Read, move |v| sender.send(v).unwrap());

    // Wait for result
    gpu.device.poll(wgpu::Maintain::WaitForSubmissionIndex(submission_index));

    if let Ok(Ok(_)) = receiver.recv_async().await {
        let result_view = result_slice.get_mapped_range();

        output.copy_from_slice(&result_view[..]);
    } else {
        return Err(());
    }

    // Cleanup
    // result view would be dropped by here
    buffers.result.unmap();

    return Ok(output);
}

I read in the examples that we should call device.poll on a separate thread. Being new to Rust, I am not sure how to go about doing that, so an example would be helpful.
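As a rough illustration of the "poll on a separate thread" pattern asked about above, here is a minimal sketch that uses only the standard library. `spawn_poller` is a hypothetical helper, not part of wgpu; the `poll` closure stands in for a call such as `device.poll(wgpu::Maintain::Poll)` on a device shared through an `Arc`, which is an assumption and is not shown. This pattern keeps map_async callbacks firing while other code runs, but it does not by itself work around the timeout discussed later in this thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawns a background thread that calls `poll` repeatedly until `stop` is set.
/// Returns a handle whose result is the number of polls performed.
///
/// Hypothetical helper: with wgpu, `poll` would wrap something like
/// `device.poll(wgpu::Maintain::Poll)` on an `Arc`-shared device (assumption).
fn spawn_poller<F>(mut poll: F, stop: Arc<AtomicBool>) -> JoinHandle<u64>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        let mut polls = 0u64;
        while !stop.load(Ordering::Acquire) {
            poll();
            polls += 1;
            // Short sleep so the polling loop does not spin a full core.
            thread::sleep(Duration::from_millis(1));
        }
        polls
    })
}
```

In use, the main thread would create the stop flag, start the poller, wait for the map_async callback (for example through a channel), then set the flag and join the thread.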

My main issue is that no errors or panics are reported unless I set the environment variable METAL_DEVICE_WRAPPER_TYPE=1. I tried adding a simple divide-by-zero error in my compute shader, and still no errors were reported.

Repro steps
Source code - https://github.com/BLaZeKiLL/wgpu-app

Run the binary with cargo run --bin vexray
It should output render.png

Expected vs observed behavior
The result buffer shouldn't be dropped before the copy is complete.

Platform
OS: MacOS 14.2
Backend: Metal
Wgpu: 0.18
Rust: 1.74.1


BLaZeKiLL · 2024-01-05T18:11:18Z

If I debug and put a breakpoint on the if let Ok(Ok(_)) line in the above code, and wait about 10 seconds at the breakpoint before continuing, it works fine: I get the correct result as output, with no error from Metal validation.

cwfitzgerald · 2024-01-05T19:31:42Z

Hey, sorry, I meant to reply to your discussion but got sidetracked. I think this is #3601: 5 seconds is our internal timeout, and we don't properly handle the case where the timeout is hit.

BLaZeKiLL · 2024-01-05T19:58:50Z

So we can't have compute shaders running for more than 5 seconds? I guess I need to split up the shader invocation somehow.
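One way to split up the invocation is to render the image in bands of rows, with one dispatch and submission per band, so that no single submission runs close to the timeout. The sketch below only computes the bands; `split_rows` and its parameters are hypothetical names, and how the row offset reaches the shader (for example through a uniform) is left as an assumption.

```rust
/// Splits `total_rows` image rows into consecutive `(first_row, row_count)`
/// bands of at most `rows_per_chunk` rows each.
///
/// Hypothetical helper: each band would become its own dispatch + submit,
/// with `first_row` passed to the shader as an offset (assumption).
fn split_rows(total_rows: u32, rows_per_chunk: u32) -> Vec<(u32, u32)> {
    assert!(rows_per_chunk > 0, "rows_per_chunk must be non-zero");
    let mut chunks = Vec::new();
    let mut first = 0;
    while first < total_rows {
        // The last band may be shorter than the others.
        let count = rows_per_chunk.min(total_rows - first);
        chunks.push((first, count));
        first += count;
    }
    chunks
}
```

For a hypothetical 1080-row image and bands of 256 rows, this yields four full bands followed by a final band of 56 rows. The band size would need tuning so that each submission finishes well within the timeout on the target hardware.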

BLaZeKiLL · 2024-01-05T20:16:58Z

Also, in case of a timeout, the callback in map_async shouldn't trigger, right? The callback is only supposed to trigger when the buffer is ready for mapping.

cwfitzgerald · 2024-01-05T21:07:01Z

> So we can't have compute shaders running for more than 5 seconds? I guess I need to split up the shader invocation somehow.

Correct, though this is a bug in our code, not yours.

> Also, in case of a timeout, the callback in map_async shouldn't trigger, right? The callback is only supposed to trigger when the buffer is ready for mapping.

Yeah, this is a symptom of us treating timeout as "it's finished" not "it's not done yet"
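The difference between the two interpretations of a timeout can be sketched without any GPU code. In the sketch below, `WaitOutcome` and `wait_until_finished` are hypothetical names and not wgpu types or functions; `wait_once` stands in for one bounded wait on the GPU. The point is only that a timed-out wait leads to another wait or an explicit error, never to "finished".

```rust
/// Hypothetical outcome of a single bounded wait; not a wgpu type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitOutcome {
    Finished,
    TimedOut,
}

/// Calls `wait_once` up to `max_attempts` times.
/// Returns the attempt number on which the work finished, or an error if
/// every attempt timed out. A timeout is never reported as completion.
fn wait_until_finished<F>(mut wait_once: F, max_attempts: u32) -> Result<u32, &'static str>
where
    F: FnMut() -> WaitOutcome,
{
    for attempt in 1..=max_attempts {
        if wait_once() == WaitOutcome::Finished {
            return Ok(attempt);
        }
        // TimedOut means "not done yet": fall through and wait again.
    }
    Err("work still in flight after all attempts")
}
```

Only after an `Ok` result would it be safe to run completion callbacks or release resources such as the result buffer.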

BLaZeKiLL · 2024-01-05T21:33:58Z

Got it, thanks.

cwfitzgerald closed this as not planned Jan 5, 2024

cwfitzgerald mentioned this issue Jan 5, 2024

Timeout on Device::maintain with Maintain::WaitForSubmissionIndex is ignored #3601

Open
