Result buffer getting destroyed while required to be alive by the command buffer in long running compute shader #5000

Closed
BLaZeKiLL opened this issue Jan 5, 2024 · 6 comments

Comments

@BLaZeKiLL

Description
Hi, I implemented a raytracer using wgpu and compute shaders. It works fine, but when I try to render a larger scene, or anything that takes more than 8 seconds, I get an empty buffer as output, with no errors logged and no panics.

If I run with METAL_DEVICE_WRAPPER_TYPE=1, I get the following error:

-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:3190: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x142835a00 (label: (wgpu internal) Signal):
<MTLToolsObject: 0x142712e80> -> <AGXG13XFamilyBuffer: 0x142712c90>
    label = Result buffer 
    length = 8294400 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModeShared 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModeShared MTLResourceHazardTrackingModeTracked  
    purgeableState = MTLPurgeableStateNonVolatile'
zsh: abort      cargo run --bin vexray

I am assuming this has something to do with how I am waiting for the shader to finish execution:

pub async fn finish(
    &self,
    gpu: &Gpu,
    config: &KernelConfig,
    buffers: &KernelBuffers,
    submission_index: wgpu::SubmissionIndex
) -> Result<Vec<u8>, ()> {
    let mut output = vec![0u8; (config.result_size()) as usize];

    let result_slice = buffers.result.slice(..);

    let (sender, receiver) = flume::bounded(1);

    result_slice.map_async(wgpu::MapMode::Read, move |v| sender.send(v).unwrap());

    // Wait for result
    gpu.device.poll(wgpu::Maintain::WaitForSubmissionIndex(submission_index));

    if let Ok(Ok(_)) = receiver.recv_async().await {
        let result_view = result_slice.get_mapped_range();

        output.copy_from_slice(&result_view[..]);
    } else {
        return Err(());
    }

    // Cleanup
    // result view would be dropped by here
    buffers.result.unmap();

    return Ok(output);
}

I read in the examples that we should call device.poll on a separate thread. Being new to Rust, I am not sure how to go about doing that; an example would be helpful.
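For readers with the same question, here is a minimal sketch of the "poll on a separate thread" pattern using only the standard library. `BackgroundPoller` and `poll_once` are hypothetical names that do not come from wgpu or from this issue. In a wgpu program the closure would presumably wrap a non-blocking call such as device.poll(wgpu::Maintain::Poll) on a shared device handle, but that wiring is an assumption and is not shown here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical helper: calls `poll_once` repeatedly on a background thread
/// until `shutdown` is requested.
struct BackgroundPoller {
    stop: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

impl BackgroundPoller {
    /// Spawns the polling thread. `poll_once` stands in for a non-blocking
    /// device poll (an assumption; any `FnMut()` closure works here).
    fn spawn<F>(mut poll_once: F, interval: Duration) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let stop = Arc::new(AtomicBool::new(false));
        let stop_flag = Arc::clone(&stop);
        let handle = thread::spawn(move || loop {
            // Poll first, then check the flag, so at least one poll always runs.
            poll_once();
            if stop_flag.load(Ordering::Acquire) {
                break;
            }
            thread::sleep(interval);
        });
        Self { stop, handle }
    }

    /// Asks the thread to stop and waits for it to exit.
    fn shutdown(self) {
        self.stop.store(true, Ordering::Release);
        self.handle.join().expect("poller thread panicked");
    }
}
```

With a poller like this running, the async code only needs to await the map_async callback; it no longer has to drive the device itself.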

My main issue is that no errors or panics are reported unless I set the environment variable METAL_DEVICE_WRAPPER_TYPE=1. I tried adding a simple divide-by-zero error to my compute shader, and still no errors were reported.

Repro steps
Source code - https://github.com/BLaZeKiLL/wgpu-app

  • run the binary with cargo run --bin vexray
  • should output render.png

Expected vs observed behavior
The result buffer shouldn't be dropped before the copy is complete.

Platform
OS: macOS 14.2
Backend: Metal
wgpu: 0.18
Rust: 1.74.1

@BLaZeKiLL
Author

If I debug, put a breakpoint on the if let Ok(Ok(_)) line in the above code, and wait about 10 seconds at the breakpoint before continuing, it works fine: I get the correct output and no error from Metal validation.

@cwfitzgerald
Member

Hey, sorry, I meant to reply to your discussion but got sidetracked. I think this is #3601: 5 seconds is our internal timeout, and we don't properly handle the case where the timeout is hit.

@BLaZeKiLL
Author

So we can't have compute shaders running for more than 5 seconds? I guess I need to split up the shader invocation somehow.
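One way to picture "splitting up the invocation" is to render the image in horizontal bands, one submission per band, so that each submission is shorter than the whole frame. The sketch below only computes the bands. `RowBand`, `split_rows`, and the band size are hypothetical, and how a band maps onto a dispatch and a buffer offset in the actual raytracer is not shown.

```rust
/// Hypothetical description of one slice of the image, rendered by one submission.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RowBand {
    first_row: u32,
    row_count: u32,
}

/// Splits `height` rows into bands of at most `rows_per_submission` rows each.
/// The last band is shorter when `height` is not an exact multiple.
fn split_rows(height: u32, rows_per_submission: u32) -> Vec<RowBand> {
    assert!(rows_per_submission > 0, "band size must be non-zero");
    (0..height)
        .step_by(rows_per_submission as usize)
        .map(|first_row| RowBand {
            first_row,
            row_count: rows_per_submission.min(height - first_row),
        })
        .collect()
}
```

For example, an image 1080 rows tall with 256 rows per submission yields five bands, the last one covering the remaining 56 rows. The band size would need tuning so that each submission stays well under the timeout on the target hardware.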

@BLaZeKiLL
Author

Also, in case of a timeout, the callback in map_async shouldn't trigger, right? The callback is only supposed to trigger when the buffer is ready for mapping.

@cwfitzgerald
Member

So we can't have compute shaders running for more than 5 seconds? I guess I need to split up the shader invocation somehow.

Correct, though this is a bug in our code, not yours.

Also, in case of a timeout, the callback in map_async shouldn't trigger, right? The callback is only supposed to trigger when the buffer is ready for mapping.

Yeah, this is a symptom of us treating a timeout as "it's finished" rather than "it's not done yet".
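To illustrate the distinction in general terms, here is a small standard-library sketch of a wait that keeps "timed out" separate from "finished". It is not wgpu's internal code: `WaitOutcome`, `wait_for_completion`, and `wait_until_finished` are hypothetical names, and a channel stands in for the GPU completion signal.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical result of waiting on a completion signal.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    /// The work signalled completion.
    Finished,
    /// The timeout elapsed; the work may still be running.
    TimedOut,
    /// The sender went away without ever signalling completion.
    Lost,
}

/// Waits once, reporting a timeout as its own outcome instead of as success.
fn wait_for_completion(done: &Receiver<()>, timeout: Duration) -> WaitOutcome {
    match done.recv_timeout(timeout) {
        Ok(()) => WaitOutcome::Finished,
        Err(RecvTimeoutError::Timeout) => WaitOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::Lost,
    }
}

/// Retries the wait up to `max_slices` times, so a single timeout never
/// lets the caller proceed as if the work were done.
fn wait_until_finished(done: &Receiver<()>, slice: Duration, max_slices: u32) -> WaitOutcome {
    for _ in 0..max_slices {
        match wait_for_completion(done, slice) {
            WaitOutcome::TimedOut => continue,
            other => return other,
        }
    }
    WaitOutcome::TimedOut
}
```

A caller that only frees or maps resources on `Finished` would not hit the situation described in this issue, where the result buffer is released while the command buffer still needs it.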

@BLaZeKiLL
Author

Got it, thanks.
