Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hal/vk] Rework Submission and Surface Synchronization #5681

Merged

Conversation

cwfitzgerald
Copy link
Member

@cwfitzgerald cwfitzgerald commented May 8, 2024

Connections

Description

As described in #5559, our submission and surface synchronization was totally messed up. I've revamped it, made smarter classes to help keep all the bookkeeping straight, and generally make the code cleaner and easier to understand.

Testing

works on my machine lel

@JMS55
Copy link
Collaborator

JMS55 commented May 9, 2024

Works fine for me running the cube example with VK validation layers enabled

[2024-05-09T06:03:33Z INFO  wgpu_core::instance] Adapter Vulkan AdapterInfo { name: "Intel(R) Arc(TM) Graphics", vendor: 32902, device: 32085, device_type: IntegratedGpu, driver: "Intel Corporation", driver_info: "101.5382", backend: Vulkan }

@Vecvec
Copy link
Contributor

Vecvec commented May 9, 2024

This fixes an annoying access violation termination (that I thought was a driver bug) on program exit after a present.

@hecrj
Copy link
Contributor

hecrj commented May 9, 2024

Unfortunately, it does not fix #5644 for me. Same constant validation error on presentation:

[2024-05-09T08:16:24Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkAcquireNextImageKHR-semaphore-01779 (0x5717e75b)]
        Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xba7514000000002a, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR():  Semaphore must not have any pending operations. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE it must not have any uncompleted signal or wait operations pending (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)
[2024-05-09T08:16:24Z ERROR wgpu_hal::vulkan::instance]         objects: (type: SEMAPHORE, hndl: 0xba7514000000002a, name: ?)

The actual validation error only happens with a debug build, although release feels much slower than 0.19 as well. I believe something is very slow and the errors actually only show up in debug mode.

Flamegraph of cube seems to indicate this—with present taking a huge amount of time:

flamegraph

For comparison, same drivers with 0.19:

flamegraph

Even if I enable Immediate present mode, frames seem to take twice as much time to be rendered with 0.20.

@cwfitzgerald
Copy link
Member Author

@hecrj could you try again with these latest commits, someone else who can reproduce the validation error says that this fixed it.

@cwfitzgerald cwfitzgerald force-pushed the vk/fix-presentation-synchronization branch from ea6e6ba to 329513b Compare May 11, 2024 04:25
@hecrj
Copy link
Contributor

hecrj commented May 11, 2024

@cwfitzgerald Yep! The latest changes fix the validation errors. Thanks!

I'm still noticing a bit of worse performance in debug mode, but it doesn't seem to be there in release mode anymore. Flamegraphs look the same as before, but I imagine it's expected since now synchronization works properly and there may be more waiting. No errors in any case!

wgpu-hal/src/lib.rs Outdated Show resolved Hide resolved
@jimblandy
Copy link
Member

I'm actually rewriting these comments as I review so if you're not attached to them then don't worry too much about fixing anything.

@cwfitzgerald
Copy link
Member Author

Yeah that's fine - I was just trying to get something there.

wgpu-hal/src/lib.rs Outdated Show resolved Hide resolved
@jimblandy
Copy link
Member

jimblandy commented May 17, 2024

@cwfitzgerald What do you think of this? da543fb51

My hope was that this would make the states of RelaySemaphores a bit clearer to a new reader. If people should not use wait unless should_wait is true, then let's actually force people to check that condition before they can even get at the value: classic Option. But it does make advance fallible and require us to pass the device.

[edit: made this a little nicer still: RelaySemaphoreState is no longer necessary]

@jimblandy
Copy link
Member

@cwfitzgerald Here's an even more radical suggestion: b5d52fe

Just use a single semaphore for all queue ordering.

@cwfitzgerald
Copy link
Member Author

@jimblandy i think we can go for it, the only question is if we want to still avoid this deadlock in anv mentioned in the old code, it if we consider that old enough to ignore.

/// It would be correct to use a single semaphore there, but 
/// [Intel hangs in `anv_queue_finish`](https://gitlab.freedesktop.org/mesa/mesa/-/issues/5508).

I should have perseved that comment somehow.

@jimblandy
Copy link
Member

@cwfitzgerald:

I should have perseved that comment somehow.

Ahh, yes, I was reviewing by checking out the branch and just wandering around with rust-analyzer, so I didn't notice that the patch removed the comment. If that's the sole rationale for the entire semaphore-swapping doohickey then we had definitely better have the comment.

Kvark filed that bug in Oct 2021. That's not that long ago, so I think we'd better keep it. It sounds like a bear to debug, and I'd rather not have to debug it again.

@jimblandy
Copy link
Member

So I guess the remaining question is what you think of this one.

@jimblandy
Copy link
Member

cwfitzgerald and others added 10 commits May 23, 2024 15:53
Introduce a utility function for making binary semaphores, and use it
where appropriate.
This allows us to delete `should_wait`, and makes
`RelaySemaphoreState` the same as `RelaySemaphores`, so we can just
delete the former, and have `RelaySemaphores::advance` return a clone
of `self`'s current state.
@jimblandy jimblandy force-pushed the vk/fix-presentation-synchronization branch from a5a000c to 4cf108c Compare May 23, 2024 22:53
emilk added a commit to emilk/egui that referenced this pull request May 28, 2024
0.20 has a bunch of bugs that will be fixed by:

* gfx-rs/wgpu#5681

At Rerun, we don't want to wait for the wgpu 0.20.1 patch release
before we update egui, so we will temporarily downgrade to wgpu 0.19

After reverting I'll open a new PR that will update to 0.20 again,
with the intention of merging that once 0.20.1 is released.
emilk added a commit to emilk/egui that referenced this pull request May 28, 2024
0.20 has a bunch of bugs that will be fixed by:

* gfx-rs/wgpu#5681

At Rerun, we don't want to wait for the wgpu 0.20.1 patch release before
we update egui, so we will temporarily downgrade to wgpu 0.19

After reverting I'll open a new PR that will update to 0.20 again, with
the intention of merging that once 0.20.1 is released.
@jimblandy jimblandy merged commit c745863 into gfx-rs:trunk May 30, 2024
27 checks passed
cwfitzgerald added a commit that referenced this pull request Jun 12, 2024
Fix two major synchronization issues in `wgpu_val::vulkan`:

- Properly order queue command buffer submissions. Due to Mesa bugs, two semaphores are required even though the Vulkan spec says that only one should be necessary.

- Properly manage surface texture acquisition and presentation:

    - Acquiring a surface texture can return while the presentation engine is still displaying the texture. Applications must wait for a semaphore to be signaled before using the acquired texture.

    - Presenting a surface texture requires a semaphore to ensure that drawing is complete before presentation occurs.

Co-authored-by: Jim Blandy <[email protected]>
Wumpf added a commit to emilk/egui that referenced this pull request Jun 13, 2024
* this PR reverts #4559
* and re-applies #4433

Before we merge, we're waiting for a wgpu 0.20.1 patch-release of

* gfx-rs/wgpu#5681

---------

Co-authored-by: Andreas Reich <[email protected]>
morr0ne pushed a commit to verdiwm/wgpu that referenced this pull request Jul 28, 2024
Fix two major synchronization issues in `wgpu_val::vulkan`:

- Properly order queue command buffer submissions. Due to Mesa bugs, two semaphores are required even though the Vulkan spec says that only one should be necessary.

- Properly manage surface texture acquisition and presentation:

    - Acquiring a surface texture can return while the presentation engine is still displaying the texture. Applications must wait for a semaphore to be signaled before using the acquired texture.

    - Presenting a surface texture requires a semaphore to ensure that drawing is complete before presentation occurs.

Co-authored-by: Jim Blandy <[email protected]>
@cwfitzgerald cwfitzgerald removed the PR: needs back-porting PR with a fix that needs to land on crates label Jul 31, 2024
hacknus pushed a commit to hacknus/egui that referenced this pull request Oct 30, 2024
0.20 has a bunch of bugs that will be fixed by:

* gfx-rs/wgpu#5681

At Rerun, we don't want to wait for the wgpu 0.20.1 patch release before
we update egui, so we will temporarily downgrade to wgpu 0.19

After reverting I'll open a new PR that will update to 0.20 again, with
the intention of merging that once 0.20.1 is released.
hacknus pushed a commit to hacknus/egui that referenced this pull request Oct 30, 2024
* this PR reverts emilk#4559
* and re-applies emilk#4433

Before we merge, we're waiting for a wgpu 0.20.1 patch-release of

* gfx-rs/wgpu#5681

---------

Co-authored-by: Andreas Reich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants