-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply some low-hanging fruit optimizations to Vulkan RD #85532
Conversation
4.0 is practically EOL, I would not cherry-pick enhancements there, only absolutely critical bugfixes. |
Looks like this PR breaks region clears used by the shadow atlas code in the official 2D Lights and Shadows demo:
This patch appears to fix the issue, but the code mentions to "always fully operate on the whole framebuffer", so I'm not sure if this patch breaks anything else or kills perf: diff --git a/drivers/vulkan/rendering_device_vulkan.cpp b/drivers/vulkan/rendering_device_vulkan.cpp
--- a/drivers/vulkan/rendering_device_vulkan.cpp
+++ b/drivers/vulkan/rendering_device_vulkan.cpp
@@ -3606,7 +3606,22 @@
dependency_from_external.srcStageMask |= reading_stages;
}
} break;
- case INITIAL_ACTION_CLEAR_REGION_CONTINUE:
+ case INITIAL_ACTION_CLEAR_REGION_CONTINUE: {
+ if (p_attachments[i].usage_flags & TEXTURE_USAGE_COLOR_ATTACHMENT_BIT) {
+ description.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
+ description.initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
+ description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
+ } else if (p_attachments[i].usage_flags & TEXTURE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+ description.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
+ description.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
+ description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
+ } else {
+ description.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
+ description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
+ description.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED; // Don't care what is there.
+ dependency_from_external.srcStageMask |= reading_stages;
+ }
+ } break;
case INITIAL_ACTION_CONTINUE: {
if (p_attachments[i].usage_flags & TEXTURE_USAGE_COLOR_ATTACHMENT_BIT) {
description.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
@@ -6847,18 +6862,12 @@
render_pass_begin.pNext = nullptr;
render_pass_begin.renderPass = render_pass;
render_pass_begin.framebuffer = vkframebuffer;
- /*
- * Given how API works, it makes sense to always fully operate on the whole framebuffer.
- * This allows better continue operations for operations like shadowmapping.
+
render_pass_begin.renderArea.extent.width = viewport_size.width;
render_pass_begin.renderArea.extent.height = viewport_size.height;
render_pass_begin.renderArea.offset.x = viewport_offset.x;
render_pass_begin.renderArea.offset.y = viewport_offset.y;
- */
- render_pass_begin.renderArea.extent.width = framebuffer->size.width;
- render_pass_begin.renderArea.extent.height = framebuffer->size.height;
- render_pass_begin.renderArea.offset.x = 0;
- render_pass_begin.renderArea.offset.y = 0;
+
Vector<VkClearValue> clear_values;
clear_values.resize(framebuffer->texture_ids.size());
@@ -6977,17 +6986,17 @@
viewport_size = regioni.size;
if (p_initial_color_action == INITIAL_ACTION_CLEAR_REGION_CONTINUE) {
will_clear_color = true;
- p_initial_color_action = INITIAL_ACTION_CONTINUE;
+ // p_initial_color_action = INITIAL_ACTION_CONTINUE;
}
if (p_initial_depth_action == INITIAL_ACTION_CLEAR_REGION_CONTINUE) {
- p_initial_depth_action = INITIAL_ACTION_CONTINUE;
+ // p_initial_depth_action = INITIAL_ACTION_CONTINUE;
}
if (p_initial_color_action == INITIAL_ACTION_CLEAR_REGION) {
will_clear_color = true;
- p_initial_color_action = INITIAL_ACTION_KEEP;
+ // p_initial_color_action = INITIAL_ACTION_KEEP;
}
if (p_initial_depth_action == INITIAL_ACTION_CLEAR_REGION) {
- p_initial_depth_action = INITIAL_ACTION_KEEP;
+ // p_initial_depth_action = INITIAL_ACTION_KEEP;
}
}
|
fd71420
to
72bf8df
Compare
@sakrel, thank you very much for bug reporting. It turns out there were fewer cases that could be optimized reliably. I've updated the code and the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I wanted the constrain to region in there for awhile, especially the way we update shadow maps on mobile it should make a difference once we finish off the cascaded split logic.
I don't know about 2D, but in 3D there are definitely assumptions in the way shadows are updated that the region is cleared as a whole, but seeing the new functionality here is governed by From what I understood from Reduz, this behaves better on desktop GPUs, from what I've understood about mobile, constraining to regions works better especially if we ensure that we respect tile boundaries in our atlasses. So I wager we need to do follow up PRs to make the most out of these changes. I have a bunch of old shadow PRs that really need to be revisited and polished up. |
Thanks! |
Cherry-picked for 4.2.2. |
Cherry-picked for 4.1.4. |
Multiple commits:
VK_ATTACHMENT_LOAD_OP_CLEAR
, used in the Vulkan renderer, already specifies that the render target (full or the specified region) will be cleared by the driver. There's no need for adding extra commands for clearing.UPDATE: It turned out only a subset of those cases could be avoided; namely, only if both color and depth are to be rendered to a region that will be cleared. So, the improvement is not as profound as initially expected, but it can still save a few manual clears in cases like shadow mapping.
The
RenderingDevice
API is fully thread-safe, in the lock-per-function fashion, so there's no need for the RID owners to be thread-safe. This change avoids a lot of spinlocks.This PR would ideally be merged soon in 4.3, since it will not be very long-lived there, as the new
RenderingDeviceDriver
model will UPDATE: already contain the fixeschange it all (spoiler: it will restore the explicit clearing of buffers, but will only do as an assistance to driver implementations, if the driver is flagged as needing the manual clear).The most important thing of this PR is its cherry-picking into 4.2 (and older?), as it can't but improve performance.