Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply some low-hanging fruit optimizations to Vulkan RD #85532

Merged
merged 2 commits into from
Dec 8, 2023

Conversation

RandomShaper
Copy link
Member

@RandomShaper RandomShaper commented Nov 30, 2023

Multiple commits:

  1. Remove redundant explicit clears in the Vulkan RD

VK_ATTACHMENT_LOAD_OP_CLEAR, used in the Vulkan renderer, already specifies that the render target (full or the specified region) will be cleared by the driver. There's no need for adding extra commands for clearing.

UPDATE: It turned out only a subset of those cases could be avoided; namely, only if both color and depth are to be rendered to a region that will be cleared. So, the improvement is not as profound as initially expected, but it can still save a few manual clears in cases like shadow mapping.

  1. Remove superfluous locking in RID owners in Vulkan RD

The RenderingDevice API is fully thread-safe, in the lock-per-function fashion, so there's no need for the RID owners to be thread-safe. This change avoids a lot of spinlocks.


This PR would ideally be merged soon in 4.3, since it will not be very long-lived there, as the new RenderingDeviceDriver model will UPDATE: already contain the fixes change it all (spoiler: it will restore the explicit clearing of buffers, but will only do as an assistance to driver implementations, if the driver is flagged as needing the manual clear).

The most important thing of this PR is its cherry-picking into 4.2 (and older?), as it can't but improve performance.

@RandomShaper RandomShaper added enhancement topic:rendering cherrypick:4.0 cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release cherrypick:4.2 Considered for cherry-picking into a future 4.2.x release labels Nov 30, 2023
@RandomShaper RandomShaper added this to the 4.3 milestone Nov 30, 2023
@RandomShaper RandomShaper requested a review from a team as a code owner November 30, 2023 08:52
@akien-mga
Copy link
Member

4.0 is practically EOL, I would not cherry-pick enhancements there, only absolutely critical bugfixes.

@RandomShaper RandomShaper changed the title Remove redundant explicit clears in the Vulkan RD Apply some low-hanging fruit optimizations to Vulkan RD Nov 30, 2023
@sakrel
Copy link
Contributor

sakrel commented Nov 30, 2023

Looks like this PR breaks region clears used by the shadow atlas code in the official 2D Lights and Shadows demo:

RD::DrawListID draw_list = RD::get_singleton()->draw_list_begin(state.shadow_fb, initial_action, i != 3 ? RD::FINAL_ACTION_CONTINUE : RD::FINAL_ACTION_READ, initial_action, RD::FINAL_ACTION_DISCARD, cc, 1.0, 0, rect);

image
This patch appears to fix the issue, but the code mentions to "always fully operate on the whole framebuffer", so I'm not sure if this patch breaks anything else or kills perf:

diff --git a/drivers/vulkan/rendering_device_vulkan.cpp b/drivers/vulkan/rendering_device_vulkan.cpp
--- a/drivers/vulkan/rendering_device_vulkan.cpp
+++ b/drivers/vulkan/rendering_device_vulkan.cpp
@@ -3606,7 +3606,22 @@
 						dependency_from_external.srcStageMask |= reading_stages;
 					}
 				} break;
-				case INITIAL_ACTION_CLEAR_REGION_CONTINUE:
+				case INITIAL_ACTION_CLEAR_REGION_CONTINUE: {
+					if (p_attachments[i].usage_flags & TEXTURE_USAGE_COLOR_ATTACHMENT_BIT) {
+						description.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
+						description.initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
+						description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
+					} else if (p_attachments[i].usage_flags & TEXTURE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) {
+						description.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
+						description.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
+						description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
+					} else {
+						description.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
+						description.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
+						description.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED; // Don't care what is there.
+						dependency_from_external.srcStageMask |= reading_stages;
+					}
+				} break;
 				case INITIAL_ACTION_CONTINUE: {
 					if (p_attachments[i].usage_flags & TEXTURE_USAGE_COLOR_ATTACHMENT_BIT) {
 						description.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
@@ -6847,18 +6862,12 @@
 	render_pass_begin.pNext = nullptr;
 	render_pass_begin.renderPass = render_pass;
 	render_pass_begin.framebuffer = vkframebuffer;
-	/*
-	 * Given how API works, it makes sense to always fully operate on the whole framebuffer.
-	 * This allows better continue operations for operations like shadowmapping.
+
 	render_pass_begin.renderArea.extent.width = viewport_size.width;
 	render_pass_begin.renderArea.extent.height = viewport_size.height;
 	render_pass_begin.renderArea.offset.x = viewport_offset.x;
 	render_pass_begin.renderArea.offset.y = viewport_offset.y;
-	*/
-	render_pass_begin.renderArea.extent.width = framebuffer->size.width;
-	render_pass_begin.renderArea.extent.height = framebuffer->size.height;
-	render_pass_begin.renderArea.offset.x = 0;
-	render_pass_begin.renderArea.offset.y = 0;
+
 
 	Vector<VkClearValue> clear_values;
 	clear_values.resize(framebuffer->texture_ids.size());
@@ -6977,17 +6986,17 @@
 		viewport_size = regioni.size;
 		if (p_initial_color_action == INITIAL_ACTION_CLEAR_REGION_CONTINUE) {
 			will_clear_color = true;
-			p_initial_color_action = INITIAL_ACTION_CONTINUE;
+		//	p_initial_color_action = INITIAL_ACTION_CONTINUE;
 		}
 		if (p_initial_depth_action == INITIAL_ACTION_CLEAR_REGION_CONTINUE) {
-			p_initial_depth_action = INITIAL_ACTION_CONTINUE;
+		//	p_initial_depth_action = INITIAL_ACTION_CONTINUE;
 		}
 		if (p_initial_color_action == INITIAL_ACTION_CLEAR_REGION) {
 			will_clear_color = true;
-			p_initial_color_action = INITIAL_ACTION_KEEP;
+		//	p_initial_color_action = INITIAL_ACTION_KEEP;
 		}
 		if (p_initial_depth_action == INITIAL_ACTION_CLEAR_REGION) {
-			p_initial_depth_action = INITIAL_ACTION_KEEP;
+		//	p_initial_depth_action = INITIAL_ACTION_KEEP;
 		}
 	}
 

@RandomShaper
Copy link
Member Author

@sakrel, thank you very much for bug reporting. It turns out there were fewer cases that could be optimized reliably. I've updated the code and the PR description.

Copy link
Contributor

@BastiaanOlij BastiaanOlij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I wanted the constrain to region in there for awhile, especially the way we update shadow maps on mobile it should make a difference once we finish off the cascaded split logic.

@BastiaanOlij
Copy link
Contributor

This patch appears to fix the issue, but the code mentions to "always fully operate on the whole framebuffer", so I'm not sure if this patch breaks anything else or kills perf:

I don't know about 2D, but in 3D there are definitely assumptions in the way shadows are updated that the region is cleared as a whole, but seeing the new functionality here is governed by p_constrained_to_region most places should still use the old approach.

From what I understood from Reduz, this behaves better on desktop GPUs, from what I've understood about mobile, constraining to regions works better especially if we ensure that we respect tile boundaries in our atlasses.

So I wager we need to do follow up PRs to make the most out of these changes. I have a bunch of old shadow PRs that really need to be revisited and polished up.

@YuriSizov YuriSizov merged commit b0339b6 into godotengine:master Dec 8, 2023
15 checks passed
@YuriSizov
Copy link
Contributor

Thanks!

@RandomShaper RandomShaper deleted the kill_redund_clear branch December 11, 2023 13:38
@YuriSizov YuriSizov removed the cherrypick:4.2 Considered for cherry-picking into a future 4.2.x release label Jan 24, 2024
@YuriSizov
Copy link
Contributor

Cherry-picked for 4.2.2.

@YuriSizov YuriSizov removed the cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release label Jan 24, 2024
@YuriSizov
Copy link
Contributor

Cherry-picked for 4.1.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants