Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hal/gl: Allow push constants trough emulation #2400

Merged
merged 2 commits into from
Jan 22, 2022

Conversation

JCapucho
Copy link
Contributor

Connections
Depends on gfx-rs/naga#1672 so bumps naga revision to pick it up

Description
Push constants are a great way to pass small amounts of data to the gpu between draw calls.

But maintaining two code paths for backends that support push constants and those who don't can be prohibitive to picking them.

So a good middle ground is to emulate push constants in the secondary backends, this is done by using a uniform buffer for push constants.

Implementation notes:

The buffer is always kept mapped and when a set_push_constants call is issued the mapping is written to.

Push constants stages are ignored.

The uniform buffer uses the binding 0 in the shader.

TODOS:

  • The code currently uses the pointer returned from glMapBufferRange as a *mut u32 for the slice, but according to https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glMapBufferRange.xhtml alignment is not guaranteed so it might need to be changed to a *mut u8 and set_push_constants will need to convert the slice from u32 to u8
  • Another problem is synchronization, glMapBufferRange doesn't give much information but has an GL_MAP_UNSYNCHRONIZED_BIT so it leads me to believe that synchronization is implicit, but if someone who knows opengl better could confirm it would be helpful

Testing
I tested on a modified cube example

diff --git a/wgpu/examples/cube/main.rs b/wgpu/examples/cube/main.rs
index 60a49542..556ec979 100644
--- a/wgpu/examples/cube/main.rs
+++ b/wgpu/examples/cube/main.rs
@@ -124,10 +124,21 @@ impl Example {
 }
 
 impl framework::Example for Example {
+    fn required_features() -> wgt::Features {
+        wgt::Features::PUSH_CONSTANTS
+    }
+
     fn optional_features() -> wgt::Features {
         wgt::Features::POLYGON_MODE_LINE
     }
 
+    fn required_limits() -> wgpu::Limits {
+        wgpu::Limits {
+            max_push_constant_size: 64,
+            ..wgpu::Limits::downlevel_webgl2_defaults()
+        }
+    }
+
     fn init(
         config: &wgpu::SurfaceConfiguration,
         _adapter: &wgpu::Adapter,
@@ -179,7 +190,10 @@ impl framework::Example for Example {
         let pipeline_layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
             label: None,
             bind_group_layouts: &[&bind_group_layout],
-            push_constant_ranges: &[],
+            push_constant_ranges: &[wgpu::PushConstantRange {
+                stages: wgpu::ShaderStages::VERTEX,
+                range: 0..4,
+            }],
         });
 
         // Create the texture
@@ -380,6 +394,7 @@ impl framework::Example for Example {
             rpass.set_bind_group(0, &self.bind_group, &[]);
             rpass.set_index_buffer(self.index_buf.slice(..), wgpu::IndexFormat::Uint16);
             rpass.set_vertex_buffer(0, self.vertex_buf.slice(..));
+            rpass.set_push_constants(wgpu::ShaderStages::VERTEX, 0, &(0.2_f32).to_ne_bytes());
             rpass.pop_debug_group();
             rpass.insert_debug_marker("Draw!");
             rpass.draw_indexed(0..self.index_count as u32, 0, 0..1);
diff --git a/wgpu/examples/cube/shader.wgsl b/wgpu/examples/cube/shader.wgsl
index 4bd4305a..ba8423b4 100644
--- a/wgpu/examples/cube/shader.wgsl
+++ b/wgpu/examples/cube/shader.wgsl
@@ -23,11 +23,16 @@ fn vs_main(
 [[group(0), binding(1)]]
 var r_color: texture_2d<u32>;
 
+struct PushConstants {
+    mul: f32;
+};
+var<push_constant> pc: PushConstants;
+
 [[stage(fragment)]]
 fn fs_main(in: VertexOutput) -> [[location(0)]] vec4<f32> {
     let tex = textureLoad(r_color, vec2<i32>(in.tex_coord * 256.0), 0);
     let v = f32(tex.x) / 255.0;
-    return vec4<f32>(1.0 - (v * 5.0), 1.0 - (v * 15.0), 1.0 - (v * 50.0), 1.0);
+    return pc.mul * vec4<f32>(1.0 - (v * 5.0), 1.0 - (v * 15.0), 1.0 - (v * 50.0), 1.0);
 }
 
 [[stage(fragment)]]

As well as a modified branch of https://gitlab.com/veloren/veloren

Note: I don't have much experience with openGL so the code might not be the best.

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unexpected. What I thought we would be doing is - using freestanding uniforms for push constants. That's what gfx-hal did, IIRC. But instead this PR tries to do this via a uniform buffer.

The current approach isn't going to work from the point of view of synchronization between GPU and CPU. Within the same submission, multiple different draw calls would want to overwrite the mapped buffer contents, bind it, and use it on GPU. This means 100% data race between CPU and GPU.

@JCapucho
Copy link
Contributor Author

This is a bit unexpected. What I thought we would be doing is - using freestanding uniforms for push constants. That's what gfx-hal did, IIRC. But instead this PR tries to do this via a uniform buffer.

Should have probably checked if gfx-hal already had this implemented

The current approach isn't going to work from the point of view of synchronization between GPU and CPU. Within the same submission, multiple different draw calls would want to overwrite the mapped buffer contents, bind it, and use it on GPU. This means 100% data race between CPU and GPU.

Could you elaborate from what I read from the OpenGL wiki about synchronization, https://www.khronos.org/opengl/wiki/Synchronization#Implicit_synchronization

Most attempts to write to a buffer object, either with glBufferSubData or mapping, will halt until all rendering commands using that part of the buffer object have completed

So unless I'm reading wrong (which is definitely possible), writing to a mapped buffer is implicitly synchronized, even between draw calls

@kvark
Copy link
Member

kvark commented Jan 17, 2022

First of all, this "will halt" is a terrible thing, and we should not expose a feature that results in that happening potentially many times per frame.
Secondly, it talks about us doing glMapBuffer. This is not the same as just writing data into a buffer that's already mapped. With a persistently mapped buffer, the driver can't detect when you are writing to it.

@JCapucho
Copy link
Contributor Author

First of all, this "will halt" is a terrible thing, and we should not expose a feature that results in that happening potentially many times per frame. Secondly, it talks about us doing glMapBuffer. This is not the same as just writing data into a buffer that's already mapped. With a persistently mapped buffer, the driver can't detect when you are writing to it.

Okay, then I'll go with the gfx-hal approach, from my understanding the push constants data is copied to a global memory area and then after that it's made into a uniform with glUniform is that correct?

Since naga currently only supports one push constant buffer it would mean that it would only support 64 bytes for now

@kvark
Copy link
Member

kvark commented Jan 17, 2022

Okay, then I'll go with the gfx-hal approach, from my understanding the push constants data is copied to a global memory area and then after that it's made into a uniform with glUniform is that correct?

Yes. It's not far from what this PR is doing already.

@JCapucho
Copy link
Contributor Author

@kvark I've hit a problem, to use the glUniform functions a uniform location needs to be passed, but because naga is generating a uniform block it doesn't get a location but a binding, so either naga starts outputting non block uniforms for push constants or we can't use this method.

@JCapucho
Copy link
Contributor Author

I've tried fiddling around with the shaders and glsl only allows to use directly the struct without the wrapping block if it also drops the layout qualifier, which isn't that great since then we get implementation defined layouts.

@kvark
Copy link
Member

kvark commented Jan 17, 2022

Right, naga should not generate them in a uniform block.

JCapucho added a commit to JCapucho/naga that referenced this pull request Jan 18, 2022
Previously this was done with UBOs but this posed some problems when
integrating with wgpu (see gfx-rs/wgpu#2400)
@JCapucho
Copy link
Contributor Author

@kvark I've update it to use freestanding uniforms, it's basically a copy of what gfx-hal does.

Cargo.toml Outdated Show resolved Hide resolved
wgpu-hal/src/gles/command.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/device.rs Show resolved Hide resolved
wgpu-hal/src/gles/mod.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/queue.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/command.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/command.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/device.rs Outdated Show resolved Hide resolved
@JCapucho
Copy link
Contributor Author

@kvark I've updated the PR with your recommendations, I think the annoying Vec copy could be removed if instead we used a Arc<[UniformDesc; MAX_PUSH_CONSTANTS]> but this has the downside that it can't derive Default so State would need a custom impl.

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, thank you for doing this!
Just a few smaller corrections are left to do

wgpu-hal/src/gles/command.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/command.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/command.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/device.rs Outdated Show resolved Hide resolved
@JCapucho JCapucho force-pushed the gl-push-constants branch 2 times, most recently from ec30a7b to 6c3b456 Compare January 21, 2022 16:35
@JCapucho
Copy link
Contributor Author

@kvark I've finished applying your suggestions, only one problem remains, UniformLocation for webgl is not Sync which causes CommandEncoder to also not be.

@kvark
Copy link
Member

kvark commented Jan 21, 2022

@JCapucho let's unsafe impl Sync for UniformDesc, at least for now

Uses freestanding uniforms for push constants
@JCapucho
Copy link
Contributor Author

@kvark everything seems to be working now, deno is failing because it's not picking up the patch for naga but once everything is merged upstream it should be fixed. Are we good to go?

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!
So the last thing here is to change the naga dependency back to upstream.

@kvark
Copy link
Member

kvark commented Jan 21, 2022

kvark pushed a commit to gfx-rs/naga that referenced this pull request Jan 22, 2022
Previously this was done with UBOs but this posed some problems when
integrating with wgpu (see gfx-rs/wgpu#2400)
@JCapucho
Copy link
Contributor Author

JCapucho commented Jan 22, 2022

@kvark I've updated to latest naga, it includes quite some shader changes beacuse of the attribute changes so I've made it into another commit but it can be all squashed

@crowlKats
Copy link
Collaborator

@kvark seems to me like it was just ci being flaky due to caching

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for updating the shaders to the new syntax!

@kvark kvark merged commit dcd07f0 into gfx-rs:master Jan 22, 2022
@JCapucho JCapucho deleted the gl-push-constants branch February 19, 2022 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants