webgpu: fix uniform block incorrectly #5487

xhcao · 2021-08-13T06:30:30Z

uniform block size will always be evenly divisible by sizeof(vec4),
so we must pad the ending of uniform block buffer if necessary.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

This change is

uniform block size will always be evenly divisible by sizeof(vec4), so we must pad the ending of uniform block buffer if necessary.

xhcao · 2021-08-13T06:34:42Z

@qjia7 @haoyunfeix @axinging @gyagp, please take a look, thank you.
layout(std140, set = 0, binding = 2) uniform Uniforms {
float NAN; ivec4 xShape; ivec2 size;
};
Take this uniform block for example, we need to 48 bytes buffer for this uniform block, but current status only alloc 40 bytes for the uniform block. It may run incorrectly on some machine.

xhcao · 2021-08-13T06:39:27Z

Some D3D cbuffer examples is https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules, which is similar as uniform block.

axinging · 2021-08-13T07:27:00Z

tfjs-backend-webgpu/src/backend_webgpu.ts

+    padding = Math.ceil(currentOffset / 4) * 4 - currentOffset;
+    for (let p = 0; p < padding; ++p) {
+      dimUniformsData.push({type: 'int32', data: [0]});
+      dataViewIndex++;


No need dataViewIndex++;?

axinging · 2021-08-13T07:31:12Z

tfjs-backend-webgpu/src/backend_webgpu.ts

@@ -653,6 +653,14 @@ export class WebGPUBackend extends KernelBackend {
      currentOffset += d.data.length + padding;
    });

+    // Force the resulting size of uniform block to be evenly divisible
+    // by sizeof(four-component vector).


We need to add more comments here, so that we can understand why we need those lines, including the related platform and test case. And We'd better provide a test case.

axinging · 2021-08-13T07:32:27Z

tfjs-backend-webgpu/src/backend_webgpu.ts

@@ -653,6 +653,14 @@ export class WebGPUBackend extends KernelBackend {
      currentOffset += d.data.length + padding;
    });

+    // Force the resulting size of uniform block to be evenly divisible
+    // by sizeof(four-component vector).
+    padding = Math.ceil(currentOffset / 4) * 4 - currentOffset;


Can 4 be replaced with baseAlignment?

xhcao · 2021-09-01T08:58:51Z

Refer OpenGL ES 3.0 spec page 61, the std140 uniform block layout guarantees specific packing behavior and does not require the application to query the offsets and strides. But the minimum required size may still be queried from GL_UNIFORM_BLOCK_DATA_SIZE, this value is implementation-dependent minimum total buffer object size.
Take below uniform block object for an example,
layout (std140) uniform color_ubo
{
float color1;
vec3 color2; // starts a new vector
vec2 color3; // starts a new vector
};
Query the minimum required size on Angle, the value may be 40 or 48 on different backends or devices. So for OpenGL users, they must query the minimum required size before creating the buffer.
Currently, take above uniform block for an example, tfjs directly create a 40 bytes buffer for this uniform block.
@kainino0x, Does the dawn resolve implementation-dependent issue and tfjs does not need to care about it?

kainino0x · 2021-09-10T01:54:32Z

Apologies for the delay, I haven't kept up with my email/reviews.

I am not sure why that minimum size varies between GL implementations. In WebGPU we definitely intend to guarantee that the required size is consistent, so in this case either 40 on all systems or 48 on all systems.

This seems related to gpuweb/gpuweb#1558. I'll try to verify with our shader team.

ben-clayton · 2021-09-10T07:39:52Z

In WGSL [[block]] memory layout rules are identical between uniform and storage buffers, for all backends.
See 4.3.7. Memory Layout for these layout rules.

Note that while layout rules are identical between uniform and storage buffers, uniform buffers have tighter restrictions on the permitted layouts.

WGSL requires structures to be aligned to 16 bytes for uniform buffer usage. In terms of shader visibility, this only really affects the byte offset of a field that follows a field of a structure type.

Tint reports to Dawn the uniform buffer size rounded up to 16 bytes. I don't think this is currently covered by the spec, but I'm also not sure if this is actually observable to the client. Dawn folks are probably better to know the answer to that.

Kangz · 2021-09-10T09:07:04Z

The rounded up size that Tint returns to Dawn is observable through 1) the use of minBindingSize in the BindGroupLayout that will be validated that it is at least what Tint returned 2) the implicit getBindGroupLayout minBindingSize that will be validated against the binding size when creating a bind group with that layout.

kainino0x · 2021-09-10T18:49:07Z

In that case shouldn't it be impossible to use a buffer of size 40 with a {float,vec3,vec2} binding?

xhcao · 2021-09-24T08:52:07Z

@Kangz @kainino0x At present, does tfjs could get a valid minBindingSize value? Does tfjs need to explicitly query this value, or directly use the rounded up size (for example, 48 with a {float, vec3, vec2} binding)?
If tfjs needs to query the minBindingSize value, should all related buffers been created after compiling shaders?

kainino0x · 2021-09-24T19:07:22Z

You can't reflect on a pipeline or shader module to determine the minimum buffer binding size for a binding. However, setting minBindingSize is optional - it defaults to 0, meaning there's no minimum. If it is 0, there may be additional draw-time or shader checks for accessing the binding (I don't remember the details gpuweb/gpuweb#678, I also have no idea whether Dawn implements any optimization).

You can compute the actual minimum size: https://gpuweb.github.io/gpuweb/wgsl/#memory-layouts ("The size of a structure ...")
But you don't really need all of the details of that. If it's a GPU-private buffer used between dispatches, you'll have to compute it, but it should be rather simple (an array of some simple datatype, see the "array<>" types in that table).
And if it's a CPU-shared buffer and you can put bytes in the correct spots in the buffer (here's something I was working on for that), then you already know what the minimum size is, except perhaps for a bit of padding at the end.

webgpu: fix uniform block incorrectly

bfc435c

uniform block size will always be evenly divisible by sizeof(vec4), so we must pad the ending of uniform block buffer if necessary.

google-cla bot added the cla: yes label Aug 13, 2021

axinging reviewed Aug 13, 2021

View reviewed changes

qjia7 requested a review from kainino0x September 2, 2021 00:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

webgpu: fix uniform block incorrectly #5487

webgpu: fix uniform block incorrectly #5487

xhcao commented Aug 13, 2021 •

edited by nsthorat

Loading

xhcao commented Aug 13, 2021

xhcao commented Aug 13, 2021

axinging Aug 13, 2021

axinging Aug 13, 2021

axinging Aug 13, 2021

xhcao commented Sep 1, 2021

kainino0x commented Sep 10, 2021

ben-clayton commented Sep 10, 2021 •

edited

Loading

Kangz commented Sep 10, 2021

kainino0x commented Sep 10, 2021

xhcao commented Sep 24, 2021

kainino0x commented Sep 24, 2021

webgpu: fix uniform block incorrectly #5487

Are you sure you want to change the base?

webgpu: fix uniform block incorrectly #5487

Conversation

xhcao commented Aug 13, 2021 • edited by nsthorat Loading

xhcao commented Aug 13, 2021

xhcao commented Aug 13, 2021

axinging Aug 13, 2021

Choose a reason for hiding this comment

axinging Aug 13, 2021

Choose a reason for hiding this comment

axinging Aug 13, 2021

Choose a reason for hiding this comment

xhcao commented Sep 1, 2021

kainino0x commented Sep 10, 2021

ben-clayton commented Sep 10, 2021 • edited Loading

Kangz commented Sep 10, 2021

kainino0x commented Sep 10, 2021

xhcao commented Sep 24, 2021

kainino0x commented Sep 24, 2021

xhcao commented Aug 13, 2021 •

edited by nsthorat

Loading

ben-clayton commented Sep 10, 2021 •

edited

Loading