Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webgpu: fix uniform block incorrectly #5487

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

xhcao
Copy link
Contributor

@xhcao xhcao commented Aug 13, 2021

uniform block size will always be evenly divisible by sizeof(vec4),
so we must pad the ending of uniform block buffer if necessary.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

uniform block size will always be evenly divisible by sizeof(vec4),
so we must pad the ending of uniform block buffer if necessary.
@google-cla google-cla bot added the cla: yes label Aug 13, 2021
@xhcao
Copy link
Contributor Author

xhcao commented Aug 13, 2021

@qjia7 @haoyunfeix @axinging @gyagp, please take a look, thank you.
layout(std140, set = 0, binding = 2) uniform Uniforms {
float NAN; ivec4 xShape; ivec2 size;
};
Take this uniform block for example, we need to 48 bytes buffer for this uniform block, but current status only alloc 40 bytes for the uniform block. It may run incorrectly on some machine.

@xhcao
Copy link
Contributor Author

xhcao commented Aug 13, 2021

Some D3D cbuffer examples is https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules, which is similar as uniform block.

padding = Math.ceil(currentOffset / 4) * 4 - currentOffset;
for (let p = 0; p < padding; ++p) {
dimUniformsData.push({type: 'int32', data: [0]});
dataViewIndex++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need dataViewIndex++;?

@@ -653,6 +653,14 @@ export class WebGPUBackend extends KernelBackend {
currentOffset += d.data.length + padding;
});

// Force the resulting size of uniform block to be evenly divisible
// by sizeof(four-component vector).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add more comments here, so that we can understand why we need those lines, including the related platform and test case. And We'd better provide a test case.

@@ -653,6 +653,14 @@ export class WebGPUBackend extends KernelBackend {
currentOffset += d.data.length + padding;
});

// Force the resulting size of uniform block to be evenly divisible
// by sizeof(four-component vector).
padding = Math.ceil(currentOffset / 4) * 4 - currentOffset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can 4 be replaced with baseAlignment?

@xhcao
Copy link
Contributor Author

xhcao commented Sep 1, 2021

Refer OpenGL ES 3.0 spec page 61, the std140 uniform block layout guarantees specific packing behavior and does not require the application to query the offsets and strides. But the minimum required size may still be queried from GL_UNIFORM_BLOCK_DATA_SIZE, this value is implementation-dependent minimum total buffer object size.
Take below uniform block object for an example,
layout (std140) uniform color_ubo
{
float color1;
vec3 color2; // starts a new vector
vec2 color3; // starts a new vector
};
Query the minimum required size on Angle, the value may be 40 or 48 on different backends or devices. So for OpenGL users, they must query the minimum required size before creating the buffer.
Currently, take above uniform block for an example, tfjs directly create a 40 bytes buffer for this uniform block.
@kainino0x, Does the dawn resolve implementation-dependent issue and tfjs does not need to care about it?

@qjia7 qjia7 requested a review from kainino0x September 2, 2021 00:06
@kainino0x
Copy link
Contributor

Apologies for the delay, I haven't kept up with my email/reviews.

I am not sure why that minimum size varies between GL implementations. In WebGPU we definitely intend to guarantee that the required size is consistent, so in this case either 40 on all systems or 48 on all systems.

This seems related to gpuweb/gpuweb#1558. I'll try to verify with our shader team.

@ben-clayton
Copy link

ben-clayton commented Sep 10, 2021

In WGSL [[block]] memory layout rules are identical between uniform and storage buffers, for all backends.
See 4.3.7. Memory Layout for these layout rules.

Note that while layout rules are identical between uniform and storage buffers, uniform buffers have tighter restrictions on the permitted layouts.

WGSL requires structures to be aligned to 16 bytes for uniform buffer usage. In terms of shader visibility, this only really affects the byte offset of a field that follows a field of a structure type.

Tint reports to Dawn the uniform buffer size rounded up to 16 bytes. I don't think this is currently covered by the spec, but I'm also not sure if this is actually observable to the client. Dawn folks are probably better to know the answer to that.

@Kangz
Copy link

Kangz commented Sep 10, 2021

The rounded up size that Tint returns to Dawn is observable through 1) the use of minBindingSize in the BindGroupLayout that will be validated that it is at least what Tint returned 2) the implicit getBindGroupLayout minBindingSize that will be validated against the binding size when creating a bind group with that layout.

@kainino0x
Copy link
Contributor

In that case shouldn't it be impossible to use a buffer of size 40 with a {float,vec3,vec2} binding?

@xhcao
Copy link
Contributor Author

xhcao commented Sep 24, 2021

@Kangz @kainino0x At present, does tfjs could get a valid minBindingSize value? Does tfjs need to explicitly query this value, or directly use the rounded up size (for example, 48 with a {float, vec3, vec2} binding)?
If tfjs needs to query the minBindingSize value, should all related buffers been created after compiling shaders?

@kainino0x
Copy link
Contributor

You can't reflect on a pipeline or shader module to determine the minimum buffer binding size for a binding. However, setting minBindingSize is optional - it defaults to 0, meaning there's no minimum. If it is 0, there may be additional draw-time or shader checks for accessing the binding (I don't remember the details gpuweb/gpuweb#678, I also have no idea whether Dawn implements any optimization).

You can compute the actual minimum size: https://gpuweb.github.io/gpuweb/wgsl/#memory-layouts ("The size of a structure ...")
But you don't really need all of the details of that. If it's a GPU-private buffer used between dispatches, you'll have to compute it, but it should be rather simple (an array of some simple datatype, see the "array<>" types in that table).
And if it's a CPU-shared buffer and you can put bytes in the correct spots in the buffer (here's something I was working on for that), then you already know what the minimum size is, except perhaps for a bit of padding at the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants