Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use VAOs for save+restore of vertexAttribPointer state between different webgl programs. #6913

Merged
merged 8 commits into from
Oct 26, 2022

Conversation

kdashg
Copy link
Contributor

@kdashg kdashg commented Oct 7, 2022

gpgpucontext.createProgram() calls vertexAttribPointer, but if you call this twice, the second createProgram() might overwrite the vertexAttribPointer data with different values. We believe this is the case more commonly in Firefox, and WebGL's spec has no guarantees per se here.

vertexAttribPointer should either be refreshed in gpgpucontext.useProgram, or ideally we can use a VAO and just (re)bind that in gpgpucontext.useProgram.


To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@google-cla
Copy link

google-cla bot commented Oct 7, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@kdashg
Copy link
Contributor Author

kdashg commented Oct 7, 2022

I believe this PR will address this bug filed against Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1793986

@pyu10055 pyu10055 requested a review from lina128 October 10, 2022 20:42
Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @kdashg and @lina128)


tfjs-backend-webgl/src/gpgpu_context.ts line 288 at r1 (raw file):

    webgl_util.linkProgram(gl, program);

    const boundVao = webgl_util.callAndCheck(gl, () => gl.getParameter(gl.VERTEX_ARRAY_BINDING));

Is this a blocking call?
@lina128 This might break the parallel compilation.


tfjs-backend-webgl/src/gpgpu_context.ts line 293 at r1 (raw file):

    }
    if (!this.vertexAttrsAreBound) {
      this.setProgram(program);

The setProgram call is missing in the change.

@pyu10055 pyu10055 self-requested a review October 10, 2022 20:46
Copy link
Collaborator

@lina128 lina128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for digging the root cause and fix this issue!

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @kdashg and @pyu10055)


tfjs-backend-webgl/src/gpgpu_context.ts line 288 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Is this a blocking call?
@lina128 This might break the parallel compilation.

Yeah, I think all get calls are blocking calls. Let's allow this PR to get merged. And then I can have a follow up PR that separates the setProgram part out to be controlled by the parallel compilation flag in the compileProgram.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @kdashg)


tfjs-backend-webgl/src/gpgpu_context.ts line 75 at r2 (raw file):

      this.createVertexArray = () => {
        return webgl_util.callAndCheck(gl,
          () => gl.createVertexArray());

gl needs to be casted to WebGL2RenderingContext type, otherwise it is failing ts check.

75 () => gl.createVertexArray());
~~~~~~~~~~~~~~~~~
tfjs-backend-webgl/src/gpgpu_context.ts:79:20 - error TS2339: Property 'bindVertexArray' does not exist on type 'WebGLRenderingContext'.

79 () => gl.bindVertexArray(vao as WebGLVertexArrayObject));
~~~~~~~~~~~~~~~
tfjs-backend-webgl/src/gpgpu_context.ts:83:20 - error TS2339: Property 'deleteVertexArray' does not exist on type 'WebGLRenderingContext'.

83 () => gl.deleteVertexArray(vao as WebGLVertexArrayObject));
~~~~~~~~~~~~~~~~~
tfjs-backend-webgl/src/gpgpu_context.ts:87:36 - error TS2339: Property 'VERTEX_ARRAY_BINDING' does not exist on type 'WebGLRenderingContext'.

87 () => gl.getParameter(gl.VERTEX_ARRAY_BINDING));
~~~~~~~~~~~~~~~~~~~~


tfjs-backend-webgl/src/gpgpu_context.ts line 344 at r2 (raw file):

    let program2: GPGPUContextProgram;
    const vaoWas = this.getVertexArray();

this is a blocking call, which would likely break our parallel compilation code. @lina128


tfjs-backend-webgl/src/gpgpu_context.ts line 359 at r2 (raw file):

      }
    }
    this.bindVertexArray(vaoWas);

why you need to rebind the old vertex array? The setProgram seems to bind the new vertex immediately?

@kdashg
Copy link
Contributor Author

kdashg commented Oct 10, 2022

I'm really sorry, I don't really feel comfortable giving Reviewable access to "read and write all public repository data" for my account.

I'll include my responses here:

gl needs to be casted to WebGL2RenderingContext type, otherwise it is failing ts check.

Weird, typechecking worked for me locally! Generally it should be inferring that gl is constrained to WebGL2RenderingContext from the instanceof branch!

this is a blocking call, which would likely break our parallel compilation code. @lina128

This isn't a blocking call normally in Firefox, at least! (it doesn't interact with program stuff)

The setProgram call is missing in the change.

It's still there, it's just further down now.

why you need to rebind the old vertex array? The setProgram seems to bind the new vertex immediately?

Yeah, it looks like that's probably true in the final form of the patch here.

I did it because it's the safest thing to do correctness-wise. If anything else in the library or in an app expects the bound VertexArray to remain unchanged, neglecting to reset this to its previous value would cause breakage.
(I believe this is exactly the class of errors we're running into in the case I'm investigating: https://bugzilla.mozilla.org/show_bug.cgi?id=1793986 )

It's kinda guessable that createProgram might change GL's current program, but it's not at all expected that it would change the VAO.

An alternative would be to lazily create the VAO in executeProgram.

I'm not familiar with this codebase, and I can't figure out how to build and test it locally. (I can get it to do lint checks, but not integrate with the testcase I need to investigate)
As such, I'm being very cautious and paranoid about correctness. :)

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdashg Thank you for the explanation. I ran tests your forked branch, I am seeing this errors for most of the shaders (on Chrome)
[.WebGL-0x7e000325400] GL_INVALID_OPERATION: Must have element array buffer bound.
Do you know what might be the cause? thanks

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @kdashg)

@kdashg
Copy link
Contributor Author

kdashg commented Oct 18, 2022

I can't figure out how to run linting, which is weird, because I had it at least partially working on a different computer.
What invocations do you use?

@kdashg
Copy link
Contributor Author

kdashg commented Oct 19, 2022

But regardless, PTAL!

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, 2 of 3 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @kdashg)

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, all tests are passing now.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @kdashg)

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @kdashg)

Copy link
Contributor Author

@kdashg kdashg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @lina128 and @pyu10055)


tfjs-backend-webgl/src/gpgpu_context.ts line 293 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

The setProgram call is missing in the change.

It's still there, it's just further down now.


tfjs-backend-webgl/src/gpgpu_context.ts line 75 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

gl needs to be casted to WebGL2RenderingContext type, otherwise it is failing ts check.

75 () => gl.createVertexArray());
~~~~~~~~~~~~~~~~~
tfjs-backend-webgl/src/gpgpu_context.ts:79:20 - error TS2339: Property 'bindVertexArray' does not exist on type 'WebGLRenderingContext'.

79 () => gl.bindVertexArray(vao as WebGLVertexArrayObject));
~~~~~~~~~~~~~~~
tfjs-backend-webgl/src/gpgpu_context.ts:83:20 - error TS2339: Property 'deleteVertexArray' does not exist on type 'WebGLRenderingContext'.

83 () => gl.deleteVertexArray(vao as WebGLVertexArrayObject));
~~~~~~~~~~~~~~~~~
tfjs-backend-webgl/src/gpgpu_context.ts:87:36 - error TS2339: Property 'VERTEX_ARRAY_BINDING' does not exist on type 'WebGLRenderingContext'.

87 () => gl.getParameter(gl.VERTEX_ARRAY_BINDING));
~~~~~~~~~~~~~~~~~~~~

Weird, typechecking worked for me locally! Generally it should be inferring that gl is constrained to WebGL2RenderingContext from the instanceof branch!


tfjs-backend-webgl/src/gpgpu_context.ts line 344 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

this is a blocking call, which would likely break our parallel compilation code. @lina128

This isn't a blocking call for Firefox, at least! (it doesn't interact with program stuff)


tfjs-backend-webgl/src/gpgpu_context.ts line 359 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

why you need to rebind the old vertex array? The setProgram seems to bind the new vertex immediately?

Yeah, it looks like that's probably true in the final form of the patch here.

I did it because it's the safest thing to do correctness-wise. If anything else in the library or in an app expects the bound VertexArray to remain unchanged, neglecting to reset this to its previous value would cause breakage.
(I believe this is exactly the class of errors we're running into in the case I'm investigating: https://bugzilla.mozilla.org/show_bug.cgi?id=1793986 )

It's kinda guessable that createProgram might change GL's current program, but it's not at all expected that it would change the VAO.

An alternative would be to lazily create the VAO in executeProgram.

I'm not familiar with this codebase, and I can't figure out how to build and test it locally. (I can get it to do lint checks, but not integrate with the testcase I need to investigate)
As such, I'm being very cautious and paranoid about correctness. :)

@kdashg
Copy link
Contributor Author

kdashg commented Oct 20, 2022

Sorry, that comment was stuck in the queue somehow.

kdashg and others added 5 commits October 20, 2022 10:17
This prevents issues where vertex attrib layouts don't match between programs (which is possible even with the same vertex shader!), or where someone else overwrites vertexAttribPointer between createProgram and executeProgram.
Workaround typechecker instanceof constraint into lambda.
Don't save-and-restore VAO since we overwrite it anyway.
Bind index buffer to VAOs.
@kdashg
Copy link
Contributor Author

kdashg commented Oct 26, 2022

@pyu10055 Heya, do you need anything else from me here?

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is ready, thank you for the fixes.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @lina128)

@pyu10055 pyu10055 merged commit e9a9d4f into tensorflow:master Oct 26, 2022
@shanumante-sc
Copy link

This change is generating an Assertion failed: gpgpu_util.bindVertexProgramAttributeStreams not fully successful. when running fill. This is because the uv attribute is being compiled out by the shader (reproduces on Mali GPU on Pixel 7 Pro) and so bindVertexProgramAttributeStreams returns false.

The corresponding vertex and fragment sources are

VertexShaderSource
#version 300 es
    precision highp float;
    in vec3 clipSpacePos;
    in vec2 uv;
    out vec2 resultUV;

    void main() {
      gl_Position = vec4(clipSpacePos, 1);
      resultUV = uv;
    }
FragmentSource
#version 300 es
    precision highp float;
    precision highp int;
    precision highp sampler2D;
    in vec2 resultUV;
    out vec4 outputColor;
    const vec2 halfCR = vec2(0.5, 0.5);

    struct ivec5
    {
      int x;
      int y;
      int z;
      int w;
      int u;
    };

    struct ivec6
    {
      int x;
      int y;
      int z;
      int w;
      int u;
      int v;
    };

    uniform float NAN;
    
    
    
      #define round(value) newRound(value)
      int newRound(float value) {
        return int(floor(value + 0.5));
      }

      ivec4 newRound(vec4 value) {
        return ivec4(floor(value + vec4(0.5)));
      }
    

    int imod(int x, int y) {
      return x - y * (x / y);
    }

    int idiv(int a, int b, float sign) {
      int res = a / b;
      int mod = imod(a, b);
      if (sign < 0. && mod != 0) {
        res -= 1;
      }
      return res;
    }

    //Based on the work of Dave Hoskins
    //https://www.shadertoy.com/view/4djSRW
    #define HASHSCALE1 443.8975
    float random(float seed){
      vec2 p = resultUV * seed;
      vec3 p3  = fract(vec3(p.xyx) * HASHSCALE1);
      p3 += dot(p3, p3.yzx + 19.19);
      return fract((p3.x + p3.y) * p3.z);
    }

    
vec2 uvFromFlat(int texNumR, int texNumC, int index) {
  int texR = index / texNumC;
  int texC = index - texR * texNumC;
  return (vec2(texC, texR) + halfCR) / vec2(texNumC, texNumR);
}
vec2 packedUVfrom1D(int texNumR, int texNumC, int index) {
  int texelIndex = index / 2;
  int texR = texelIndex / texNumC;
  int texC = texelIndex - texR * texNumC;
  return (vec2(texC, texR) + halfCR) / vec2(texNumC, texNumR);
}

    
vec2 packedUVfrom2D(int texelsInLogicalRow, int texNumR,
  int texNumC, int row, int col) {
  int texelIndex = (row / 2) * texelsInLogicalRow + (col / 2);
  int texR = texelIndex / texNumC;
  int texC = texelIndex - texR * texNumC;
  return (vec2(texC, texR) + halfCR) / vec2(texNumC, texNumR);
}

    
vec2 packedUVfrom3D(int texNumR, int texNumC,
    int texelsInBatch, int texelsInLogicalRow, int b,
    int row, int col) {
  int index = b * texelsInBatch + (row / 2) * texelsInLogicalRow + (col / 2);
  int texR = index / texNumC;
  int texC = index - texR * texNumC;
  return (vec2(texC, texR) + halfCR) / vec2(texNumC, texNumR);
}

  

    float sampleTexture(sampler2D textureSampler, vec2 uv) {
      return texture(textureSampler, uv).r;
    }
  

    void setOutput(float val) {
      outputColor = vec4(val, 0, 0, 0);
    }
  
uniform float value;

    ivec4 getOutputCoords() {
      ivec2 resTexRC = ivec2(resultUV.yx *
        vec2(1, 64));
      int index = resTexRC.x * 64 + resTexRC.y;
      int r = index / 64; index -= r * 64;int c = index / 64; index -= c * 64;int d = index / 64; int d2 = index - d * 64;
      return ivec4(r, c, d, d2);
    }
  


      void main() {
        // Input can be obtained from uniform value.
        setOutput(value);
      }

Note that fragment shader is just setting outputColor = vec4(val, 0, 0, 0); without trying to sample any texture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants