Speed up line rendering with WebGL2 instancing #6162

davepagurek · 2023-05-26T21:31:19Z

Resolves #6091

Changes

Currently, drawing strokes in WebGL mode is fairly slow. Since each segment/cap/join of a stroke is made of at least 2 triangles, we had to duplicate data per vertex of each triangle on the CPU.

Now that WebGL2 is enabled by default, we can use instanced rendering, which lets us repeat attributes per instance being rendered. This lets the GPU do the copying for us.
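
As a rough illustration of the instancing setup described above (a sketch, not the code in this PR): the per-vertex attribute keeps a divisor of 0, while the per-segment attributes get a divisor of 1 so each value is reused for every vertex of one instance. The `drawSegmentsInstanced` helper, the attribute names, and the buffer names are hypothetical; the `gl` methods are standard WebGL2 calls.

```javascript
// Hypothetical helper: draws `segmentCount` line segments, each as one
// instance of a 6-vertex (2-triangle) quad. `gl` is assumed to be a
// WebGL2RenderingContext; `attribs` and `buffers` are made-up containers
// of attribute locations and already-filled buffers.
function drawSegmentsInstanced(gl, attribs, buffers, segmentCount) {
  // Per-vertex data: which corner of the quad this vertex is.
  gl.bindBuffer(gl.ARRAY_BUFFER, buffers.side);
  gl.enableVertexAttribArray(attribs.side);
  gl.vertexAttribPointer(attribs.side, 1, gl.FLOAT, false, 0, 0);
  gl.vertexAttribDivisor(attribs.side, 0); // advance once per vertex

  // Per-instance data: the two endpoints of the segment.
  for (const name of ['from', 'to']) {
    gl.bindBuffer(gl.ARRAY_BUFFER, buffers[name]);
    gl.enableVertexAttribArray(attribs[name]);
    gl.vertexAttribPointer(attribs[name], 3, gl.FLOAT, false, 0, 0);
    gl.vertexAttribDivisor(attribs[name], 1); // advance once per instance
  }

  // 6 vertices per instance, repeated segmentCount times by the GPU.
  gl.drawArraysInstanced(gl.TRIANGLES, 0, 6, segmentCount);
}
```

With this layout each endpoint is stored once per segment rather than once per triangle vertex.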

Here's a stress test example: https://editor.p5js.org/davepagurek/sketches/cWr5RmNAK

If you switch from the 1.6.0 CDN <script> tag to the uploaded p5.min.js, the frame rate goes up from ~40fps to ~60fps in Chrome on my M1 MacBook Pro (closer to ~18fps and ~40fps in Firefox).

If you toggle version: 2 to version: 1, it goes back to copying the vertex data on the CPU and the frame rate drops to ~35fps. This is slower than before since the format required for instancing is a bit less efficient when done on the CPU, but WebGL2 support is pretty broad now so this should only happen on legacy code.

Future work

The new bottleneck in line rendering seems to be libtess. There are a few things we might consider doing:

- Update how curveDetail works to maybe be relative to curve length (or curvature) to save vertices
- Move away from libtess, maybe to mapbox/earcut, which is apparently faster but would require extra work to tessellate shapes that aren't on the XY plane

PR Checklist

npm run lint passes
[Inline documentation] is included / updated
[Unit tests] are included / updated

aferriss · 2023-05-30T17:57:44Z

src/webgl/p5.RenderBuffer.js

@@ -8,6 +8,28 @@ p5.RenderBuffer = class {
    this.attr = attr; // the name of the vertex attribute
    this._renderer = renderer;
    this.map = map; // optional, a transformation function to apply to src
+    this._namespace = undefined;


Can you leave some short comments here about the purpose of these properties? I know they're also defined in the setters down below, but I find it helpful to view it all in one place.

aferriss · 2023-05-30T18:00:22Z

src/webgl/p5.RenderBuffer.js

@@ -27,45 +49,77 @@ p5.RenderBuffer = class {
      model = geometry;
    }

+    let geometryData = geometry;
+    if (this._namespace) {


Is there some redundancy of if statements and loops here? Any reason not to just combine them into one if and one for?

combining them should be fine!

aferriss · 2023-05-30T18:12:46Z

src/webgl/p5.RenderBuffer.js

+    this._offset = 0;
+    this._divisor = undefined;
+  }
+  namespace(namespace) {


Why wouldn't we just add these properties to the constructor instead of setting them through chaining functions during construction?

Mostly because it gets hard to read with a long chain of parameters, and because it's valid to have some of these without others and having undefined in the middle of the list feels weird. I could also refactor this to use something like an options object for the constructor, which could serve a similar purpose?

Yeah, I think an options object sounds like it would be much cleaner here if you're open to doing that refactor.
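
A minimal sketch of what the options-object variant discussed above could look like. The class name `RenderBufferSketch`, the positional parameters, and the option names (`map`, `namespace`, `offset`, `divisor`) are assumptions for illustration, loosely mirroring the properties visible in the diff; this is not the actual p5.RenderBuffer API.

```javascript
// Hypothetical stand-in for a render buffer description.
class RenderBufferSketch {
  constructor(size, src, dst, attr, renderer, options = {}) {
    this.size = size; // number of floats per vertex
    this.src = src; // name of the source array on the geometry
    this.dst = dst; // name of the GPU buffer
    this.attr = attr; // name of the vertex attribute
    this._renderer = renderer;

    // Optional settings can be given in any combination, with no
    // `undefined` placeholders in the middle of an argument list.
    const { map, namespace, offset = 0, divisor } = options;
    this.map = map;
    this._namespace = namespace;
    this._offset = offset;
    this._divisor = divisor;
  }
}

// Only the divisor is specified; everything else keeps its default.
const exampleBuffer = new RenderBufferSketch(
  3, 'lineVertices', 'lineVerticesBuffer', 'aFromPosition', null,
  { divisor: 1 }
);
```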

aferriss · 2023-05-30T18:25:40Z

src/webgl/p5.Shader.js

+        if (divisor !== undefined) {
+          this._renderer.GL.vertexAttribDivisor(loc, divisor);
+        } else if (this._renderer.webglVersion === constants.WEBGL2) {
+          this._renderer.GL.vertexAttribDivisor(loc, 0);


Not sure I understand what this divisor is doing, though I assume it's ok to have it be zero?

The divisor is how many instances get the same attribute value in a row before moving on to the next value in the buffer. The default behavior is the same as 0, which means don't repeat anything, just keep pulling new values from the buffer like normal. For attributes that we want to repeat, we set a different divisor to reuse the value for the whole instance or for multiple instances.

I can also add something like this in a comment explaining what it is so that it's less necessary to have a bunch of MDN tabs open to understand what the code is trying to do 😅

That makes sense to me, thanks for the explanation!
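
To make the divisor explanation above concrete, here is a small CPU-side emulation of which buffer entry a given vertex of a given instance would read. The `fetchAttribute` function is a hypothetical illustration of the lookup rule (ignoring strides and offsets), not part of p5 or WebGL.

```javascript
// Returns the attribute value that vertex `vertexIndex` of instance
// `instanceIndex` would see for a buffer holding `values`.
function fetchAttribute(values, divisor, instanceIndex, vertexIndex) {
  if (divisor === 0) {
    // Default: every vertex pulls its own value from the buffer.
    return values[vertexIndex];
  }
  // Otherwise the value only advances once every `divisor` instances,
  // and is shared by all vertices within those instances.
  return values[Math.floor(instanceIndex / divisor)];
}

const perSegment = ['segment A', 'segment B', 'segment C'];
const sameForAllVertices =
  fetchAttribute(perSegment, 1, 1, 0) === fetchAttribute(perSegment, 1, 1, 5);
```

With a divisor of 1, every vertex of instance 1 reads `'segment B'`; with a divisor of 2, instances 0 and 1 both read `'segment A'`.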

aferriss · 2023-05-30T18:33:00Z

src/webgl/shaders/line.vert

-  vec4 posp = uModelViewMatrix * aPosition;
-  vec4 posqIn = uModelViewMatrix * (aPosition + vec4(aTangentIn, 0));
-  vec4 posqOut = uModelViewMatrix * (aPosition + vec4(aTangentOut, 0));
+  vec4 position = abs(aSide) == 1. ? aFromPosition : aToPosition;


Is there any way we can avoid all this branching? Maybe some clever math or predetermine the from / to position for the variable before reaching the shader?

Or is this just needless optimization?

I'm not sure if I can think of a way to get from/to working before reaching the shader. I was hoping there would be a way to repeat attributes for a few vertices in an instance to use just one position attribute, but it looks like all we can do is repeat the attribute for every vertex in the instance.

I think I can make that happen with mix() to avoid the conditional! Not sure if that'll be faster or not but I can check and see.
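
One way the `mix()` idea could work, emulated in JavaScript so the arithmetic is easy to check. This assumes, as in the diff above, that `abs(aSide) == 1.` marks the "from" end; the helper names are made up for this sketch. In GLSL the equivalent selection would be roughly `mix(aToPosition, aFromPosition, float(abs(aSide) == 1.))`, and whether that is actually faster than the ternary is not something this sketch can show.

```javascript
// Component-wise linear interpolation, like GLSL's mix(x, y, a).
function mix(x, y, a) {
  return x.map((xi, i) => xi * (1 - a) + y[i] * a);
}

// Picks the "from" position when |aSide| is 1, otherwise the "to" position,
// by using the comparison result (0 or 1) as the blend weight.
function selectPosition(aSide, fromPosition, toPosition) {
  const isFrom = Number(Math.abs(aSide) === 1);
  return mix(toPosition, fromPosition, isFrom);
}
```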

aferriss

Thanks @davepagurek This is really complicated stuff and seems like it could be a big win for perf! I do worry a little about regressing performance on webGL1 though. Any way to prevent that?

I think another good test would be to make sure that lines drawn in webGL1 match those in webGL2

aferriss · 2023-05-30T19:26:57Z

On the demo you linked, I'm actually seeing better performance from the CDN p5, using both webGL1 and webGL2. Your version was giving around 15fps, the CDN was close to 60 when drawing the different shapes.

I also got an error when trying to use your version with webGL1, not sure if maybe that script got built before all of your changes were made, but just want to flag that.

davepagurek · 2023-05-30T19:45:27Z

On the demo you linked, I'm actually seeing better performance from the CDN p5, using both webGL1 and webGL2. Your version was giving around 15fps, the CDN was close to 60 when drawing the different shapes.

I've noticed sometimes changes to the html file don't update immediately for me until I save, I wonder if maybe that's happening? Do you mind checking again and comparing this version (instancing, webgl2) with this version (using the CDN, no instancing, only webgl1)? If it's still slower with instancing, that's definitely a problem -- what browser/OS do you use?

I also got an error when trying to use your version with webGL1, not sure if maybe that script got built before all of your changes were made, but just want to flag that.

Weird, I don't get that on Chrome/Firefox on my Mac, mind sending me the stacktrace of the error?

davepagurek · 2023-05-30T19:49:46Z

I do worry a little about regressing performance on webGL1 though. Any way to prevent that?

This data format requires a tad more copying of values than before now that there are two position attributes instead of one. One option is to use a different line shader with different attributes in WebGL1, but I was hoping to avoid that because the maintenance cost goes up. I can see if there's anything I can do to speed up the copying in the WebGL1 case though. I can do a bit more debugging and see if something like using a fixed sized array upfront can help (since we can calculate the buffer sizes needed ahead of time before sorting the data into the non-instanced format)

aferriss · 2023-05-30T20:32:02Z

Weird, I don't get that on Chrome/Firefox on my Mac, mind sending me the stacktrace of the error?

The error is TypeError: r._makeStrokeBufferData is not a function at undefined:2:860144 and only occurs when drawing 3d geometry with calls like sphere or box.

I'm using chrome 113 on Mac OSX 13.13

I'm reliably seeing very low framerates when testing with just lines using your build, and around 20fps using the CDN build. I made a copy with some average framerate calculations here as well.
https://editor.p5js.org/aferriss/sketches/HP0KpYo1J

davepagurek · 2023-05-30T21:54:43Z

Thanks! ok, found a fix for the crash for WebGL1 mode. Eventually I'll add some more testing for WebGL1 mode for retained and immediate mode, but it sounds like before getting to that there's some more fundamental performance things to work out.

Also thanks for the p5 editor link, I think the difference in my testing was on single shapes with lots of lines vs many shapes with single lines. It seems like when rendering shapes with a lot of data (the single large begin/endShape in my editor link), the bottleneck is in duplicating the data, which this change addresses, but when rendering a single line, the fact that there are more buffers involved means that the overhead of flattening multiple arrays and allocating more Float32Arrays adds up. I'm going to spend some more time seeing if there's a way to keep the performance improvements for large shapes without compromising drawing many lines so much.

davepagurek · 2023-06-07T23:44:20Z

OK I'm back with some updates! I have a new version here for testing: https://editor.p5js.org/davepagurek/sketches/tcug_H1Z8

Updates I made to speed things up:

- No longer checking if face culling is enabled when drawing strokes. To make strokes work regardless of whether or not face culling is enabled, I updated the winding order of lines to be consistent with faces.
- Using the same strides/divisors for all stroke buffers. This seems to be a bit faster than changing it up between segments/caps/joins (maybe the WebGL driver internally does some deduplication of commands?)
- Avoiding flattening data when putting it into RenderBuffers. This is done by just writing values directly into an array rather than keeping around subarrays, and since the data is never read by other code (just sent right to the GPU), there is little cost in code readability.
- Storing stroke data directly in Float32Arrays. I made a new p5.DataVector class similar to a C++ vector that doubles its capacity when data grows long. This way we avoid allocating new data every frame and having to GC it, which seems to be a big bottleneck.
  - As an aside, I also shrink the capacity down to be the nearest power of 2 before sending the data to the GPU, since it seems that even when you specify a count in gl.bufferData, it sends the whole Float32Array. Hopefully the power-of-2 sizes leave enough wiggle room to avoid too many resizes. It's great for drawing the same sort of shape many times in a row though!
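
A minimal sketch of the growable typed-array idea described above. This is not the actual p5.DataVector implementation; the class name `GrowableFloatBuffer`, its methods, and the initial capacity are assumptions, and "nearest power of 2" is interpreted here as the smallest power of two that still holds the data.

```javascript
class GrowableFloatBuffer {
  constructor(initialCapacity = 16) {
    this.data = new Float32Array(Math.max(1, initialCapacity));
    this.length = 0; // number of floats actually in use
  }

  // Append values directly, doubling the capacity until they fit.
  push(...values) {
    let capacity = this.data.length;
    while (this.length + values.length > capacity) capacity *= 2;
    if (capacity !== this.data.length) {
      const grown = new Float32Array(capacity);
      grown.set(this.data.subarray(0, this.length));
      this.data = grown;
    }
    this.data.set(values, this.length);
    this.length += values.length;
  }

  // Reuse the same allocation next frame instead of creating garbage.
  clear() {
    this.length = 0;
  }

  // Shrink to the smallest power of two that still holds the data, so the
  // array handed to a GPU upload is not much larger than what is in use.
  shrinkToFit() {
    let capacity = 1;
    while (capacity < this.length) capacity *= 2;
    if (capacity < this.data.length) {
      this.data = this.data.slice(0, capacity);
    }
  }
}
```

Calling `clear()` at the start of each frame and `push()` while building the stroke keeps the same `Float32Array` alive across frames as long as the shape size stays roughly constant.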

Here's how it stands up against 1.6.0 in WebGL1 and 2 modes:

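| Test | Previous: Chrome | Previous: Firefox | Updated, WebGL 1: Chrome | Updated, WebGL 1: Firefox | Updated, WebGL 2: Chrome | Updated, WebGL 2: Firefox |
| --- | --- | --- | --- | --- | --- | --- |
| 1000 lines | 60fps | 16fps | (↓)28fps | (↑)24fps | (=)60fps | (↑)34fps |
| Large curve | 23fps | 10fps | (↓)12fps | (=)11fps | (↑)39fps | (↑)25fps |
| Retained+Immediate Shapes | 60fps | 60fps | (=)60fps | (=)60fps | (=)60fps | (=)60fps |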

Comparing against 1.6.0:

In WebGL2 mode, everything is around 2x as fast, or capped out at the 60fps max
In WebGL1 mode, Chrome is always 2x slower, while Firefox is around the same or marginally better

Personally, I think I'm OK with the WebGL1 tradeoffs, as I think its main use cases are:

old browsers, e.g. Safari on my old Mac which I won't update the OS of, but it still has access to up-to-date Chrome and Firefox with WebGL2 support anyway
shader compatibility with old shaders that used extensions (which could be used in WebGL2 by default but would require updating the GLSL syntax), but shader-based sketches seem not to rely much on geometry and lines

If you're OK with the performance tradeoffs for WebGL1, I can continue on and make another update with the extra comments/refactoring/tests we talked about earlier.

aferriss · 2023-06-08T00:54:19Z

Thanks for the updates @davepagurek ! I'll take some time to look through your new commits and test perf again. For my own clarity, in your table does the previous column indicate p5 perf prior to any of your changes, or do they represent the perf from your initial round of commits? If it's the latter, including perf from prior to making any of these changes would be helpful to see.

It looks like there are wins everywhere except for drawing in chrome using webGL1, and a more than 2x slow down still seems like a large regression to me. Do you have a sense of why it's such a substantial hit for that platform?

It'd be really great if we could test on some mobile devices, safari, and some windows machines as well if possible.

davepagurek · 2023-06-08T02:57:30Z

The "previous" column is for v1.6.0.

The slowdown is mostly because the data layout is a little different (there's a to and from coordinate instead of just a single vertex position) because it's optimizing for the webgl2 case. There's a tradeoff to make between webgl1 speed and code maintenance, because we could use a separate shader and data storage format if we're using webgl1, but then we have to keep track of the extra code. This implementation uses the same data layout for both, and has an extra for loop to mimic on the cpu what webgl2 instanced rendering does on the gpu, opting to compromise speed in order to simplify already kind of complex code.
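
The "extra for loop" fallback can be pictured roughly like this. It is a sketch using a hypothetical `expandPerInstance` helper, not the code from this PR: each per-instance value is repeated once for every vertex of that instance, which is the copying that a divisor of 1 lets the GPU do in WebGL2.

```javascript
// Expands per-instance attribute data into per-vertex data on the CPU.
// perInstance: flat array with `floatsPerValue` floats per instance.
// verticesPerInstance: how many vertices each instance is drawn with.
function expandPerInstance(perInstance, floatsPerValue, verticesPerInstance) {
  const instanceCount = perInstance.length / floatsPerValue;
  const out = new Float32Array(perInstance.length * verticesPerInstance);
  let write = 0;
  for (let i = 0; i < instanceCount; i++) {
    const start = i * floatsPerValue;
    for (let v = 0; v < verticesPerInstance; v++) {
      for (let k = 0; k < floatsPerValue; k++) {
        out[write++] = perInstance[start + k];
      }
    }
  }
  return out;
}
```

With two position attributes (from and to) instead of one, this loop runs over twice as much position data as the old single-position layout, which is consistent with the WebGL1 slowdown described above.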

I think it is definitely possible to use ifdefs to use a different shader layout and use different buffer structure in webgl1 mode if we want to treat it less like a graceful degradation and more like a primary usage target if we're ok with the added complexity and maintenance.

aferriss · 2023-06-08T03:46:08Z

Thanks for the extra explanation. It sounds to me like the extra complexity may not be worth the amount of devices we'd be continuing to support. It's a little unclear how many samples they have but data from here seems to suggest that almost 98% of devices are able to support webGL2. Might be good to get some additional input from some of the other maintainers as well.

davepagurek · 2023-06-08T12:12:46Z

Also, I got someone to test on windows for me, this is what they reported back:

AMD Ryzen 9 5950X (16-Core/3.40 GHz), AMD Radeon RX 6600 - all modes using default settings as well.

chrome (114.0.5735.110)
many lines - 11-13 fps (1.6.0) / 5 fps (p5.js in sketch)
large curve - 21-22 fps / 45-46 fps
retained shapes - 57 fps / 60 fps

firefox (114.0)
many lines - 3 fps / 5-6 fps
large curve - 14 fps / 33-36 fps
retained shapes - 60 fps / 60 fps

It looks like the many lines one is still slower in Chrome, and actually quite a bit slower than on my Mac even on the previous version of p5. Since a single large curve runs reasonably fast, it seems like the bottleneck there is likely the data transfer between CPU and GPU.

I'll also try to run some tests on Safari and my phone later!

davepagurek · 2023-06-12T23:53:39Z

I did some more testing on Safari on my mac and Chrome on my Android phone:

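| Test | Previous: Safari | Previous: Mobile Chrome | Updated, WebGL1: Safari | Updated, WebGL1: Mobile Chrome | Updated, WebGL2: Safari | Updated, WebGL2: Mobile Chrome |
| --- | --- | --- | --- | --- | --- | --- |
| 1000 lines | 26fps | 6fps | 18fps | 4fps | 10fps | 2fps |
| Large curve | 20fps | 9fps | 15fps | 4fps | 41fps | 18fps |
| Retained+Immediate Shapes | 60fps | 60fps | 60fps | 60fps | 60fps | 60fps |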

It seems like for both of these, rendering many lines gets a bit slower, and rendering large shapes gets faster. Interestingly, for the many-lines test Safari is actually faster in WebGL 1 mode than WebGL 2.

It seems like there isn't much of a pattern. I suppose it depends on what pipelines the hardware is optimized for under the hood? Mobile seems to be extremely slow at large volumes of back-and-forth between the CPU and GPU, so rendering that many lines is never fast.

So it looks like this instancing update is not a clear win across the board, but rather, this will be a question of what we want to optimize for. I think I'm slightly more inclined to optimize for larger shapes rather than many lines:

- In 2D mode people often use many small lines because:
  - There isn't a way of doing line gradients. In WebGL mode, we can do line gradients, so that should be the recommended approach
  - They want to emulate textured brushes with many small strokes. This is also valid, but probably better served by a shader, although that's a pretty big skill jump to make. This is an area I'd like to make more accessible going forward regardless, though
- In 2D mode, curves are relatively cheap. In WebGL, they're significantly more expensive than straight lines, and drawing a few curves can quickly land you in something that performs as well as the "large curve" test case

...but also we don't need to necessarily optimize for that using the method in this PR (although I don't have other ideas off the top of my head just yet.)

What are your thoughts @aferriss? Also @stalgiag if you have any thoughts that would also be great!

davepagurek · 2023-07-31T12:26:09Z

I think I'm going to close this one in favour of the more general performance improvements in #6230, and also the more manual approach to instancing that #6276 starts to add.

davepagurek added 3 commits May 26, 2023 15:24

Handle line instancing in WebGL2

dd6a36b

Add WebGL1 backwards compatibility

4823dd5

Rename variables in line vertex shader

7191571

davepagurek requested a review from aferriss May 26, 2023 21:31

davepagurek marked this pull request as draft May 26, 2023 22:16

davepagurek changed the title ~~Speed up line rendering with WebGL2 instancing~~ [WIP] Speed up line rendering with WebGL2 instancing May 26, 2023

davepagurek added 2 commits May 26, 2023 18:25

Update tests

be098db

Fix indices for WebGL1 mode

949c665

davepagurek changed the title ~~[WIP] Speed up line rendering with WebGL2 instancing~~ Speed up line rendering with WebGL2 instancing May 26, 2023

davepagurek marked this pull request as ready for review May 26, 2023 22:48

aferriss reviewed May 30, 2023

aferriss suggested changes May 30, 2023

Fix crash in Webgl1 mode on retained geometry

464aeec

davepagurek added 4 commits June 5, 2023 19:09

Avoid flattening buffers

0d71cac

Get it working with faster data formats and less data reading

04672ca

Rescale data vectors

3012033

Merge branch 'main' into feat/instanced-lines

69c98f0

davepagurek mentioned this pull request Jun 21, 2023

Improve performance of line rendering #6230

Merged

3 tasks

davepagurek closed this Jul 31, 2023

This was referenced Sep 15, 2023

WebGL Slower than earlier versions - is that normal? #6407

Closed

Improve WebGL2 line performance using drawArraysInstanced() #6091

Closed

davepagurek mentioned this pull request Jun 20, 2024

Support for .mtl Files with Textures #7072

Open

3 tasks
