Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Dynamically compile shaders with uniforms instead of attributes for DDS #9185

Merged
merged 6 commits into from
Jun 13, 2017

Conversation

kkaefer
Copy link
Member

@kkaefer kkaefer commented Jun 5, 2017

In 141e995, we changed the way how most values are passed to the shader from uniforms to attributes. We did this, because for DDS, we need the ability to specify a different value for every feature, e.g. to have data-driven color values. Previously, once draw call used a uniform (constant) to give all features in a draw call the same color. In 141e995, we moved to constant vertex attributes, which allows us to optionally specify vertex attribute buffers for every possible combination of DDS values. Most of the time though, these attributes are set to constants.

Constant vertex attributes are an edge case features, and the OpenGL wiki has this to say about them:

Note: It is not recommended that you use these. The performance characteristics of using fixed attribute data are unknown, and it is not a high-priority case that OpenGL driver developers optimize for. They might be faster than uniforms, or they might not.

Unfortunately, using constant vertex attributes causes a performance regression on some devices. To verify that this is the case, I created an isolated benchmark to find out what the culprit is. It consisted of ~36,000 vertices. The color shader received 6 colors as input and combined them to one color with mix. I varied the way the colors are piped through to the fragment shader in the following three combinations:

  1. uniform vec4 u_colorN directly in the fragment shader: 36 fps
  2. uniform vec4 u_colorN in the vertex shader, with a varying vec4 v_colorN to pass the colors from the vertex to the fragment shader: 30fps
  3. attribute vec4 a_colorN in the vertex shader, with a varying vec4 v_colorN to pass the colors from the vertex to the fragment shader, with constant vertex attributes: 26 fps

These results make it look like this performance regression is a combination of two issues (at least on the iPad mini 2):

  • Passing varyings from vertex to fragment shader costs performance
  • Constant vertex attributes are slower than uniforms

It's worth noting, that we had a very similar issue on WebGL in IE: mapbox/mapbox-gl-js#1714. In JS, we're now using a different implementation that generates shader code at run-time, and uses uniforms when the DDS attribute is constant.

Therefore, I believe we'll have to switch our DDS code to a similar model that JS uses where we dynamically generate shader code at runtime based on whether the stylesheet uses data-driven values for a certain property.

I think the way we could support this is by adding #defines into the shader code that recompiling the shader with certain defines based on whether an attribute uses a data-driven value or not.

  • add #defines in the generated shader code to selectively toggle between attribute and uniform
  • add corresponding uniforms to DataDrivenPaintProperty-derived classes
  • set the uniforms as an alternative to attributes for constant data-driven values
  • use the correct shader with the right uniforms enabled

/cc @jfirebaugh

@kkaefer kkaefer added bug Core The cross-platform C++ core, aka mbgl performance Speed, stability, CPU usage, memory usage, or power usage rendering ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold labels Jun 5, 2017
@kkaefer kkaefer requested a review from jfirebaugh June 5, 2017 08:57
@kkaefer kkaefer force-pushed the dds-dynamic-shaders branch from 76909c5 to 1db4642 Compare June 5, 2017 08:58
@jfirebaugh jfirebaugh force-pushed the dds-dynamic-shaders branch from 47c249c to 3c2ab62 Compare June 5, 2017 19:56
@jfirebaugh
Copy link
Contributor

     VM SIZE                               FILE SIZE
 ++++++++++++++ GROWING                 ++++++++++++++
  +1.7% +43.5Ki __TEXT,__text           +43.5Ki  +1.7%
   +10% +10.4Ki __TEXT,__cstring        +10.4Ki   +10%
  +1.6% +2.69Ki [None]                  +3.49Ki  +2.0%
  +0.9% +2.58Ki __TEXT,__gcc_except_tab +2.58Ki  +0.9%
  +1.2%    +748 __TEXT,__unwind_info       +748  +1.2%
  +0.1%    +128 __TEXT,__const             +128  +0.1%
  +0.3%     +48 __DATA,__data               +48  +0.3%

 -------------- SHRINKING               --------------
  -0.2%     -10 __TEXT,__stub_helper        -10  -0.2%
  -0.2%      -8 __DATA,__la_symbol_ptr        0  [ = ]
  -0.2%      -6 __TEXT,__stubs               -6  -0.2%

  +1.7% +60.0Ki TOTAL                   +60.8Ki  +1.7%

@jfirebaugh jfirebaugh force-pushed the dds-dynamic-shaders branch from 3c2ab62 to 761a2b0 Compare June 5, 2017 20:31
@jfirebaugh
Copy link
Contributor

jfirebaugh commented Jun 5, 2017

Implementation notes:

  • We still generate attribute locations for all paint attributes, even if they aren't used by the shader (this is explicitly allowed by the OpenGL spec). With more work, we could generate attribute locations for only the paint attributes that are needed. This is what GL JS does. It permits another approach to situations in which we are up against the 8 attribute limit: defer the failure case to runtime, when someone tries to use too many data-driven attributes simultaneously.
  • ConstantAttributeBinding and related code is still present. Its use in PaintPropertyBinders is essentially a no-op, but it's also used in SymbolSizeBinder. Ideally we remove it all.
  • There are no limits (other than the inherent limit of possible permutations of data-driven-or-not properties) on distinct program instances, either in-memory or in the on-disk binary program cache. If you make heavy use of runtime styling or otherwise have a large variance in which properties are data-driven, it could use significant memory or disk space. I haven't tried to quantify this at all.

@jfirebaugh jfirebaugh force-pushed the dds-dynamic-shaders branch from 761a2b0 to ca5115b Compare June 5, 2017 21:00
@jfirebaugh
Copy link
Contributor

Bench results (iPhone 6) before:

2017-06-05 16:02:09.942756-0700 Bench GL[3994:1000717] Result:
2017-06-05 16:02:09.943195-0700 Bench GL[3994:1000717] | paris      | 52.4 fps |
2017-06-05 16:02:09.943295-0700 Bench GL[3994:1000717] | paris2     | 60.0 fps |
2017-06-05 16:02:09.943381-0700 Bench GL[3994:1000717] | alps       | 34.0 fps |
2017-06-05 16:02:09.946679-0700 Bench GL[3994:1000717] | us east    | 59.9 fps |
2017-06-05 16:02:09.946789-0700 Bench GL[3994:1000717] | greater la | 48.2 fps |
2017-06-05 16:02:09.946875-0700 Bench GL[3994:1000717] | sf         | 59.5 fps |
2017-06-05 16:02:09.946956-0700 Bench GL[3994:1000717] | oakland    | 57.8 fps |
2017-06-05 16:02:09.947037-0700 Bench GL[3994:1000717] | germany    | 30.7 fps |
2017-06-05 16:02:09.947157-0700 Bench GL[3994:1000717] Total FPS: 402.4
2017-06-05 16:02:09.947231-0700 Bench GL[3994:1000717] Average FPS: 50.3

After:

2017-06-05 15:52:51.406604-0700 Bench GL[3985:998731] Benchmark completed.
2017-06-05 15:52:51.406674-0700 Bench GL[3985:998731] Result:
2017-06-05 15:52:51.406756-0700 Bench GL[3985:998731] | paris      | 59.9 fps |
2017-06-05 15:52:51.406848-0700 Bench GL[3985:998731] | paris2     | 60.1 fps |
2017-06-05 15:52:51.406932-0700 Bench GL[3985:998731] | alps       | 40.5 fps |
2017-06-05 15:52:51.407014-0700 Bench GL[3985:998731] | us east    | 59.9 fps |
2017-06-05 15:52:51.407095-0700 Bench GL[3985:998731] | greater la | 60.1 fps |
2017-06-05 15:52:51.407177-0700 Bench GL[3985:998731] | sf         | 60.1 fps |
2017-06-05 15:52:51.407257-0700 Bench GL[3985:998731] | oakland    | 60.0 fps |
2017-06-05 15:52:51.407338-0700 Bench GL[3985:998731] | germany    | 39.3 fps |
2017-06-05 15:52:51.407410-0700 Bench GL[3985:998731] Total FPS: 439.8
2017-06-05 15:52:51.407481-0700 Bench GL[3985:998731] Average FPS: 55.0

@kkaefer
Copy link
Member Author

kkaefer commented Jun 6, 2017

Here are my benchmarks on the iPad mini 2:

benchmark before DDS after DDS master dynamic shaders
paris 30.7 fps 16.4 fps 17.8 fps 22.2 fps
paris2 32.3 fps 20.5 fps 19.6 fps 27.3 fps
alps 16.2 fps 7.8 fps 8.8 fps 14.0 fps
us east 40.0 fps 27.3 fps 25.6 fps 33.2 fps
greater la 29.5 fps 17.1 fps 17.9 fps 24.9 fps
sf 43.2 fps 25.3 fps 23.6 fps 33.5 fps
oakland 31.0 fps 19.2 fps 19.9 fps 25.3 fps
germany 18.6 fps 9.0 fps 10.1 fps 16.2 fps

Looking at the shaders, it looks like there are still attributes generated for some of the shaders even though they're #ifdefed out. Maybe calling glBindAttribLocation has an adverse effect?

@kkaefer
Copy link
Member Author

kkaefer commented Jun 6, 2017

ios xcworkspace 2017-06-06 14-19-55

@kkaefer
Copy link
Member Author

kkaefer commented Jun 6, 2017

Compare this with the pre-DDS state of the same program:

bench gl gputrace 2017-06-06 16-10-43

@jfirebaugh jfirebaugh force-pushed the dds-dynamic-shaders branch from ca5115b to 8a341aa Compare June 6, 2017 16:00
@jfirebaugh
Copy link
Contributor

@kkaefer Can you run bench on iPad mini 2 one more time? I rebased on master, including 9dfc2d9, which is a significant per-frame performance speedup.

@kkaefer
Copy link
Member Author

kkaefer commented Jun 6, 2017

Results are pretty much unchanged:

benchmark 8a341aa
paris 21.0 fps
paris2 27.7 fps
alps 12.6 fps
us east 30.0 fps
greater la 20.2 fps
sf 33.7 fps
oakland 23.1 fps
germany 14.5 fps

@kkaefer
Copy link
Member Author

kkaefer commented Jun 13, 2017

Remaining task is to convert the size attributes for symbol programs to this format as well.

@jfirebaugh
Copy link
Contributor

Capturing from chat:

Porting symbol size to dynamic shaders is difficult. It doesn't fit in the existing #pragma system. That's why it wasn't implemented that way to begin with. I think we should merge and port the current changes, and not try to switch this right now.

@ansis:

that sounds good to me. The current plan for working around the 8 attribute limit for pitch-scaling is to pack a_size into a_data
if a_size was changed to a dynamically generated separate attribute (which I guess it should be eventually) we'd need to find another way

I profiled the changes here and found #9257. With that included, FPS is maxed at 60 on my iPhone.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Core The cross-platform C++ core, aka mbgl performance Speed, stability, CPU usage, memory usage, or power usage rendering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants