Implement texture mipmap generation via compute pipelines #5757

Open · wants to merge 39 commits into master
@frenzibyte (Member) commented Apr 26, 2023

Preface

In #5508, custom mipmap generation was implemented to reduce the overhead of uploading small textures to an atlas.

The implementation revolved around binding one level of a texture as a sampler and another level as a render target (FBO), then using a fragment shader to downsample the larger level and store the result in the smaller one.

This was pretty straightforward on OpenGL and Metal, but D3D11 and Vulkan don't support binding the same texture as both an input and an output of a shader.

That was worked around by allocating a separate texture to bind as the shader input, but that required preparing the texture content for each level, which meant switching between encoders on Metal (something Veldrid's Vulkan implementation doesn't appear to handle well).

Compute-based mipmap generation

After exploring compute shaders for the past week, I've come up with a simple implementation that avoids creating extra resources as much as possible.

The concept boils down to a compute shader accepting an input texture, a linear sampler, and a uniform buffer supplying the uploaded region. The compute shader downsamples the texture according to the specified uploaded region and the current invocation ID, and stores the result in the output texture.
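As a rough sketch of the per-invocation logic described above (in Python with illustrative names, not the PR's actual shader; the real kernel operates on GPU textures and uses a linear sampler to average):

```python
# Hypothetical sketch: each invocation (gx, gy) writes one texel of the
# smaller mip level by averaging the 2x2 block of the larger level,
# restricted to the uploaded region supplied via the uniform buffer.

def downsample_region(src, region, gx, gy):
    """src: 2D list of floats (larger mip level).
    region: (x, y, width, height) of the uploaded area in the smaller level.
    (gx, gy): invocation ID within the dispatched grid."""
    x0, y0, w, h = region
    if gx >= w or gy >= h:
        return None  # invocation falls outside the uploaded region; do nothing
    dx, dy = x0 + gx, y0 + gy  # target texel in the smaller level
    sx, sy = 2 * dx, 2 * dy    # top-left of the 2x2 block in the larger level
    texels = [src[sy][sx], src[sy][sx + 1],
              src[sy + 1][sx], src[sy + 1][sx + 1]]
    return sum(texels) / 4.0   # box filter, as a linear sampler midpoint fetch would

src = [[float(x + y * 4) for x in range(4)] for y in range(4)]
print(downsample_region(src, (0, 0, 2, 2), 0, 0))  # average of 0, 1, 4, 5 = 2.5
```

The bounds check at the top mirrors what a real shader must do when the dispatched grid is larger than the uploaded region.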

For Veldrid, this is implemented by splitting each texture into texture views in VeldridTextureResources, binding one view for sampling and another for output, and supplying a reusable uniform buffer with region data.

For Legacy OpenGL, this is implemented similarly, but texture views aren't required since each texture acts as its own sampler, so the sampler properties can be adjusted for each dispatch call without creating extra resources.

Compute shaders are shown to be supported by 80% of systems according to this game, but the rendering-based path still exists as a fallback for older hardware, since it still performs ~10x better than the driver implementation. It could be further improved by using texture views and caching VBOs, but I'm leaving it as-is in this PR.

Better rectangle merging implementation

After writing a test scene for mipmap generation, it turned out there was a high overhead coming from the rectangle merging logic added in #5508.

The loop works well for a few overlapping regions, but becomes very expensive when a high number of uploads is queued across different regions.

The overhead can get as high as ~500ms, which is 100x higher than the overhead of mipmap generation itself.

After discussing this on Discord with @smoogipoo, I've come up with a different implementation that follows the nature of texture atlases and produces as few rectangles as possible to simplify the mipmap generation process.

The implementation accepts all uploaded regions and produces at most two rectangles: one covering the top-left uploaded region and all regions horizontally after it, and another covering all regions horizontally before it.
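The merging strategy above can be sketched as follows (a hedged approximation with illustrative names and plain `(x, y, w, h)` tuples; the actual implementation operates on osu-framework's rectangle types):

```python
# Illustrative sketch: merge uploaded regions into at most two bounding
# rectangles, split around the top-left uploaded region's x coordinate.

def merge_upload_regions(regions):
    """regions: list of (x, y, w, h) tuples; returns at most two rectangles."""
    if not regions:
        return []
    # the "top-left" region: smallest y, then smallest x
    top_left = min(regions, key=lambda r: (r[1], r[0]))
    after = [r for r in regions if r[0] >= top_left[0]]   # at/after top-left, horizontally
    before = [r for r in regions if r[0] < top_left[0]]   # before top-left, horizontally

    def bounds(rs):
        # axis-aligned bounding rectangle of a group of regions
        x1 = min(r[0] for r in rs)
        y1 = min(r[1] for r in rs)
        x2 = max(r[0] + r[2] for r in rs)
        y2 = max(r[1] + r[3] for r in rs)
        return (x1, y1, x2 - x1, y2 - y1)

    return [bounds(rs) for rs in (after, before) if rs]

# Two uploads continuing a row, plus one wrapping to the next row:
print(merge_upload_regions([(10, 0, 5, 5), (15, 0, 5, 5), (0, 5, 5, 5)]))
# two rectangles: one for the row tail, one for the wrapped upload
```

Because atlas uploads fill rows left-to-right and wrap, this split yields two tight rectangles rather than one large bounding box spanning mostly untouched texels.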

[Before / After comparison screenshots]

(each highlighted box represents a region that's worked on by the mipmap generation logic)

On master, mipmaps would be generated for every single uploaded region (including the padding around each texture), as can be seen above; this resulted in too many vertices and unnecessary fills.

With the new implementation, all uploaded regions are merged into at most two rectangles, dispatching as many threads as possible without overlapping or unnecessary work.

The new implementation is also ~75x faster than the original logic, which should reduce stutters in osu! when starting gameplay, as there are a lot of textures getting added to the atlas at once.


As an aside, I have noticed both the computing and rendering implementations missing certain textures when testing with osu!, I'll leave this PR blocked from merge for now until I have the time to investigate it or someone else can look into it.

Requires testing on all configurations below:

  • Windows (Legacy OpenGL)
    • Computing path (requires GL_ARB_compute_shader)
    • Rendering path
  • Windows (Veldrid OpenGL)
    • Computing path (requires GL_ARB_compute_shader and GL_ARB_texture_view)
    • Rendering path
  • Windows (Direct3D 11)
    • Computing path (requires feature level 11_0+)
    • Rendering path
  • macOS (x64, Legacy OpenGL)
  • macOS (x64, Veldrid OpenGL)
  • macOS (x64, Metal)
  • macOS (M1, Legacy OpenGL)
  • macOS (M1, Veldrid OpenGL)
  • macOS (M1, Metal)
  • Linux (Legacy OpenGL)
    • Computing path (requires GL_ARB_compute_shader)
    • Rendering path
  • Linux (Veldrid OpenGL)
    • Computing path (requires GL_ARB_compute_shader and GL_ARB_texture_view)
    • Rendering path
  • Android (Legacy OpenGL)
    • Computing path (requires GL_ARB_compute_shader)
    • Rendering path
  • iOS (Legacy OpenGL)
    • Computing path (requires GL_ARB_compute_shader)
    • Rendering path
  • iOS (Veldrid OpenGL)
    • Computing path (requires GL_ARB_compute_shader and GL_ARB_texture_view)
    • Rendering path
  • iOS (Metal)

Optional configurations:

  • Windows (Vulkan)
  • Linux (Vulkan)

Implementation notes

  • Direct3D 11 doesn't like reading from `image2D`-like types with complicated formats like rgba8. We want compatibility with 10_0 feature levels, which use Shader Model 4 (cs_4_0); Shader Model 4 only supports UAV resources of type `RWByteAddressBuffer` or `RWStructuredBuffer` (i.e. structured buffers). The overhead required to make this work on older Direct3D versions is quite bad compared to using the framebuffer method or the driver implementation.
  • Metal mostly supports this (needs testing on multiple devices for confirmation), and D3D11 also supports it according to documentation (cs_5_0 profile only).
  • Dispatching is incorrect for texture dimensions that are not a multiple of the threadgroup size; using the texture width is more accurate.
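The threadgroup-sizing issue can be illustrated with ceiling division; a minimal sketch, assuming an 8×8 threadgroup size (`GROUP_SIZE` and `dispatch_size` are illustrative names, not from the PR):

```python
# Illustrative sketch: round the number of threadgroups up so texels of a
# texture whose size is not a multiple of the group size are still covered.

GROUP_SIZE = 8  # assumed local workgroup dimension (8x8)

def dispatch_size(width, height):
    """Number of threadgroups needed to cover a width x height region."""
    groups_x = (width + GROUP_SIZE - 1) // GROUP_SIZE   # ceiling division
    groups_y = (height + GROUP_SIZE - 1) // GROUP_SIZE
    return groups_x, groups_y

print(dispatch_size(100, 64))  # (13, 8): 100 is not a multiple of 8
```

With rounded-up dispatch sizes, the shader itself must mask out-of-range invocations (as in the bounds check shown earlier), since the final groups partially fall outside the region.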
Status: On hold