[WIP] WebGPU Status

Akio Gaule edited this page Nov 13, 2024 · 7 revisions

Introduction:

This document describes the state of the WebGPU implementation in Atom. The document refers only to the rendering functionality, and it doesn't cover any information about compiling or running in a browser environment.

There's currently a partial implementation of a WebGPU RHI in Atom. Most of the main features are already implemented. The main limitations come from how Atom uses the RHI, since some of the features it relies on are either not supported by WebGPU or would require major refactoring to accommodate a workaround.

Branches

All of the O3DE WebGPU changes are in the https://github.com/o3de/o3de/tree/webgpu-rhi branch (there's nothing in the development branch). The current changes in the branch have only been "lightly" code reviewed, so a thorough review of all changes has to be done before integrating into the main branch.

AZSLc changes are also required. All of those changes are in the https://github.com/o3de/o3de-azslc/tree/webgpu branch.

Runtime Environment

O3DE doesn't compile or run in a browser, so how are you able to test the WebGPU RHI? You have to remember that WebGPU is just a wrapper around other rendering APIs (it's actually an RHI), and luckily there are open source libraries available that implement the WebGPU functionality in C++. In our case we decided to use Google's implementation, a library called Dawn (https://dawn.googlesource.com/dawn), in order to run on platforms other than the browser. Dawn is actually the same library that is used by Chromium when running WebGPU.

The Dawn library has 2 main parts: the runtime library and an offline shader compiler called Tint. Both components are treated as separate 3rd party libraries.

Disclaimer: All development was done on Windows. Other platforms have not been tested (but in theory they should work).

3rd Party

At this moment, Dawn and Tint have not been added as 3rd party libraries, so local building is needed in order to run WebGPU. To achieve this you first need to download the source code of Dawn from https://dawn.googlesource.com/dawn and follow the instructions (https://dawn.googlesource.com/dawn/+/HEAD/docs/quickstart-cmake.md) for building and installing. After installing, go to the DawnTargets.cmake file and add the GLOBAL keyword when adding the library:

add_library(dawn::dawn_public_config INTERFACE IMPORTED GLOBAL)

add_library(dawn::webgpu_dawn SHARED IMPORTED GLOBAL)

Then go to the DawnTargets-release.cmake file and add the DEBUG and PROFILE configurations (copy and paste from the release one).

Building o3de

Since Dawn is not a 3rd party lib yet, CMake is expecting a folder where the Dawn library is. You can do this by using CMAKE_PREFIX_PATH, something like this:

cmake -B build/windows -S . -G "Visual Studio 17 2022" -DCMAKE_PREFIX_PATH=D:\Dawn\install\Release

Building Shaders

WebGPU only accepts shaders written in WGSL. Atom shaders are written in HLSL, so we will need to translate the shaders into the proper format. In order to get to WGSL, we first need to transform from HLSL to SPIR-V. Luckily we already have DXC for doing this translation (used for Vulkan shaders). Then, we will need to use Tint (an offline compiler that is part of Dawn) in order to translate from SPIR-V to WGSL.

To build Tint, you will need to go back to the Dawn project and compile the "tint_cmd_tint_cmd" project. A small modification is needed since Tint doesn't support enabling shader features when compiling a shader through the command line (https://issues.chromium.org/issues/368289749). Go to the src/tint/lang/spirv/reader/ast_parser/parse.cc file and, below "allowed_features.extensions.insert(wgsl::Extension::kChromiumDisableUniformityAnalysis);", enable the following language features:

allowed_features.features.insert(wgsl::LanguageFeature::kPacked4X8IntegerDotProduct);

allowed_features.features.insert(wgsl::LanguageFeature::kPointerCompositeAccess);

allowed_features.features.insert(wgsl::LanguageFeature::kReadonlyAndReadwriteStorageTextures);

allowed_features.features.insert(wgsl::LanguageFeature::kUnrestrictedPointerParameters);

After building the project, there should be a tint.exe along with a webgpu_dawn.dll file. Copy both files into a "path/to/AssetProcessor/Builders/Tint" folder (you will need to create the Tint folder). This is where AssetProcessor looks for the Tint compiler (this will be done automatically once Tint is a 3rd party library).
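The two-step translation described above (HLSL to SPIR-V with DXC, then SPIR-V to WGSL with Tint) can be sketched as the pair of command lines a build step might assemble. This is only an illustration: the helper names, file names, entry point and shader profile are hypothetical, and the exact arguments that the AssetProcessor builder passes are not listed on this page and may differ. The sketch assumes DXC's -spirv, -T, -E and -Fo options and Tint's --format and -o options.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a DXC command line that compiles HLSL to SPIR-V.
// -spirv selects SPIR-V output, -T is the target profile, -E the entry point,
// -Fo the output file.
std::string BuildDxcCommand(const std::string& hlslPath, const std::string& entryPoint,
                            const std::string& profile, const std::string& spirvPath)
{
    assert(!hlslPath.empty() && !spirvPath.empty());
    return "dxc -spirv -T " + profile + " -E " + entryPoint + " -Fo " + spirvPath + " " + hlslPath;
}

// Hypothetical helper: builds a Tint command line that translates SPIR-V to WGSL.
std::string BuildTintCommand(const std::string& spirvPath, const std::string& wgslPath)
{
    assert(!spirvPath.empty() && !wgslPath.empty());
    return "tint --format wgsl -o " + wgslPath + " " + spirvPath;
}
```

For example, BuildDxcCommand("shader.hlsl", "MainPS", "ps_6_0", "shader.spv") followed by BuildTintCommand("shader.spv", "shader.wgsl") yields the two commands for one pixel shader.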

WebGPU RHI features

The following classes have been implemented in the RHI:

  • Swapchain
  • Image
  • ImageView
  • ImagePool
  • Aliased Heap (WebGPU doesn't support memory control, so aliasing is not possible. This is just Image and Buffer pools)
  • ShaderPlatformInterface. Compiles from HLSL to SPIR-V using DXC and then from SPIR-V to WGSL using Tint. WGSL is a text format, so there are no precompiled shaders.
  • TransientAttachmentPool
  • CommandQueue
  • CommandQueueContext
  • Device
  • PhysicalDevice
  • FrameGraphExecuter
  • FrameGraphGroups
  • Sampler
  • ShaderStageFunction
  • ShaderModule
  • PipelineLayout
  • Buffer
  • BufferPool
  • BufferPoolResolver
  • BufferView
  • CommandList (draw, dispatch and copy support)
  • PipelineState
  • RenderPipeline
  • ShaderResourceGroupPool
  • ShaderResourceGroup
  • MergedShaderResourceGroupPool
  • MergedShaderResourceGroup
  • Fence
  • StreamingImagePool
  • AsyncUploadQueue
  • ComputePipeline
  • ImagePoolResolver
  • NullDescriptorManager
  • RootConstantManager

WebGPU limitations

Since WebGPU must run on multiple render APIs (DX12, Metal, Vulkan, etc.), some concessions have been made in favor of portability. The following are the limitations that affect Atom the most:

  1. Multithreaded rendering is not supported (YET): WebGPU is single threaded in the browser, so WebGPU calls cannot be made from different threads. This means all command lists must be recorded from the same thread, and all texture and buffer uploads must be done from the render thread.

Workaround: Disable the multithreaded paths for command list recording, SRG compilation, async upload and queue submission. All of these operations must be done from the render thread.
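To illustrate the constraint, here is a minimal sketch of a queue that accepts work from any thread but only executes it on the render thread. This is not how the branch does it (the branch simply disables the multithreaded paths); RenderThreadQueue and its methods are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Atom: collects work from any thread and
// runs it only on the thread that is allowed to make WebGPU calls.
class RenderThreadQueue
{
public:
    explicit RenderThreadQueue(std::thread::id renderThread)
        : m_renderThread(renderThread)
    {
    }

    // Safe to call from any thread; the work is only stored, not executed.
    void Enqueue(std::function<void()> work)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(std::move(work));
    }

    // Must be called from the render thread. Returns the number of jobs executed.
    std::size_t Flush()
    {
        assert(std::this_thread::get_id() == m_renderThread);
        std::vector<std::function<void()>> jobs;
        {
            // Take ownership of the pending jobs so they run outside the lock.
            std::lock_guard<std::mutex> lock(m_mutex);
            jobs.swap(m_pending);
        }
        for (auto& job : jobs)
        {
            job();
        }
        return jobs.size();
    }

private:
    std::thread::id m_renderThread;
    std::mutex m_mutex;
    std::vector<std::function<void()>> m_pending;
};
```

In this sketch the render thread would call Flush() once per frame, before recording command lists, so that uploads requested elsewhere still end up issuing their WebGPU calls from a single thread.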

  1. Only one queue is available (YET): WebGPU only has one queue that is in charge of all graphics, compute and copy operations.

Workaround: The graphics, compute and copy queues all point to the single queue that is available.
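A minimal sketch of that aliasing, assuming stand-in types (the enum, Queue and QueueContext below are simplified for illustration and are not the classes from the branch):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for the queue classes that the RHI front end asks for.
enum class HardwareQueueClass : std::size_t { Graphics = 0, Compute, Copy, Count };

// Stand-in for a wrapper around the single WebGPU queue.
struct Queue
{
    int submitted = 0;
};

// Every queue class resolves to the same underlying queue object.
class QueueContext
{
public:
    QueueContext() { m_queues.fill(&m_sharedQueue); }

    // The table stores pointers into this object, so copying is disabled.
    QueueContext(const QueueContext&) = delete;
    QueueContext& operator=(const QueueContext&) = delete;

    Queue& Get(HardwareQueueClass queueClass)
    {
        assert(queueClass != HardwareQueueClass::Count);
        return *m_queues[static_cast<std::size_t>(queueClass)];
    }

private:
    Queue m_sharedQueue;
    std::array<Queue*, static_cast<std::size_t>(HardwareQueueClass::Count)> m_queues;
};
```

With this layout, code that requests the compute or copy queue keeps working unchanged, but all submissions are serialized through one queue.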

  1. No memory management: WebGPU doesn't expose any way of controlling the memory that resources use. This means that heaps and aliased memory are not available. When resources are created, the memory is allocated by WebGPU (similar to legacy APIs like OpenGL).

Workaround: None

  1. Shader resource arrays are not supported (YET): This means that an array of textures (e.g. Texture2D m_textures[5], not to be confused with TextureArray m_textures) is not allowed in the shader. The same goes for Buffers and Samplers.

Workaround: Split the array into separate resources. Texture2D m_textures[5] -> Texture2D m_texture0; Texture2D m_texture1; ... Texture2D m_texture4; When accessing a texture, you will need to select the proper texture resource. This solution requires heavy use of macros.
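The macro pattern can be sketched with the C preprocessor, shown here in C++ as a stand-in for shader code. The macro names, the Texture2D stand-in struct and MaterialSrg are hypothetical, and the macros actually used in the branch are not shown on this page. In AZSL the selection would more likely expand to a switch or an if-chain than to a conditional expression.

```cpp
#include <cassert>

// Stand-in for a shader texture resource.
struct Texture2D
{
    int id;
};

// Declares name0..name4 as separate resources instead of `Texture2D name[5]`.
#define DECLARE_TEXTURES_5(name) \
    Texture2D name##0; Texture2D name##1; Texture2D name##2; Texture2D name##3; Texture2D name##4;

// Expands to the resource that matches a runtime index in the range 0..4.
#define GET_TEXTURE_5(name, index) \
    ((index) == 0 ? name##0 : (index) == 1 ? name##1 : (index) == 2 ? name##2 : \
     (index) == 3 ? name##3 : name##4)

struct MaterialSrg
{
    DECLARE_TEXTURES_5(m_texture)
};

// Reads from the texture selected by `index`, as `m_textures[index]` would.
int SampleId(const MaterialSrg& srg, int index)
{
    assert(index >= 0 && index < 5);
    return GET_TEXTURE_5(srg.m_texture, index).id;
}
```

The call site stays close to the array form, while the declaration expands to five independent resources that the shader compiler can bind separately.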

  1. Non-uniform access to a texture is not supported (basically anything that can create divergent waves in the execution). This means that you cannot have a sampling operation inside an "if" statement whose condition depends on something that is not a uniform value. For example, something like this is not allowed: if (psInput.m_useColor) { color += m_diffuse.Sample(m_sampler, psInput.m_uv); } but something like this is allowed: if (constantBuffer.m_useColor) { color += m_diffuse.Sample(m_sampler, psInput.m_uv); } (assuming that psInput is the input to the Fragment Shader and constantBuffer is a uniform buffer)

Workaround: Sample all textures BEFORE the branch and store their values in temporary variables. This way we avoid wave divergence, but we pay the penalty of sampling all the textures even if we don't use them later.
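The restructuring can be sketched in C++ as a stand-in for shader code. FakeTexture, Color and Shade are hypothetical names for this example only. The point is the ordering: the sample happens unconditionally, and the branch only touches the stored result.

```cpp
#include <cassert>

struct Color
{
    float r, g, b, a;
};

// Stand-in for a texture; counts how many times it was sampled.
struct FakeTexture
{
    explicit FakeTexture(Color value) : texel(value), sampleCount(0) {}

    Color Sample(float u, float v) const
    {
        // UVs are assumed to be normalized in this sketch.
        assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
        ++sampleCount;
        return texel;
    }

    Color texel;
    mutable int sampleCount;
};

// useColor plays the role of a per-pixel (non-uniform) value such as psInput.m_useColor.
Color Shade(const FakeTexture& diffuse, bool useColor, float u, float v)
{
    Color color{0.0f, 0.0f, 0.0f, 1.0f};

    // Sample unconditionally, before any non-uniform branch.
    const Color sampled = diffuse.Sample(u, v);

    // The branch only uses the temporary; no sampling happens inside it.
    if (useColor)
    {
        color.r += sampled.r;
        color.g += sampled.g;
        color.b += sampled.b;
    }
    return color;
}
```

Note that the texture is sampled on every call, including the calls that discard the result, which is the cost mentioned in the workaround.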

  1. Root constants are not supported (YET): Even though DX12, Vulkan and Metal support root constants (or similar), WebGPU doesn't have them yet.

Workaround: A uniform buffer with the proper values is used instead of root constants. The value of this uniform buffer is updated just before the submit call. WebGPU supports dynamic offsets for accessing a buffer in the SRG. This allows us to change the offset per submit call without having to recompile an SRG. A RootConstantManager was created for sharing an SRG and a buffer between different draw calls.
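The dynamic offset idea can be sketched as a small allocator that packs each draw's constants into one staging block and returns the offset to bind with. This is a hypothetical illustration, not the RootConstantManager from the branch. RootConstantRing and its methods are invented names, and the 256-byte default assumes WebGPU's default minUniformBufferOffsetAlignment limit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: CPU-side staging for a uniform buffer that replaces root constants.
class RootConstantRing
{
public:
    // alignment is the required alignment of dynamic uniform buffer offsets
    // (256 is the default value of WebGPU's minUniformBufferOffsetAlignment limit).
    explicit RootConstantRing(std::uint32_t capacity, std::uint32_t alignment = 256)
        : m_staging(capacity)
        , m_alignment(alignment)
        , m_head(0)
    {
        assert(alignment != 0 && capacity % alignment == 0);
    }

    // Copies one draw's constants into the staging memory and returns the
    // dynamic offset that the draw would bind the shared buffer with.
    std::uint32_t Push(const void* data, std::uint32_t size)
    {
        const std::uint32_t offset = m_head;
        const std::uint32_t alignedSize = (size + m_alignment - 1) / m_alignment * m_alignment;
        assert(offset + alignedSize <= m_staging.size());
        std::memcpy(m_staging.data() + offset, data, size);
        m_head = offset + alignedSize;
        return offset;
    }

    // Starts over, for example after the staging data has been uploaded and submitted.
    void Reset() { m_head = 0; }

    const std::vector<std::uint8_t>& Staging() const { return m_staging; }

private:
    std::vector<std::uint8_t> m_staging;
    std::uint32_t m_alignment;
    std::uint32_t m_head;
};
```

Each draw keeps the same SRG and only changes the offset returned by Push, which is why no SRG recompilation is needed between draws.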

  1. Buffer and RWBuffer are not supported by Tint (YET): Buffer and RWBuffer shader resources are not recognized by Tint.

Workaround: Use a StructuredBuffer or RWStructuredBuffer instead.

  1. Extra data for shader resources: WebGPU requires extra information on what the resources will be, and also on how they will be used. For example, storage textures must specify the format of the texture in the shader. This forces you to choose the format at shader build time. You also need to specify the sample type of a sampled texture (Float, UnfilterableFloat, Depth, Sint, Uint) when building the SRG. Finally, when a RWTexture is used for writing only, that must be specified when creating the SRG. Unfortunately HLSL doesn't have write-only textures, so extra information is needed in the shader.

Workaround: Multiple attributes were created for collecting metadata of the shader resources when building a shader. This way we have the necessary information when building the SRG at runtime.

  1. Limited shader resources: This is not technically part of the WebGPU standard, but Dawn has very strict limits on the number of resources a shader can access per stage. These limits are not based on hardware limitations of any kind, but they are imposed by Dawn in favor of increasing compatibility. There's already a ticket in Dawn to discuss increasing the limits (https://issues.chromium.org/issues/350075283). Some of the limits are:
  • 4 SRGs per stage
  • 16 sampled textures per stage
  • 16 samplers per stage
  • 10 storage buffers per stage
  • 8 storage textures per stage
  • 12 uniform buffers per stage
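A minimal sketch of checking a shader stage against the limits listed above. The struct and function names are hypothetical, and the numbers are copied from the list on this page rather than queried from Dawn.

```cpp
#include <cassert>
#include <cstdint>

// Number of resources of each kind that one shader stage uses.
struct StageResourceCounts
{
    std::uint32_t srgs;
    std::uint32_t sampledTextures;
    std::uint32_t samplers;
    std::uint32_t storageBuffers;
    std::uint32_t storageTextures;
    std::uint32_t uniformBuffers;
};

// Per-stage limits as listed on this page.
constexpr StageResourceCounts kListedLimits{4, 16, 16, 10, 8, 12};

// Returns true when every resource count is within the corresponding limit.
bool FitsWithinLimits(const StageResourceCounts& used, const StageResourceCounts& limits = kListedLimits)
{
    return used.srgs <= limits.srgs
        && used.sampledTextures <= limits.sampledTextures
        && used.samplers <= limits.samplers
        && used.storageBuffers <= limits.storageBuffers
        && used.storageTextures <= limits.storageTextures
        && used.uniformBuffers <= limits.uniformBuffers;
}

// Number of sampled-texture slots still free in a stage that already fits.
std::uint32_t FreeSampledTextures(const StageResourceCounts& used)
{
    assert(used.sampledTextures <= kListedLimits.sampledTextures);
    return kListedLimits.sampledTextures - used.sampledTextures;
}
```

A check like this could run at shader build time to report which stage exceeds which limit before the pipeline creation fails at runtime.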

Workaround: Since there's not a lot of space for resources, multiple features have to be disabled in order to fit within the limits. The WebGPU AzslcHeaders.azsli header contains the disabled features:

  • #define ENABLE_POLYGON_LTC_LIGHTS 0
  • #define ENABLE_QUAD_LIGHTS 0
  • #define ENABLE_SPHERE_LIGHTS 0
  • #define ENABLE_DISK_LIGHTS 0
  • #define ENABLE_CAPSULE_LIGHTS 0
  • #define ENABLE_ESM_SHADOW 0
  • #define ENABLE_FOG_LAYER 0
  • #define ENABLE_GOBO 0
  • #define ENABLE_DECALS 0
  • #define ENABLE_CLEAR_COAT 0
  • #define ENABLE_SIMPLE_SPOTLIGHTS 0
  • #define ENABLE_SIMPLE_POINTLIGHTS 0
  • #define ENABLE_PARALLAX 0
  • #define ENABLE_DOF 0
  • #define ENABLE_LINEAR_DEPTH 0
  • #define ENABLE_LIGHT_CULLING 0
  • #define ENABLE_FULLSCREEN_SHADOW 0

To handle the limited number of SRG slots, SRG merging is used. This is the same solution that was adopted for Vulkan on Android, where the number of SRG slots is also limited.

Testing

AtomSampleViewer

The best way to test WebGPU is using the AtomSampleViewer project (https://github.com/o3de/o3de-atom-sampleviewer). This project contains multiple RHI samples that test different features. The following samples are working in some capacity:

  1. RHI/Compute
  2. RHI/CopyQueue
  3. RHI/InputAssembly
  4. RHI/MSAA
  5. RHI/MultipleViews
  6. RHI/MultiRenderTarget
  7. RHI/MultiViewportSwapchainComponent
  8. RHI/Stencil
  9. RHI/Swapchain
  10. RHI/Texture
  11. RHI/Texture3D
  12. RHI/TextureArray
  13. RHI/TextureMap
  14. RHI/Triangle
  15. RHI/TrianglesConstantBuffer

Pipeline support

Currently the WebGPU implementation is not able to run any of the pipelines included with O3DE. The latest progress was made using a simple pipeline with only the forward pass. The implementation was able to render without validation errors, but multiple rendering artifacts were present. The simplest way to create a pipeline is to modify the mobile one and remove everything that is not needed. You can also set the initial pipeline by setting the r_renderPipelinePath CVAR.