Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[meta] f16 support #4384

Open
1 of 6 tasks
jinleili opened this issue May 2, 2022 · 10 comments · May be fixed by #5701
Open
1 of 6 tasks

[meta] f16 support #4384

jinleili opened this issue May 2, 2022 · 10 comments · May be fixed by #5701
Labels
area: correctness We're behaving incorrectly area: cts Issues stemming from the WebGPU Conformance Test Suite area: naga back-end Outputs of naga shader conversion area: naga front-end lang: HLSL D3D Shading Language lang: Metal Metal Shading Language lang: SPIR-V Vulkan's Shading Language lang: WGSL WebGPU Shading Language naga Shader Translator type: enhancement New feature or request

Comments

@jinleili
Copy link
Contributor

jinleili commented May 2, 2022

From WGSL spec: https://gpuweb.github.io/gpuweb/wgsl/#floating-point-types

TODO

@teoxoy teoxoy changed the title [wgsl-in]Need enable f16; f16 extension support [wgsl-in] f16 extension support May 2, 2022
@teoxoy
Copy link
Member

teoxoy commented May 2, 2022

Thanks for filing! There is some more info here: gpuweb/gpuweb#2512

I suspect this might not be too hard to implement in the Metal and Vulkan back-ends; however for D3D it will require us to generate DXIL (i.e. requires a new back-end; see #4302).

@teoxoy
Copy link
Member

teoxoy commented May 2, 2022

Actually according to this table SM6.2 and -enable-16bit-types should work (no need for DXIL generation).

@kdashg
Copy link

kdashg commented Jan 10, 2023

This one is interesting to prioritize, because it's in the v1 specs, but we don't need to support it in order to be compliant and usable for our v1 release.

@luizberti
Copy link

Would just like to add some weight to the prioritization of this: I am currently trying to do some machine learning and vector search stuff with WGSL, and this is currently a huge blocker to almost everything I'm trying to do. This is basically the only significant thing keeping the wgpu ecosystem from being used in the ML world, and I believe this
ecosystem could be one of the most portable ways to deploy ML systems due to the nature of WebGPU.

Another big reason for having this is that even though Rust has the half crate, there is currently no way (outside of core::arch madness), to work with f16 in SIMD fashion (portable_simd is hopeless, at least until f16 makes it to the standard library). Being able to "SIMD on the GPU" would immensely alleviate this constraint.

Sorry if this sounds a bit pushy, that's not how I want to come across, just want to emphasize that this really is an important feature, and that this is currently a total blocker for several important applications, especially in ML.

@TimothyOlsson
Copy link

I concur.

While the current "unpack2x16float" function is currently usable, I have encountered some difficulties in using it efficiently in some use cases. This forces me to use less optimal approaches, resulting in slower speed and higher memory usage.

I understand that you may have multiple priorities to balance, but if possible, I would greatly appreciate your consideration in prioritizing the f16

@FL33TW00D
Copy link
Contributor

FL33TW00D commented Apr 23, 2023

Notes from digging into this today:

  • Currently, I can't get the SHADER-F16 extension in Chrome Canary 114.0.5729.0 on M1.
  • We are currently casting all AbstractFloat to F32 during lexing here. This needs to be delayed until later (during parsing or lowering?). Would be good to find in the Tint source where this is done.
  • Currently unclear in the spec how we are supposed to implement type casting inside the arguments list of a vector or matrix.
  • CTS for F16 isn't currently implemented as per this issue

Would love to get more involved in shipping this feature.

@teoxoy
Copy link
Member

teoxoy commented Apr 24, 2023

  • We are currently casting all AbstractFloat to F32 during lexing here. This needs to be delayed until later (during parsing or lowering?).

After we are done implementing support for const-expressions, we should be able to propagate abstract types when we do the evaluation of const-epressions.

github-merge-queue bot referenced this issue in bevyengine/bevy Jun 18, 2023
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d)

# Objective

- Add Screen space ambient occlusion (SSAO). SSAO approximates
small-scale, local occlusion of _indirect_ diffuse light between
objects. SSAO does not apply to direct lighting, such as point or
directional lights.
- This darkens creases, e.g. on staircases, and gives nice contact
shadows where objects meet, giving entities a more "grounded" feel.
- Closes #3632.

## Solution

- Implement the GTAO algorithm.
-
https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
-
https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf
- Source code heavily based on [Intel's
XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli).
- Add an SSAO bevy example.

## Algorithm Overview
* Run a depth and normal prepass
* Create downscaled mips of the depth texture (preprocess_depths pass)
* GTAO pass - for each pixel, take several random samples from the
depth+normal buffers, reconstruct world position, raytrace in screen
space to estimate occlusion. Rather then doing completely random samples
on a hemisphere, you choose random _slices_ of the hemisphere, and then
can analytically compute the full occlusion of that slice. Also compute
edges based on depth differences here.
* Spatial denoise pass - bilateral blur, using edge detection to not
blur over edges. This is the final SSAO result.
* Main pass - if SSAO exists, sample the SSAO texture, and set occlusion
to be the minimum of ssao/material occlusion. This then feeds into the
rest of the PBR shader as normal.

---

## Future Improvements
- Maybe remove the low quality preset for now (too noisy)
- WebGPU fallback (see below)
- Faster depth->world position (see reverted code)
- Bent normals 
- Try interleaved gradient noise or spatiotemporal blue noise
- Replace the spatial denoiser with a combined spatial+temporal denoiser
- Render at half resolution and use a bilateral upsample
- Better multibounce approximation
(https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view)

## Far-Future Performance Improvements
- F16 math (missing naga-wgsl support
https://github.com/gfx-rs/naga/issues/1884)
- Faster coordinate space conversion for normals
- Faster depth mipchain creation
(https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not
currently support subgroup ops)
- Deinterleaved SSAO for better cache efficiency
(https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf)

## Other Interesting Papers
- Visibility bitmask
(https://link.springer.com/article/10.1007/s00371-022-02703-y,
https://cdrinmatane.github.io/posts/cgspotlight-slides/)
- Screen space diffuse lighting
(https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf)

## Platform Support
* SSAO currently does not work on DirectX12 due to issues with wgpu and
naga:
  * gfx-rs/wgpu#3798
  * gfx-rs/naga#2353
* SSAO currently does not work on WebGPU because r16float is not a valid
storage texture format
https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix
this with a fallback to r32float.

---

## Changelog

- Added ScreenSpaceAmbientOcclusionSettings,
ScreenSpaceAmbientOcclusionQualityLevel, and
ScreenSpaceAmbientOcclusionBundle

---------

Co-authored-by: IceSentry <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Co-authored-by: Daniel Chia <[email protected]>
Co-authored-by: Elabajaba <[email protected]>
Co-authored-by: Robert Swain <[email protected]>
Co-authored-by: robtfm <[email protected]>
Co-authored-by: Brandon Dyer <[email protected]>
Co-authored-by: Edgar Geier <[email protected]>
Co-authored-by: Nicola Papale <[email protected]>
Co-authored-by: Carter Anderson <[email protected]>
NoahShomette referenced this issue in NoahShomette/bevy Jun 19, 2023
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d)

# Objective

- Add Screen space ambient occlusion (SSAO). SSAO approximates
small-scale, local occlusion of _indirect_ diffuse light between
objects. SSAO does not apply to direct lighting, such as point or
directional lights.
- This darkens creases, e.g. on staircases, and gives nice contact
shadows where objects meet, giving entities a more "grounded" feel.
- Closes bevyengine#3632.

## Solution

- Implement the GTAO algorithm.
-
https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
-
https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf
- Source code heavily based on [Intel's
XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli).
- Add an SSAO bevy example.

## Algorithm Overview
* Run a depth and normal prepass
* Create downscaled mips of the depth texture (preprocess_depths pass)
* GTAO pass - for each pixel, take several random samples from the
depth+normal buffers, reconstruct world position, raytrace in screen
space to estimate occlusion. Rather then doing completely random samples
on a hemisphere, you choose random _slices_ of the hemisphere, and then
can analytically compute the full occlusion of that slice. Also compute
edges based on depth differences here.
* Spatial denoise pass - bilateral blur, using edge detection to not
blur over edges. This is the final SSAO result.
* Main pass - if SSAO exists, sample the SSAO texture, and set occlusion
to be the minimum of ssao/material occlusion. This then feeds into the
rest of the PBR shader as normal.

---

## Future Improvements
- Maybe remove the low quality preset for now (too noisy)
- WebGPU fallback (see below)
- Faster depth->world position (see reverted code)
- Bent normals 
- Try interleaved gradient noise or spatiotemporal blue noise
- Replace the spatial denoiser with a combined spatial+temporal denoiser
- Render at half resolution and use a bilateral upsample
- Better multibounce approximation
(https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view)

## Far-Future Performance Improvements
- F16 math (missing naga-wgsl support
https://github.com/gfx-rs/naga/issues/1884)
- Faster coordinate space conversion for normals
- Faster depth mipchain creation
(https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not
currently support subgroup ops)
- Deinterleaved SSAO for better cache efficiency
(https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf)

## Other Interesting Papers
- Visibility bitmask
(https://link.springer.com/article/10.1007/s00371-022-02703-y,
https://cdrinmatane.github.io/posts/cgspotlight-slides/)
- Screen space diffuse lighting
(https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf)

## Platform Support
* SSAO currently does not work on DirectX12 due to issues with wgpu and
naga:
  * gfx-rs/wgpu#3798
  * gfx-rs/naga#2353
* SSAO currently does not work on WebGPU because r16float is not a valid
storage texture format
https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix
this with a fallback to r32float.

---

## Changelog

- Added ScreenSpaceAmbientOcclusionSettings,
ScreenSpaceAmbientOcclusionQualityLevel, and
ScreenSpaceAmbientOcclusionBundle

---------

Co-authored-by: IceSentry <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Co-authored-by: Daniel Chia <[email protected]>
Co-authored-by: Elabajaba <[email protected]>
Co-authored-by: Robert Swain <[email protected]>
Co-authored-by: robtfm <[email protected]>
Co-authored-by: Brandon Dyer <[email protected]>
Co-authored-by: Edgar Geier <[email protected]>
Co-authored-by: Nicola Papale <[email protected]>
Co-authored-by: Carter Anderson <[email protected]>
james7132 referenced this issue in james7132/bevy Jun 19, 2023
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d)

# Objective

- Add Screen space ambient occlusion (SSAO). SSAO approximates
small-scale, local occlusion of _indirect_ diffuse light between
objects. SSAO does not apply to direct lighting, such as point or
directional lights.
- This darkens creases, e.g. on staircases, and gives nice contact
shadows where objects meet, giving entities a more "grounded" feel.
- Closes bevyengine#3632.

## Solution

- Implement the GTAO algorithm.
-
https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
-
https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf
- Source code heavily based on [Intel's
XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli).
- Add an SSAO bevy example.

## Algorithm Overview
* Run a depth and normal prepass
* Create downscaled mips of the depth texture (preprocess_depths pass)
* GTAO pass - for each pixel, take several random samples from the
depth+normal buffers, reconstruct world position, raytrace in screen
space to estimate occlusion. Rather then doing completely random samples
on a hemisphere, you choose random _slices_ of the hemisphere, and then
can analytically compute the full occlusion of that slice. Also compute
edges based on depth differences here.
* Spatial denoise pass - bilateral blur, using edge detection to not
blur over edges. This is the final SSAO result.
* Main pass - if SSAO exists, sample the SSAO texture, and set occlusion
to be the minimum of ssao/material occlusion. This then feeds into the
rest of the PBR shader as normal.

---

## Future Improvements
- Maybe remove the low quality preset for now (too noisy)
- WebGPU fallback (see below)
- Faster depth->world position (see reverted code)
- Bent normals 
- Try interleaved gradient noise or spatiotemporal blue noise
- Replace the spatial denoiser with a combined spatial+temporal denoiser
- Render at half resolution and use a bilateral upsample
- Better multibounce approximation
(https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view)

## Far-Future Performance Improvements
- F16 math (missing naga-wgsl support
https://github.com/gfx-rs/naga/issues/1884)
- Faster coordinate space conversion for normals
- Faster depth mipchain creation
(https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not
currently support subgroup ops)
- Deinterleaved SSAO for better cache efficiency
(https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf)

## Other Interesting Papers
- Visibility bitmask
(https://link.springer.com/article/10.1007/s00371-022-02703-y,
https://cdrinmatane.github.io/posts/cgspotlight-slides/)
- Screen space diffuse lighting
(https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf)

## Platform Support
* SSAO currently does not work on DirectX12 due to issues with wgpu and
naga:
  * gfx-rs/wgpu#3798
  * gfx-rs/naga#2353
* SSAO currently does not work on WebGPU because r16float is not a valid
storage texture format
https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix
this with a fallback to r32float.

---

## Changelog

- Added ScreenSpaceAmbientOcclusionSettings,
ScreenSpaceAmbientOcclusionQualityLevel, and
ScreenSpaceAmbientOcclusionBundle

---------

Co-authored-by: IceSentry <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Co-authored-by: Daniel Chia <[email protected]>
Co-authored-by: Elabajaba <[email protected]>
Co-authored-by: Robert Swain <[email protected]>
Co-authored-by: robtfm <[email protected]>
Co-authored-by: Brandon Dyer <[email protected]>
Co-authored-by: Edgar Geier <[email protected]>
Co-authored-by: Nicola Papale <[email protected]>
Co-authored-by: Carter Anderson <[email protected]>
@cwfitzgerald cwfitzgerald added the naga Shader Translator label Oct 25, 2023
@cwfitzgerald cwfitzgerald transferred this issue from gfx-rs/naga Oct 25, 2023
@cryscan
Copy link

cryscan commented Nov 14, 2023

Hi! I am currently developing web-rwkv which implements an LLM with WGPU. It's already fast, but having this feature could make it even faster, which is super nice.

@teoxoy teoxoy changed the title [wgsl-in] f16 extension support [meta] f16 support Jan 25, 2024
@teoxoy teoxoy added area: naga back-end Outputs of naga shader conversion lang: SPIR-V Vulkan's Shading Language labels Jan 25, 2024
@teoxoy teoxoy added lang: Metal Metal Shading Language lang: HLSL D3D Shading Language labels Jan 25, 2024
@ErichDonGubler ErichDonGubler added area: correctness We're behaving incorrectly area: cts Issues stemming from the WebGPU Conformance Test Suite labels Apr 2, 2024
@tgross35
Copy link

Just to crosslink, Rust now has a nightly-only binary16 f16 rust-lang/rust#116909. Extremely unstable at this point but it might make things easier to implement down the line.

@FL33TW00D
Copy link
Contributor

FL33TW00D commented May 13, 2024

#5701

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: correctness We're behaving incorrectly area: cts Issues stemming from the WebGPU Conformance Test Suite area: naga back-end Outputs of naga shader conversion area: naga front-end lang: HLSL D3D Shading Language lang: Metal Metal Shading Language lang: SPIR-V Vulkan's Shading Language lang: WGSL WebGPU Shading Language naga Shader Translator type: enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants