Mesh Shaders #3018

inodentry · 2022-09-11T14:17:15Z

Mesh Shaders are an exciting new kind of rendering pipeline for modern hardware, that directly combines compute and rasterization.

KHRONOS recently officially released an extension for Vulkan, which is now quickly gaining support in different drivers. DX12 has had them for a while now. Metal (to my knowledge) also has them. Therefore, it should now be available on at least all the desktop platforms (given compatible hardware) and iOS.

Would it be possible for wgpu to provide access to this functionality?

There is a WebGPU tracking issue for it: gpuweb/gpuweb#3015 , but looks like it is not a priority and unlikely to be added to the WebGPU standard soon.

Perhaps wgpu could support it as a "native-only" extension?

The text was updated successfully, but these errors were encountered:

expenses · 2022-09-11T17:28:25Z

Mesh shaders as a native-only extension should be simple enough to implement, at least for vulkan. I'd like to look into this but I don't have a huge amount of free time to do so (similar to #1040 (comment)).

For anyone else interested, https://github.com/nvpro-samples/gl_vk_meshlet_cadscene is a vulkan example that uses the new VK_EXT_mesh_shader extension.

heavyrain266 · 2022-09-18T09:16:25Z

Mesh shader extension in WGSL could be really helpful for maintaining e.g. PSSL support in naga and PS5 API in wgpu, current implementation avoids usage of wgsl and requires rewriting all shaders in PSSL which is rather annoying to maintain.

zmarlon · 2022-10-11T22:02:07Z

Since there are now mesh shaders on DX12, Metal and Vulkan, it would at least be possible on all modern APIs. DX11, OpenGL ES and WebGPU would of course not work. Would there be any interest in seeing mesh shaders in WGPU? If so, I might try to implement it.

cwfitzgerald · 2022-10-12T21:20:32Z

There definitely is interest! It's going to be a big project though, so do join us in the chat room to chat about the design before starting work.

JunkuiZhang · 2023-07-18T05:57:15Z

Any progress have been made? Its a very exciting feature and maybe the future of game engine dev.

lylythechosenone · 2023-11-19T13:59:43Z

Indeed, I would like to know this too. Has work actually started yet, or is this still just an idea?

@cwfitzgerald

cwfitzgerald · 2023-11-19T18:48:22Z

No one has taken lead in it - it's a rather involved thing, needing both improvements to Naga and wgpu.

If someone was so motivated, we could walk through potential implementation plans.

ZoopOTheGoop · 2024-06-05T05:34:10Z

Tag me in, I'd need a bit of a push in the right direction, but I'm willing to do the legwork.

JMS55 · 2024-06-08T04:48:23Z

I'm not a wgpu maintainer, and can't help implement it, but I'd be happy to help test it out if you do implement mesh shaders. It's a feature I've been very much wanting.

In terms of implementation you would need to:

Figure out the common denominator between metal/vulkan/dx12 for an API to implement
Figure out all the edge cases and things that could go wrong, and how you would add validation for it
Add mesh shader support to naga's WGSL frontend
Add mesh shader support to naga's spirv/hlsl/msl backends

I definitely encourage you to join the matrix channel linked in the wgpu readme and talk to the wgpu devs there if you want to implement mesh shaders.

Bromles · 2024-07-09T11:43:30Z

Looks like mesh shaders may be added to the WebGPU standard itself after 1.0

gpuweb/gpuweb#3015

SupaMaggie70Incorporated · 2025-01-10T22:25:37Z

Overview & problem

I've been looking at the issue of mesh shading recently, and I have some ideas for how it could be implemented into wgsl. I have so far only looked at DirectX and Vulkan, but I would imagine Metal would be somewhat similar. The only features added to the CPU side are

Creating mesh-shaded render pipelines
Drawing using these pipelines(including indirect draws)

Creating the mesh pipeline seems to be very similar to creating standard pipelines in vulkan and directx, at least in terms of what is required from the public facing API. I expect metal to be similar. The draw calls are very simple and pretty much identical between vulkan and directx, and from looking at metal examples they seem to be similar there(though metal does differentiate between drawing with threads and threadgroups).
The GPU side of things is where all of the complexity comes in. Below is a non-exhaustive list of what would need to be added:

A mesh shader enable extension
Task & mesh shader entry points
- Not a real issue but should we call task shaders task shaders or amplification shaders? I will go with task shaders for now
Allow most compute operations in these stages
setMeshOutputs and emitMeshTasks functions/operations, for mesh and task functions respectively
- emitMeshTasks would probably take an argument of a task_payload variable for compatibility with HLSL even though GLSL automatically determines the task_payload variable to use and SPIR-V doesn't require this as an argument at all.
A way to mark fragment inputs as per primitive
Possibly a new address space, task_payload, for carrying data from the task shader to the mesh shader
- HLSL avoids this by simply using workgroup storage and setting it to task_payload if used as an emitMeshTasks parameter
Builtins
- Allow most compute stage builtins to be used in task and mesh shaders
- Allow most vertex stage output builtins to be output by mesh shaders
- Allow primitive_id to be written to by mesh shader
- cull_primitive: bool - output of the mesh shader per primitive
- point_index: u32, line_indices: vec2<u32>, triangle_indices: vec3<u32> for outputting the indices of a specific primitive
The most important problem is that of how to actually output the data from mesh shaders.

While a decent amount of work, most of this could probably be done in at most a few days. The main problem is with the handling of complex outputs from the mesh shader. To that end, I have 3 main ideas for how this could be done in WGSL. I will show code snippets and explain each idea. Note that all of the below are just prototypes. The names of builtins or functions or the exact syntax could change at any time.

Main proposal - Metal inspired

struct VertexOutput {
	@builtin(position) position: vec4<f32>,
	@location(0) color: vec4<f32>,
}
struct PrimitiveOutput {
	@builtin(triangle_indices) index: vec3<f32>,
	@builtin(cull_primitive) cull: bool,
	@location(1) colorMask: vec4<f32>,
}

@mesh
@workgroup_size(1)
fn ms_main<VertexOutput, PrimitiveOutput>(@builtin(local_invocation_index) index: u32, @builtin(global_invocation_id) id: vec3<u32>) {
	setMeshOutputs(3, 1);
	setVertex(0, VertexOutput { ... });
	...
	setPrimitive(0, PrimitiveOutput { ... });
}

This would be the easiest to implement while requiring few features to the WGSL language itself. The main issue would be with the logic around which types are used for vertex and primitive output. However, once implemented, it would function relatively well, work with all major APIs(that I'm aware of), and not have any potential performance drawbacks.

Proposal #2

struct VertexOutput {
	@builtin(position) position: vec4<f32>,
	@location(0) color: vec4<f32>,
}
struct PrimitiveOutput {
	@builtin(triangle_indices) index: vec3<f32>,
	@builtin(cull_primitive) cull: bool,
	@location(1) colorMask: vec4<f32>,
}
struct MeshOutput {
	@per_vertex vertices: array<VertexOutput, 3>,
	@per_primitive primitives: array<PrimitiveOutput, 1>,
}

@mesh
@workgroup_size(1)
fn ms_main(@builtin(local_invocation_index) index: u32, @builtin(global_invocation_id) id: vec3<u32>) -> MeshOutput {
	...
}

This is probably the most WGSL-like method. While on the surface it seems reasonable, the problem arises with how naga handles function outputs in SPIR-V(and probably other targets). Currently, naga will create a struct, fill in values, and then copy those values to the actual output, which means you are doubling the number of writes required. Doing that for mesh shaders would involve many copies in loops, with the number of copies determined by the parameters to setMeshOutputs. While this is probably optimized away by many drivers, and probably poses little performance problems, it leads to unnecessarily long and complicated code that could perform worse on vulkan implementations that don't make this optimization. If we could make substantial internal improvements to naga or verify that this doesn't significantly impact performance, this could be the method of choice.

We would also need to make several changes to WGSL, such as allowing array outputs and in general a more complex output system. This approach would mean a lot of work.

Proposal #3

struct VertexOutput {
	@builtin(position) position: vec4<f32>,
	@location(0) color: vec4<f32>,
}
struct PrimitiveOutput {
	@builtin(triangle_indices) index: vec3<f32>,
	@builtin(cull_primitive) cull: bool,
	@location(1) colorMask: vec4<f32>,
}
@vertices
var<out> vertices: array<VertexOutput>;
@primitives
var<out> primitives: array<PrimitiveOutput>;

@mesh(triangles, 3, 1)
@workgroup_size(1)
fn ms_main(@builtin(local_invocation_index) index: u32, @builtin(global_invocation_id) id: vec3<u32>) {
	...
}

This is the option that aligns best with GLSL.

Full example

This is a full example of a WGSL shader using proposal #1 showcasing most features and how I expect them to be implemented.

enable mesh_shading;

const positions = array(
	vec4(0.,-1.,0.,1.),
	vec4(-1.,1.,0.,1.),
	vec4(1.,1.,0.,1.)
);
const colors = array(
	vec4(0.,1.,0.,1.),
	vec4(0.,0.,1.,1.),
	vec4(1.,0.,0.,1.)
);

struct TaskPayload {
	colorMask: vec4<f32>,
	visible: bool,
}
var<task_payload> taskPayload: TaskPayload;
var<workgroup> workgroupData: f32;

struct VertexOutput {
	@builtin(position) position: vec4<f32>,
	@location(0) color: vec4<f32>,
}
struct PrimitiveOutput {
	@builtin(triangle_indices) index: vec3<f32>,
	@builtin(cull_primitive) cull: bool,
	@location(1) colorMask: vec4<f32>,
}

@task
@workgroup_size(1)
fn ts_main() {
	workgroupData = 1.0;
	taskPayload.colorMask = vec4(1.0, 1.0, 0.0, 1.0);
	taskPayload.visible = true;
	emit_mesh_tasks(3u, 1u, 1u, &taskPayload);
}

@mesh
@workgroup_size(1)
fn ms_main(@builtin(local_invocation_index) index: u32, @builtin(global_invocation_id) id: vec3<u32>) {
	set_mesh_outputs(3u, 1u);
	workgroupData = 2.0;
	setVertex(0, VertexOutput {
		position: positions[0],
		color: colors[0] * taskPayload.colorMask,
	});
	setVertex(1, VertexOutput {
		position: positions[1],
		color: colors[1] * taskPayload.colorMask,
	});
	setVertex(2, VertexOutput {
		position: positions[2],
		color: colors[2] * taskPayload.colorMask,
	});
	setPrimitive(0, PrimitiveOutput {
		index: vec3<u32>(0, 1, 2),
		cull: !taskPayload.visible,
		colorMask: vec4<f32>(1.0, 0.0, 1.0, 1.0),
	});
}
@fragment
fn fs_main(vertex: VertexOutput, primitive: PrimitiveOutput) -> @location(0) vec4<f32> {
	return vertex.color * primitive.colorMask;
}

Summary

In summary, I believe this could be accomplished in a relatively short amount of time, with the best option for the WGSL implementation being in my opinion proposal #1. I am asking for input from anybody else who has recommendations or ideas or sees a potential problem, before I actually start working on this. On a completely separate note, this might also make it easier to implement other improvements that people want(like tesselation shaders), though I don't know enough about those features to know how one would go about implementing them.

Also, sorry for the long drawn out(and obviously unpracticed) format. I am asking for advice and recommendations here, and just throwing out some ideas. I haven't really done many open source contributions before or written any RFCs.

Additional issues to consider

Terminology - task vs amplification vs object shaders? Not important, I'm going with task shaders for now
How to handle multiview, multidraw, indirect, etc with mesh shaders. Are special features required? It seems multiview is not guaranteed to be supported by mesh shader devices on vulkan. Queries are also not guaranteed
Queries in general
Should creating a mesh pipeline use the same descriptor types and functions as creating a standard render pipeline? Currently, I have gone with them being separate.
- There is significant overlap between the checks needed when creating a mesh shader vs standard render pipeline.

cwfitzgerald · 2025-01-11T01:31:21Z

Hey! I just wanted to say thank for putting this proposal together! It'll be a few days until I can digest it, but this is great!

Vecvec · 2025-01-11T08:08:36Z

point_index: u32

I think points may need to be an extra extension as DirectX seems not to support it

from https://www.khronos.org/blog/mesh-shading-for-vulkan#portability

	DirectX 12	VK_EXT_mesh_shader
Supported primitives	triangles, lines	triangles, lines, points

I'm not sure if 0 length lines are rendered (it might depend on the hardware) but if they are then that could probably be used as a point. DirectX doesn't appear to mention if this approach would work, so I suspect it could take some testing.

SupaMaggie70Incorporated · 2025-01-12T19:03:51Z

That's an interesting idea. I worry that there isn't a good way to render non one width lines with directx, and a one pixel point probably isn't desirable. Granted I've never used directx myself, so I may be wrong about any number of things. It does seem that at least initially, point rendering should be put behind a feature flag(or omitted entirely).

SupaMaggie70Incorporated · 2025-01-16T22:34:17Z

One other question I have regarding the pipelines is whether they should be just standard RenderPipeline's or whether they should have their own mesh shader pipeline type. Since they have different commands, it may be useful to differentiate the two to mitigate bugs or user error, but all 3 major APIs have decided not to.

cwfitzgerald · 2025-02-13T01:09:14Z

Echoing what I said before, this is positively amazing work!

Proposals...

I think I agree that Proposal #1 is basically bang on what we want. The only syntax I don't like is the templating of the mesh shader function. I think this could be done more idiomatically by attaching these types to the attribute or something similar.

@mesh(VertexOutput, PrimitiveOutput)
@workgroup_size(1)
fn ms_main() {
}

Terminology - task vs amplification vs object shaders? Not important, I'm going with task shaders for now

sgtm, I'm sure the webgpu committee will call them something entirely different anyway

setMeshOutputs and emitMeshTasks functions/operations, for mesh and task functions respectively

emitMeshTasks would probably take an argument of a task_payload variable for compatibility with HLSL even though GLSL automatically determines the task_payload variable to use and SPIR-V doesn't require this as an argument at all.

This sounds fine. The hardest part of this is validating that these functions will be properly called and erroring out (or dealing with it properly at runtime if they don't). I know that mesh shaders do pretty bad things if you misbehave when using them, so protecting against this will be important. This is a pretty compelling argument for the struct style syntax, as you cannot forget to set all variables.

set_mesh_outputs(3u, 1u);

Do these need to be compile-time known numbers?

fn ts_main() {

I know that you must call emit_mesh_tasks at the end of the task shader, I wonder if it would be easier to ensure this happens by requiring that task shaders return vec3u and then when we transpile, we insert the call with this type.

This kind of stuff which is the spicy part of the api design for mesh shaders, but I think we're already pretty close.

How to handle multiview, multidraw, indirect, etc with mesh shaders. Are special features required? It seems multiview is not guaranteed to be supported by mesh shader devices on vulkan. Queries are also not guaranteed

I think the question is, on devices that support both multiview and mesh shaders, do they support multiview in mesh shaders? https://github.com/kainino0x/gpuinfo-vulkan-query is a vital tool for figuring those things out.

One other question I have regarding the pipelines is whether they should be just standard RenderPipeline's or whether they should have their own mesh shader pipeline type. Since they have different commands, it may be useful to differentiate the two to mitigate bugs or user error, but all 3 major APIs have decided not to.

This is one place where we can use enums to our benefit, you could have a single argument in the RenderPipelineDescriptor for "the thing before the rasterizer" that can be either a Option/Mesh pair, or a Vertex shader.

There is significant overlap between the checks needed when creating a mesh shader vs standard render pipeline

I will note, that we can do this internally even if we decide the external api should be separate.

It does seem that at least initially, point rendering should be put behind a feature flag(or omitted entirely).

Points are really niche and fall over very quickly in production use cases, so I'm not stressed about completely ignoring it until we get something going.

Queries in general

I think we can put mesh specific queries off to a follow up task, it's not really hard, just particular.

Also, sorry for the long drawn out(and obviously unpracticed) format. I am asking for advice and recommendations here, and just throwing out some ideas. I haven't really done many open source contributions before or written any RFCs.

Well you did an absolutely amazing job, you should be proud!

SupaMaggie70Incorporated · 2025-02-13T04:18:22Z

I think I agree that Proposal #1 is basically bang on what we want. The only syntax I don't like is the templating of the mesh shader function. I think this could be done more idiomatically by attaching these types to the attribute or something similar.

I agree that the templating was pretty awful, I honestly came up with that after the main part really quickly once I realized we needed to specify those types.

Do these need to be compile-time known numbers?

No, but the shader itself must specify somewhere the maximum values it can set for these. I added this in proposals #2 and #3, but evidently forgot to add it to #1. Probably just another thing that belongs in the @mesh attribute like proposal #3 has it.

I know that you must call emit_mesh_tasks at the end of the task shader, I wonder if it would be easier to ensure this happens by requiring that task shaders return vec3u and then when we transpile, we insert the call with this type.

I really like your idea about returning a vec3, that actually makes it so much easier to check that it doesn't get called multiple times, and that it certainly does get called, and that it is the last thing called.

I think the question is, on devices that support both multiview and mesh shaders, do they support multiview in mesh shaders? https://github.com/kainino0x/gpuinfo-vulkan-query is a vital tool for figuring those things out.

Funny you mention the multiview stuff, in my final changes to pass CI checks it turned out I wasn't checking for multiview with mesh shader support. LLVMPIPE apparently supports multiview and mesh shaders but not multiview in mesh shaders. So this is definitely something we need to be wary of.

This is one place where we can use enums to our benefit, you could have a single argument in the RenderPipelineDescriptor for "the thing before the rasterizer" that can be either a Option/Mesh pair, or a Vertex shader.

I will note, that we can do this internally even if we decide the external api should be separate.

This makes sense.

Points are really niche and fall over very quickly in production use cases, so I'm not stressed about completely ignoring it until we get something going.

Yeah I've never used points before, and honestly can't think of many uses. I think lines are slightly more useful and a good bit to implement so we could probably leave those in.

I think we can put mesh specific queries off to a follow up task, it's not really hard, just particular.

Sounds good to me. I'm not experienced with queries so that probably wouldn't be something I'd implement myself anyway.

Well you did an absolutely amazing job, you should be proud!

Thanks so much!

cwfitzgerald · 2025-02-13T16:55:21Z

No, but the shader itself must specify somewhere the maximum values it can set for these. I added this in proposals 2 and 3, but evidently forgot to add it to 1. Probably just another thing that belongs in the @mesh attribute like proposal 3 has it.

Alright - I think probably the wgsl-ey would be something kinda silly:

@mesh @vertex_output(VertexOutput, 4) @primitive_output(PrimitiveOutput, 8)
@workgroup_size(1)
fn ms_main() {
}

Where mesh shaders must have vertex_ouput, primitive_output, and workgroup_size attributes to be a valid program.

lavapipe apparently supports multiview and mesh shaders but not multiview in mesh shaders.

Welp, sounds like we do!

cwfitzgerald changed the title ~~Mesh Shader support?~~ Mesh Shaders Oct 15, 2022

cwfitzgerald added type: enhancement New feature or request area: api Issues related to API surface backend: dx12 Issues with DX12 or DXGI backend: metal Issues with Metal backend: vulkan Issues with Vulkan labels Oct 15, 2022

cwfitzgerald mentioned this issue Nov 18, 2023

Mesh Shaders #4721

Closed

JMS55 mentioned this issue Jan 24, 2024

Meshlet tracking issue bevyengine/bevy#11518

Open

SupaMaggie70Incorporated mentioned this issue Feb 8, 2025

Mesh shaders - initial wgpu hal changes #7089

Open

6 tasks

cwfitzgerald added the feature: mesh shaders Issues with the Mesh Shading Native Feature label Feb 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mesh Shaders #3018

Mesh Shaders #3018

inodentry commented Sep 11, 2022

expenses commented Sep 11, 2022

heavyrain266 commented Sep 18, 2022

zmarlon commented Oct 11, 2022

cwfitzgerald commented Oct 12, 2022

JunkuiZhang commented Jul 18, 2023

lylythechosenone commented Nov 19, 2023

cwfitzgerald commented Nov 19, 2023 •

edited

Loading

ZoopOTheGoop commented Jun 5, 2024 •

edited

Loading

JMS55 commented Jun 8, 2024 •

edited

Loading

Bromles commented Jul 9, 2024

SupaMaggie70Incorporated commented Jan 10, 2025 •

edited

Loading

cwfitzgerald commented Jan 11, 2025

Vecvec commented Jan 11, 2025

SupaMaggie70Incorporated commented Jan 12, 2025 •

edited

Loading

SupaMaggie70Incorporated commented Jan 16, 2025

cwfitzgerald commented Feb 13, 2025 •

edited

Loading

SupaMaggie70Incorporated commented Feb 13, 2025 •

edited by cwfitzgerald

Loading

cwfitzgerald commented Feb 13, 2025 •

edited

Loading

Mesh Shaders #3018

Mesh Shaders #3018

Comments

inodentry commented Sep 11, 2022

expenses commented Sep 11, 2022

heavyrain266 commented Sep 18, 2022

zmarlon commented Oct 11, 2022

cwfitzgerald commented Oct 12, 2022

JunkuiZhang commented Jul 18, 2023

lylythechosenone commented Nov 19, 2023

cwfitzgerald commented Nov 19, 2023 • edited Loading

ZoopOTheGoop commented Jun 5, 2024 • edited Loading

JMS55 commented Jun 8, 2024 • edited Loading

Bromles commented Jul 9, 2024

SupaMaggie70Incorporated commented Jan 10, 2025 • edited Loading

Overview & problem

Main proposal - Metal inspired

Proposal #​2

Proposal #​3

Full example

Summary

Additional issues to consider

cwfitzgerald commented Jan 11, 2025

Vecvec commented Jan 11, 2025

SupaMaggie70Incorporated commented Jan 12, 2025 • edited Loading

SupaMaggie70Incorporated commented Jan 16, 2025

cwfitzgerald commented Feb 13, 2025 • edited Loading

SupaMaggie70Incorporated commented Feb 13, 2025 • edited by cwfitzgerald Loading

cwfitzgerald commented Feb 13, 2025 • edited Loading

cwfitzgerald commented Nov 19, 2023 •

edited

Loading

ZoopOTheGoop commented Jun 5, 2024 •

edited

Loading

JMS55 commented Jun 8, 2024 •

edited

Loading

SupaMaggie70Incorporated commented Jan 10, 2025 •

edited

Loading

Proposal #2

Proposal #3

SupaMaggie70Incorporated commented Jan 12, 2025 •

edited

Loading

cwfitzgerald commented Feb 13, 2025 •

edited

Loading

SupaMaggie70Incorporated commented Feb 13, 2025 •

edited by cwfitzgerald

Loading

cwfitzgerald commented Feb 13, 2025 •

edited

Loading