3. Getting started 📖

Sergio Pedri edited this page Dec 10, 2023 · 9 revisions

ComputeSharp exposes a GraphicsDevice class that acts as the entry point for all public APIs. The GraphicsDevice.GetDefault() method lets you access the main GPU device on the current machine, which can be used to allocate buffers and perform operations. If your machine doesn't have a supported GPU (or if it doesn't have a GPU at all), ComputeSharp will automatically create a WARP device instead, which still lets you use the library normally, with shaders running on the CPU through an emulation layer. This means that you don't need to manually write a fallback path in case no GPU is available: ComputeSharp handles this for you.
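As a quick way to see which kind of device was picked, a minimal sketch along these lines can describe the default device. It assumes the ComputeSharp NuGet package is referenced, and that the Name and IsHardwareAccelerated properties of GraphicsDevice behave as their names suggest (they are not described on this page, so check them against the API docs). DeviceInfo is a hypothetical helper name used only for illustration.

```csharp
using ComputeSharp;

public static class DeviceInfo
{
    // Builds a short, human-readable description of the default device.
    public static string Describe()
    {
        GraphicsDevice device = GraphicsDevice.GetDefault();

        // Assumption: IsHardwareAccelerated is false for a software (WARP) device
        string kind = device.IsHardwareAccelerated ? "hardware GPU" : "software (WARP) device";

        return $"{device.Name} ({kind})";
    }
}
```

Calling System.Console.WriteLine(DeviceInfo.Describe()) at startup is usually enough to tell whether shaders will run on a real GPU or through the emulation layer.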

Let's suppose we want to run a simple compute shader that multiplies all items in a target buffer by two. The first step is to create the GPU buffer and copy our data to it:

// Get some sample data
int[] array = [.. Enumerable.Range(1, 100)];

// Allocate a GPU buffer and copy the data to it.
// We want the shader to modify the items in-place, so we
// can allocate a single read-write buffer to work on.
using ReadWriteBuffer<int> buffer = GraphicsDevice.GetDefault().AllocateReadWriteBuffer(array);

The AllocateReadWriteBuffer extension takes care of creating a ReadWriteBuffer<T> instance with the same size as the input array and copying its contents to the allocated GPU buffer. There are a number of overloads available as well, to create buffers of different types and with custom length.

Next, we need to define the GPU shader to run. To do this, we'll need to define a partial struct type implementing the IComputeShader interface (note that the partial modifier is necessary for ComputeSharp to generate additional code to run the shader). This type will contain the code we want to run on the GPU, as well as fields representing the values we want to capture and pass to the GPU (such as GPU resources, or arbitrary values we need). Next, we need to add the [ThreadGroupSize] attribute to configure the dispatching configuration for the shader. This shader only operates on a 1D buffer, so we can use DefaultThreadGroupSizes.X for this. Lastly, we also need to add the [GeneratedComputeShaderDescriptor] attribute, to let the source generator bundled with ComputeSharp do its magic. In this case, we only need to capture the buffer to work on, so the shader type will look like this:

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct MultiplyByTwo(ReadWriteBuffer<int> buffer) : IComputeShader
{
    public void Execute()
    {
        buffer[ThreadIds.X] *= 2;
    }
}

In this example, we're using a primary constructor (but you can also explicitly declare fields and set them via a constructor). The shader body also uses the special ThreadIds class, one of several special classes that give access to dispatch parameters from within a shader body. In this case, ThreadIds lets us access the current invocation index for the shader, just as if we were accessing the classic i variable from within a for loop.
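For comparison, the explicit-field alternative mentioned above might look like the following sketch. The MultiplyByTwoExplicit name is hypothetical (chosen so it does not clash with the shader above), and the use of a private readonly field is an assumption about what the source generator accepts.

```csharp
using ComputeSharp;

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct MultiplyByTwoExplicit : IComputeShader
{
    // Captured GPU resource, stored in an explicitly declared field
    private readonly ReadWriteBuffer<int> buffer;

    // Regular constructor, replacing the primary constructor
    public MultiplyByTwoExplicit(ReadWriteBuffer<int> buffer)
    {
        this.buffer = buffer;
    }

    public void Execute()
    {
        buffer[ThreadIds.X] *= 2;
    }
}
```

Both forms capture the same buffer. The primary constructor version is simply shorter.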

We can now finally run the GPU shader and copy the data back to our array:

// Launch the shader
GraphicsDevice.GetDefault().For(buffer.Length, new MultiplyByTwo(buffer));

// Get the data back
buffer.CopyTo(array);

Capturing variables 🎈

Shaders can store either GPU resources or custom values in their fields, so that they can be accessed when running on the GPU as well. This can be useful to pass some extra parameters to a shader (eg. some factor to multiply values by), that don't belong to a GPU buffer of their own. The captured variables need to be of a supported scalar or vector type so that they can be correctly used by the GPU shader in HLSL. Here is a list of the variable types currently supported:

✅ .NET scalar types: bool, int, uint, float, double

✅ HLSL types: Bool, Bool2, Bool3, Bool4, Float2, Float3, Float4, Int2, Int3, Int4, UInt2, UInt3, etc.

✅ HLSL matrix types: Float2x2, Float3x3, Float3x4, Float4x4, Float1x4, Float4x, etc.

✅ Custom struct types containing any of the types above, as well as other valid custom struct types

Tip

Since ComputeSharp generates global type aliases for all the HLSL types in consuming projects, it's possible to refer to them via the same name they use in HLSL, for simplicity. For instance, Float4 can just be referred to as float4, Float4x4 will just be float4x4, etc. This also allows the type names to look more consistent with other C# primitive types.
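To illustrate capturing a plain value next to a GPU resource, here is a minimal sketch of a shader that multiplies every item by a captured factor. MultiplyBy and MultiplyByExample are hypothetical names introduced for this example. Only APIs already shown on this page are used.

```csharp
using ComputeSharp;

// Hypothetical shader: captures a buffer (GPU resource) and a scalar (custom value)
[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct MultiplyBy(ReadWriteBuffer<float> buffer, float factor) : IComputeShader
{
    public void Execute()
    {
        buffer[ThreadIds.X] *= factor;
    }
}

public static class MultiplyByExample
{
    // Copies the data to the GPU, scales it there, and returns the scaled copy
    public static float[] Run(float[] data, float factor)
    {
        GraphicsDevice device = GraphicsDevice.GetDefault();

        using ReadWriteBuffer<float> buffer = device.AllocateReadWriteBuffer(data);

        device.For(buffer.Length, new MultiplyBy(buffer, factor));

        float[] result = new float[data.Length];
        buffer.CopyTo(result);

        return result;
    }
}
```

The factor is passed to the shader each time it is dispatched, so the same shader type can be reused with different values without being redefined.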

GPU resource types 🎞️

There are a number of extension APIs for the GraphicsDevice class that can be used to allocate GPU resources. Here is a breakdown of the main resource types that are available:

  • ReadWriteBuffer<T>: this type can be viewed as the equivalent of the T[] type, and can contain a writeable sequence of any of the supported types mentioned above. It is very flexible and works well in most situations. If you're just getting started and are not sure about what kind of buffer to use, this is usually a good choice.
  • ReadOnlyBuffer<T>: this type represents a sequence of items that the GPU cannot write to. It is particularly useful to declare intent from within a compute shader and to avoid accidentally writing to data that is not supposed to change during the execution of a shader.
  • ConstantBuffer<T>: this type is meant to be used for small sequences of items that never change during the execution of a shader. Compared to ReadOnlyBuffer<T> it has more constraints, but can benefit from better caching on the GPU side (it is recommended to verify with proper benchmarking that this type is appropriate to use). Items within a ConstantBuffer<T> instance are aligned to 16 bytes, which helps the GPU access them particularly quickly, but the total size of the buffer is limited to around 64KB. Copying to and from this buffer can also have additional overhead, as the copy needs to account for the possible padding for each item (as the 16-byte alignment is not present on the CPU side). If you're in doubt about which buffer type to use, just use either ReadOnlyBuffer<T> or ReadWriteBuffer<T>, depending on whether or not you also need write access to that buffer on the GPU side.
  • ReadOnlyTexture2D<T> and ReadWriteTexture2D<T>: these types represent a 2D texture with elements of a specific type. Note that textures are not just 2D arrays, and have additional characteristics and limitations. Items in a texture are stored with a tiled layout instead of the classic row-major order that .NET T[,] arrays have, and this makes them extremely fast when accessing small areas of neighbouring items (due to better cache locality). This can offer a big speedup in operations that have a similar memory access pattern, such as blur effects or convolutions in general. Textures also have limitations on the type of items they can contain (eg. custom struct types are not supported), and you can check if a specific type is supported at runtime with GraphicsDevice.IsReadOnlyTexture2DSupportedForType<T>() and IsReadWriteTexture2DSupportedForType<T>().
  • ReadOnlyTexture3D<T> and ReadWriteTexture3D<T>: these are just like 2D textures, but in 3 dimensions. The same characteristics and limitations apply, with the addition of the fact that the depth axis has a much smaller limit on the size it can have. The GraphicsDevice.IsReadOnlyTexture3DSupportedForType<T>() and IsReadWriteTexture3DSupportedForType<T>() methods can be used to check for type support at runtime.
  • ReadOnlyTexture2D<T, TPixel> and ReadWriteTexture2D<T, TPixel>: these texture types are particularly useful when doing image processing from a compute shader, as they allow the GPU to perform pixel format conversion automatically. The two type parameters indicate the type of items in a texture when exposed on the CPU side (as T items) or on the GPU side (as TPixel items). The type parameter T can be any of the existing pixel formats available in ComputeSharp (such as Rgba32 and Bgra32), while the TPixel parameter represents the type of elements the GPU will be working with (for instance, Float4 for Rgba32). These texture types have a number of benefits, such as lower memory usage and reduced overhead on the CPU (as there is no need to manually do the pixel format conversion when copying data back and forth from the GPU).
  • ReadOnlyTexture3D<T, TPixel> and ReadWriteTexture3D<T, TPixel>: these types are just like their 2D equivalent types, with the same support for automatic pixel format conversion on the GPU side.
  • UploadBuffer<T>: this type can be used in more advanced scenarios where performance is particularly important. It represents a buffer that can be copied directly to a GPU structured buffer (either ReadWriteBuffer<T> or ReadOnlyBuffer<T>), without the need to first create a temporary transfer buffer to do so. If there is a large number of copy operations being performed to GPU buffers, it can be beneficial to create a single UploadBuffer<T> instance to load data to copy to the GPU, and execute a single copy from there.
  • ReadBackBuffer<T>: this type is another advanced buffer type that is analogous to UploadBuffer<T>, but that can be used to copy data back from a GPU buffer. In particular, this buffer can also be accessed quickly by the CPU when reading data from it or writing data to it, so it can also be used for further processing of data on the CPU side, without the need to copy the data onto another buffer first (such as a .NET array).
  • UploadTexture2D<T>, UploadTexture3D<T>, ReadBackTexture2D<T> and ReadBackTexture3D<T>: these types are conceptually similar to UploadBuffer<T> and ReadBackBuffer<T>, with the main difference being that they can be used to copy data back and forth from 2D and 3D textures respectively.

Important

Although the various APIs to allocate buffers are simply generic methods with a T : unmanaged constraint, they should only be used with C# types that are supported (see notes above). Additionally, the bool type should not be used in buffers due to C#/HLSL differences: use the Bool type instead (or just an int buffer).
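As a rough sketch of how the upload and readback buffer types described above could be combined, the example below round-trips data through a GPU buffer. It assumes that AllocateUploadBuffer<T>(int) and AllocateReadBackBuffer<T>(int) exist as allocation extensions, that both transfer buffers expose a Span property, and that ReadWriteBuffer<T> offers CopyFrom(UploadBuffer<T>) and CopyTo(ReadBackBuffer<T>) overloads. None of these signatures appear on this page, so treat them as assumptions to verify against the API docs. TransferBufferExample is a hypothetical name.

```csharp
using System;
using ComputeSharp;

public static class TransferBufferExample
{
    // Uploads the data, (optionally) processes it on the GPU, and reads it back
    public static float[] RoundTrip(float[] data)
    {
        GraphicsDevice device = GraphicsDevice.GetDefault();

        // Assumed allocation APIs (see the lead-in above)
        using UploadBuffer<float> upload = device.AllocateUploadBuffer<float>(data.Length);
        using ReadWriteBuffer<float> buffer = device.AllocateReadWriteBuffer<float>(data.Length);
        using ReadBackBuffer<float> readback = device.AllocateReadBackBuffer<float>(data.Length);

        // Write the input directly into the upload buffer memory
        data.AsSpan().CopyTo(upload.Span);

        // Upload buffer -> GPU buffer (assumed overload)
        buffer.CopyFrom(upload);

        // ... dispatch one or more shaders on 'buffer' here ...

        // GPU buffer -> readback buffer (assumed overload)
        buffer.CopyTo(readback);

        // The readback memory can be read on the CPU. It is copied here only
        // because the buffer is disposed when this method returns.
        return readback.Span.ToArray();
    }
}
```

In a long-running pipeline the transfer buffers would typically be allocated once and reused across iterations, which is where they are expected to pay off compared to copying from and to arrays each time.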

HLSL vector and matrix types 🧩

As mentioned in the Capturing variables paragraph, ComputeSharp also exposes matrix types that can be used in compute shaders. These types store individual component values in row-major order (for consistency with .NET arrays) and can be indexed in several ways, just like HLSL vector types. One notable difference compared to HLSL vector types, though, is the lack of explicit properties to extract swizzled vectors (eg. Float4.XZY returns a Float3 value with the X, Z and Y components). The number of possible combinations is simply too high in the case of matrix types (eg. Float4x4 has 16 components, so four-component swizzles alone would have required 16^4 = 65,536 properties), so the ability to extract swizzled vectors (see here) is instead provided through a special indexer property and values from the MatrixIndex type. Here is how it can be used:

float4x4 matrix = default;

// Standard indexer for rows and individual items
float4 row = matrix[0];
float item = matrix[0][1];

// Swizzled indexers, which can be made less verbose to
// write by adding this using static directive to the file
using static ComputeSharp.MatrixIndex;

float4 diagonal = matrix[M11, M22, M33, M44];
float4 vertices = matrix[M11, M14, M44, M41];

Matrix types also include a number of built-in operators to work with vector types, and the Hlsl class detailed below (see HLSL intrinsics) also includes several overloads for the available methods to work on both matrix and vector types at the same time (eg. for row/matrix multiplication and other common linear algebra operations).

Important

In order to make all the available properties and indexers usable when declaring shader constants and globals (see Shader constants and globals), they will return undefined data if used on the CPU instead of throwing an exception. Refer to the XML docs for each API for further info, as properties and operators that are only meant to be used in a shader are clearly marked as undefined behavior if used on the CPU side. The only APIs guaranteed to be usable on the CPU are constructors, static properties and properties to access individual elements (eg. Float4.X).

HLSL intrinsics 🪄

ComputeSharp offers support for all HLSL intrinsics that can be used from compute shaders. These are special functions (usually representing mathematical operations) that are optimized on the GPU and can execute very efficiently. Some of them can be mapped automatically from methods in the System.Math type, but if you want to have direct access to all of them you can just use the methods in the Hlsl class from a compute shader, and ComputeSharp will rewrite all those calls to use the native intrinsics in the compute shaders where those are used.

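Here's an example of a shader that applies a smooth activation function, log(1 + exp(k * x)) / k, to all the items in a given buffer (despite the SoftmaxActivation name used in this sample, this formula is the softplus function):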

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct SoftmaxActivation(
    ReadWriteBuffer<float> buffer,
    float k) : IComputeShader
{
    public void Execute()
    {
        float exp = Hlsl.Exp(k * buffer[ThreadIds.X]);
        float log = Hlsl.Log(1 + exp);

        buffer[ThreadIds.X] = log / k;
    }
}
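The shader above computes log(1 + exp(k * x)) / k for each item. Since the Hlsl methods are only meaningful inside a shader, a CPU-side reference has to use System.MathF instead. The sketch below is one way to sanity-check GPU results against such a reference. SoftplusReference and the default tolerance are hypothetical choices for illustration, not part of ComputeSharp.

```csharp
using System;

public static class SoftplusReference
{
    // CPU version of the shader body: log(1 + exp(k * x)) / k
    public static float Compute(float x, float k)
    {
        float exp = MathF.Exp(k * x);
        float log = MathF.Log(1 + exp);

        return log / k;
    }

    // Returns true when every GPU result is within the tolerance of the CPU reference
    public static bool Matches(float[] input, float[] gpuOutput, float k, float tolerance = 1e-4f)
    {
        if (input.Length != gpuOutput.Length)
        {
            return false;
        }

        for (int i = 0; i < input.Length; i++)
        {
            if (MathF.Abs(Compute(input[i], k) - gpuOutput[i]) > tolerance)
            {
                return false;
            }
        }

        return true;
    }
}
```

A tolerance is used rather than exact equality because GPU intrinsics and MathF are not guaranteed to round identically.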

Important

In order to make all the intrinsics usable when declaring shader constants and globals (see Shader constants and globals), most intrinsics will just return a default value if used on the CPU instead of throwing an exception, and in this case their behavior is considered undefined. Make sure to only ever use these methods in a shader, as their results will not be correct in other execution contexts.

Tip

If you're porting shader code from GLSL, you can use this guide as a handy reference.

Shader constants and globals 🔭

One common approach to make shaders easier to modify and experiment with is to separate the parameters being used from the code using them, by defining them as constants. ComputeSharp has special handling for this, and will rewrite static fields as global variables in the generated shaders (marking them as constants if needed), which are optimized by the compiler for frequent access during execution. The advantage of using constants is that they're directly embedded into a compiled shader, so they don't need to be loaded in memory whenever a shader is dispatched. Non-constant shader global variables, on the other hand, can make code easier to write, as they remove the need to pass multiple values around to each function that is invoked. Static fields can be of any of the supported HLSL primitive types, including vector and matrix types, and the initialization of these constants can also use any of the available HLSL intrinsics. If the same result is used multiple times while a shader is executing, moving it to a shader constant is a good way to avoid computing the same value over and over.

Here is an example of how shader constants and mutable globals can be declared and used:

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct SampleShaderWithConstants(ReadWriteBuffer<float> buffer) : IComputeShader
{
    private const int iterations = 10;
    private const float pi = 3.14f;
    private static readonly float2 sinCosPi = new(Hlsl.Sin(pi), Hlsl.Cos(pi));
    private static float sum;

    public void Execute()
    {
        for (int i = 0; i < iterations; i++)
        {
            sum += pi + sinCosPi.X * sinCosPi.Y;
        }

        buffer[ThreadIds.X] = sum;
    }
}

As shown above, there are two ways to declare a constant value in a shader: either by using const in C# (which is perfect for when a value is just a primitive scalar type initialized with a constant expression), or by using static readonly. The latter has the advantage that it allows the type to be a vector or matrix type as well, and the initialization can also use any of the available HLSL intrinsics. Lastly, mutable globals can be declared by just using the static modifier, and they can also optionally have an initializer (otherwise their default value will be used).

Dispatch info ⚗️

As shown in the first paragraph, ComputeSharp includes a number of special types that allow shaders to access a number of useful dispatch info values, such as the current iteration coordinate or the target dispatch range. Here is a list of all the available types in this category:

  • ThreadIds: indicates the ids of a given GPU thread running a compute shader. That is, it enables a shader to access info on the current iteration index along each axis.
  • ThreadIds.Normalized: indicates the normalized ids of a given GPU thread running a compute shader. These ids represent equivalent info to those from ThreadIds, but normalized in the [0, 1] range. The range used for the normalization is the one given by the target dispatch size.
  • DispatchSize: indicates the size of the current shader dispatch being executed. That is, it enables a shader to access info on the targeted number of invocations along each axis.
  • GroupIds: indicates the ids of a given GPU thread running a compute shader within a dispatch group. That is, it enables a shader to access info on the index of the current thread with respect to the currently running group.
  • GroupSize: indicates the size info of a given GPU thread group running a compute shader. That is, it enables a shader to access info on the size of the thread groups being used.
  • GridIds: indicates the ids of the current thread group within the dispatch grid. That is, it enables a shader to access info on the index of the current thread group with respect to the dispatch grid.

For more info on how all these values relate to the corresponding HLSL inputs, see the [numthreads] docs here. Note that ThreadIds, GroupIds and GridIds map to SV_DispatchThreadID, SV_GroupThreadID and SV_GroupID respectively, and the GroupIds.Index property maps to SV_GroupIndex.
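The relationship between these values follows the standard HLSL definitions of SV_DispatchThreadID and SV_GroupIndex, and can be reproduced with plain integer arithmetic. The sketch below is a CPU-only illustration of that arithmetic and does not call any ComputeSharp API. DispatchMath and its method names are hypothetical, and the group count shown is simply the minimum needed to cover a dispatch rather than a statement about how ComputeSharp rounds internally.

```csharp
public static class DispatchMath
{
    // Per axis: SV_DispatchThreadID = SV_GroupID * group size + SV_GroupThreadID,
    // which in the terms used above is ThreadIds = GridIds * GroupSize + GroupIds
    public static int ThreadId(int gridId, int groupSize, int groupId)
    {
        return gridId * groupSize + groupId;
    }

    // SV_GroupIndex = z * sizeX * sizeY + y * sizeX + x (flattened index within a group)
    public static int GroupIndex(int x, int y, int z, int sizeX, int sizeY)
    {
        return z * sizeX * sizeY + y * sizeX + x;
    }

    // Minimum number of groups needed to cover a dispatch size along one axis (rounded up)
    public static int GroupCount(int dispatchSize, int groupSize)
    {
        return (dispatchSize + groupSize - 1) / groupSize;
    }
}
```

For instance, covering 100 items with groups of 64 threads takes 2 groups, and the thread with local id 35 in the second group (grid id 1) has a global id of 1 * 64 + 35 = 99, the last valid index.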

Working with images 🖼️

As mentioned in the GPU resource types paragraph, there are several texture types that are specialized to work on image pixels, such as ReadWriteTexture2D<T, TPixel>. Let's imagine we want to write a compute shader that applies some image processing filter. We will need to load an image (in this example we will use the integrated APIs to do so, but external libraries such as ImageSharp or System.Drawing can also be used) and then process it on the GPU, but without spending time on the CPU to convert pixels from a format such as BGRA32 to the normalized float values we want our shader to work on. We can do this by using the ReadWriteTexture2D<T, TPixel> type as follows:

// Load a texture from a specified image, and decode it in the BGRA32 format
using var texture = GraphicsDevice.GetDefault().LoadReadWriteTexture2D<Bgra32, float4>("myImage.jpg");

// Run our shader on the texture we just loaded
GraphicsDevice.GetDefault().For(texture.Width, texture.Height, new GrayscaleEffect(texture));

// Save the processed image by overwriting the original image
texture.Save("myImage.jpg");

With the compute shader being like this:

[ThreadGroupSize(DefaultThreadGroupSizes.XY)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct GrayscaleEffect(IReadWriteNormalizedTexture2D<float4> texture) : IComputeShader
{
    // Other captured resources or values here...

    public void Execute()
    {
        // Our image processing logic here. In this example, we are just
        // applying a naive grayscale effect to all pixels in the image.
        float3 rgb = texture[ThreadIds.XY].RGB;
        // Rec. 709 luma weights, in R, G, B order to match the RGB swizzle
        float avg = Hlsl.Dot(rgb, new(0.2126f, 0.7152f, 0.0722f));

        texture[ThreadIds.XY].RGB = avg;
    }
}

Note

This is just an example to illustrate how these texture types can help with automatic pixel format conversion. You're free to use any library of your choice to load and save image data, and to structure the compute shaders representing your image effects however you like. This is just one of the many effects that can be achieved with ComputeSharp.

You can also use a similar technique to create a blank texture, render a single frame of a pixel shader, and save the output to a file:

// Create a blank 1280x720 surface for us to render to
using var texture = GraphicsDevice.GetDefault().AllocateReadWriteTexture2D<Bgra32, float4>(1280, 720);

// Using an existing shader from our samples, we can render a specific time frame
GraphicsDevice.GetDefault().ForEach(texture, new FourColorGradient(0));

// Save the result to a file on disk
texture.Save("output.png");

Inspecting shaders 🔬

For users who are familiar with the HLSL language and might want to access more info on a given shader generated by ComputeSharp, such as the compiled HLSL code or the statistics exposed by the DirectX 12 reflection APIs, the library includes a ReflectionServices class that makes it easy to gather all these details for a given shader type.

Assuming we have the same shader defined in the first paragraph, here is how this class can be used:

ShaderInfo shaderInfo = ReflectionServices.GetShaderInfo<MultiplyByTwo>();

// Access info here, for instance...
string hlslSource = shaderInfo.HlslSource;
uint numberOfResources = shaderInfo.BoundResourceCount;
uint instructionCount = shaderInfo.InstructionCount;

DirectX interop 🛸

There are cases where you might need to manually interop with other DirectX APIs, to perform operations that are not included in the public API surface exposed by ComputeSharp. One example is copying the contents of a ReadWriteTexture2D<T, TPixel> to the back buffer of a swap chain object, so that it can be rendered in a window. For these scenarios, the InteropServices class exposes APIs to easily retrieve the underlying COM pointers for any of the wrapping types in the library, and to perform a QueryInterface call on them to retrieve a COM pointer of a specified interface type. Here is how these APIs can be used:

using ComPtr<ID3D12Device> d3D12Device = default;
using ComPtr<ID3D12Resource> d3D12Resource = default;

using ReadWriteBuffer<float> buffer = GraphicsDevice.GetDefault().AllocateReadWriteBuffer<float>(128);

// Get the underlying ID3D12Device object
InteropServices.GetID3D12Device(GraphicsDevice.GetDefault(), __uuidof<ID3D12Device>(), (void**)d3D12Device.GetAddressOf());

// Get the underlying ID3D12Resource object
InteropServices.GetID3D12Resource(buffer, __uuidof<ID3D12Resource>(), (void**)d3D12Resource.GetAddressOf());

// Now the COM objects can be used directly, eg. to display images with a swap chain

The InteropServices class is also used by the swap chain sample to access the underlying COM objects and use them to display the rendered frames in a Win32 window. The full source code is available here, and it provides a reference implementation of how these APIs can be used to interop with other DirectX objects.