Move GPU ukernel selection to KernelConfig #19440

bjacob · 2024-12-10T18:31:41Z

This moves the logic deciding whether an op should be a ukernel out of the GPULowerToUKernels pass, into KernelConfig.

So KernelConfig decides whether the op should be a ukernel and encodes that decision in the resulting lowering_config as a new parameter, which is a new attribute, UKernelSpecAttr. That attribute is modeled directly on FnNameAndDefAttrs, the equivalent C++ data structure that we have had in the LowerToUKernels passes, and replaces it. If the attribute is present, the op was selected for ukernel lowering, and its fields give the ukernel name and some function definition attributes (to import any dependencies, such as the rocm module for runtime support symbols).

All the details about supplying the ukernel bitcode in a hal.executable.object are also moved there, becoming a side effect of KernelConfig.

The GPULowerToUKernels pass becomes much simpler, since all the decision-making has already been done for it. It just looks at the LoweringConfigAttr and, if a UKernelSpecAttr is present there, performs the requested lowering.
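As a rough illustration of the split described above, here is a minimal Python sketch. It is not the IREE C++ implementation: `UKernelSpec`, `LoweringConfig`, `configure_kernel`, `lower_to_ukernel`, the ukernel name, the tile sizes, and the contents of `def_attrs` are all hypothetical stand-ins chosen for the example. It only shows the shape of the design: selection records a spec in the config, and lowering acts on that spec without making decisions of its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UKernelSpec:
    """Toy stand-in for UKernelSpecAttr: a ukernel name plus definition attributes."""
    name: str
    def_attrs: dict = field(default_factory=dict)


@dataclass
class LoweringConfig:
    """Toy stand-in for a lowering config that may carry a ukernel spec."""
    tile_sizes: list
    ukernel: Optional[UKernelSpec] = None


def configure_kernel(op_name, ukernels_enabled):
    """Selection phase: decide whether to use a ukernel and record it in the config."""
    if ukernels_enabled and op_name == "argmax":
        # Hypothetical name and attributes, for illustration only.
        spec = UKernelSpec(
            name="hypothetical_argmax_ukernel",
            def_attrs={"imports_runtime_module": True},
        )
        # Placeholder tile sizes, chosen on purpose to suit the ukernel.
        return LoweringConfig(tile_sizes=[1, 0], ukernel=spec)
    # Placeholder tile sizes for the regular codegen path.
    return LoweringConfig(tile_sizes=[4, 4])


def lower_to_ukernel(op_name, config):
    """Lowering phase: no decisions, only act on what the config already says."""
    if config.ukernel is None:
        return op_name  # left unchanged for regular codegen
    return f"call {config.ukernel.name}"
```

In this sketch, `lower_to_ukernel("argmax", configure_kernel("argmax", True))` yields a call to the selected ukernel, while the same op configured with ukernels disabled passes through unchanged.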

The motivation for this split is that we need to know in KernelConfig whether it's going to be a ukernel, because ops that will get lowered to a ukernel require a different configuration. The important example for us is multi_mma, which in the ukernel case needs to avoid reduction-dimension tiling to 1 so that the ukernel gets to see the reduction loop.
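To make the dependency concrete, the following toy Python sketch shows why the configuration has to know about ukernel selection. The function names and the convention that a tile size of 0 means "leave the dimension untiled" are assumptions made for this example, not taken from the PR.

```python
def reduction_tile_size(uses_ukernel):
    """Pick the reduction-dimension tile size for a multi_mma-like op.

    Assumed toy convention: 0 means "leave this dimension untiled".
    """
    # In the ukernel case, avoid tiling the reduction dimension to 1 so the
    # ukernel sees the whole reduction loop.
    return 0 if uses_ukernel else 1


def reduction_extent_seen_by_kernel(k, tile_size):
    """Length of the reduction dimension inside one tile."""
    return k if tile_size == 0 else min(k, tile_size)
```

With a reduction dimension of length 8, the ukernel path sees all 8 iterations in one tile, whereas the path that tiles to 1 sees a single iteration at a time.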

A few simplifications already arise in the current argmax ukernel logic, confirming that this was the right design choice. The old ukernel matching logic checked that the distribution tile sizes matched what the ukernel could handle. Now that is turned upside down: ukernel matching happens in a helper within KernelConfig, where we know we are setting the appropriate tile sizes on purpose.

Another nice improvement is that this puts just enough distance between ukernel selection (which creates the hal.executable.object) and ukernel lowering that we are able to insert HoistExecutableObjectsPass in between. That simplifies the ukernel lowering, which no longer needs to worry about preserving the hal.executable.object.

Signed-off-by: Benoit Jacob <[email protected]>

bjacob force-pushed the select-ukernels branch from 84a939a to 510fbd5 Compare December 12, 2024 21:27

bjacob changed the title ~~GPUSelectUKernelsPass~~ Move GPU ukernel selection to KernelConfig Dec 12, 2024

bjacob marked this pull request as ready for review December 12, 2024 21:55

bjacob requested review from kuhar, MaheshRavishankar, qedawkins, Groverkss and antiagainst as code owners December 12, 2024 21:55

hanhanW self-requested a review December 13, 2024 03:07

bjacob force-pushed the select-ukernels branch from 510fbd5 to a6e5117 Compare December 13, 2024 03:36

select-ukernels

906aada

Signed-off-by: Benoit Jacob <[email protected]>

bjacob force-pushed the select-ukernels branch from a6e5117 to 906aada Compare December 13, 2024 03:58
