
MLIR Solvers Tuning Proposal #1018

Open
jerryyin opened this issue Jun 29, 2021 · 10 comments
Labels: documentation · enhancement · request_for_comments · Under Investigation

Comments


jerryyin commented Jun 29, 2021

Background and Motivation

Tuning is known to provide optimal performance for a variety of algorithms. According to the study in #939, this continues to be true for the cpp igemm solvers, from which the MLIR solvers were prototyped. To achieve optimal performance on important configs, we want to implement tuning for the MLIR backend solvers in MIOpen. This involves three different areas:

  • MIOpen solvers and performance configs that support tuning
  • The interface defined in Miir.h, which supports passing tuning parameters through
  • The MLIR AffixTuningParams pass, which can initialize not only heuristically but also from externally passed-in tuning parameters

Use Cases

The implementation of tuning supports two high-level use cases:

  • Running mode

    MIOpen queries the perfdb for an existing config:

    • On a perfdb hit, MIOpen passes the tuning parameters to MLIR, and MLIR generates the kernel accordingly.
    • On a perfdb miss, MIOpen passes an empty string to MLIR, and MLIR heuristically initializes the tuning parameters.
  • Tuning mode

    MIOpen repeatedly populates different tuning parameters, and MLIR generates the kernel according to the tuning parameters passed in.

Implementation

Tuning parameter passing mechanism

Tuning parameters will be passed as comma-separated C strings, in exactly the same format as the field stored in perfdb. The string can be Serialize()-d or Deserialize()-d, as in MIOpen::Serializable{}. This way, MIOpen and MLIR can communicate the tuning parameters with one additional string field. The benefits of this mechanism are:

  • Compatibility with perfdb. This approach is already validated by the extensive use of perfdb and has proven reliable.
  • Code reuse. No need to re-invent the Serializable infrastructure.
  • Flexibility. String serialization can capture a variety of data types.

Performance Config

To enable tuning, the MLIR solvers need to implement their own performance config data structures. Each MLIR solver needs to implement the 5 performance config functions listed in

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/120289fcb33496db05518b24e3a39db01d5adb5c/src/include/miopen/generic_search.hpp#L61-L73

Alternative 1: Implement performance config in MIOpen

All of the performance config implementation can be done together with the MIOpen solvers, which follows existing practice. The implication is that MIOpen will be MLIR's only tuning driver, since only MIOpen's performance configs have the knowledge of how to tune an MLIR solver. This looks acceptable because we do intend to use only the MIOpen/Tuna infrastructure for tuning MLIR solvers. In any other scenario, the MLIR infrastructure works strictly as a compiler/kernel generator.

In this case, all we need to do is implement two performance config classes:

  • PerformanceConvMlirIgemm
    • All fwd, bwd, and wrw solvers will use the same performance config because they all depend on the same tuning parameters. (As background: since a performance config entry includes the solver id, each solver can still hold different values for the same problem.)
  • PerformanceConvMlirIgemmXdlops
    • All fwd, bwd, and wrw xdlops solvers will use the same performance config because they all depend on another shared set of tuning parameters.

Alternative 2: Implement performance config in MLIR

With alternative 1, one might worry that the two components are coupled too closely, because MIOpen's performance config has to contain implementation details of an MLIR solver. When an MLIR developer decides to change the way their kernels are tuned, they have to make changes in MIOpen.

A solution is to abstract the implementation details behind the Miir.h interface, leaving the MLIR component solely responsible for its tuning. This can be challenging because a performance config does not depend on any external information to make its decisions: performance configs are just data structures that encode the tuning parameters. In particular, SetNextValue() carries no context information at all, which makes it very hard to create a handle and give MLIR the information it needs to compute the next value from the current performance config.

We need a combination of changes to make it possible to place the performance config tuning logic in MLIR:

1. One generic performance config

In this generic implementation, all MLIR solvers can share one implementation of the performance config. Under the hood, the performance config invokes the Miir interface to decide whether a config is valid and how to set the next config.

struct PerformanceMlirGeneric {
    std::string TuningParams;
    PerformanceMlirGeneric(const std::string& params) : TuningParams(params) {}
    PerformanceMlirGeneric(bool spare) : TuningParams("uninitialized_spare") {}
    PerformanceMlirGeneric() : TuningParams("uninitialized") {}
    bool operator==(const PerformanceMlirGeneric& other) const {
        return TuningParams == other.TuningParams;
    }
    bool IsValid(const ConvolutionContext& ctx) const {
        return MiirIsConfigApplicable(mlir::ConstructBuildOptions(ctx, TuningParams, ...));
    }
    bool SetNextValue(const ConvolutionContext& ctx) {
        std::string nextTuningParams;
        bool isEndOfSequence;
        MiirGetNextTuningParams(mlir::ConstructBuildOptions(ctx, TuningParams, ...),
                                nextTuningParams, isEndOfSequence);
        if (isEndOfSequence)
            return false;
        TuningParams = nextTuningParams;
        return true;
    }
};
2. ComputedIteratorGeneric

The new ComputedIteratorMLIRGeneric will inherit from the original iterator. Depending on whether the performance config is a PerformanceMlirGeneric, we use a different implementation of SetNextValue() that allows us to pass in the convolution context.

do
{
    if(!v.SetNextValue(*p))
    { // Wraparound, end reached. Iterator is useless from now.
        p = nullptr;
        break;
    }
} while(!v.IsValid(*p));
3. Dynamically create an MLIR container

if (dynamic_cast<const PerformanceMlirGeneric*>(&s.GetPerformanceConfig(context))) {
    ComputedContainerMlirGeneric<PerformanceMlirGeneric, Context> main(context);
    ...
}
Side effects
  • A patch needs to be applied to ComputedContainer
  • Every PerformanceConfig class needs to implement the new SetNextValue(ConvolutionContext) interface
  • Overhead in dynamically checking the performance config type

MLIR Implementations

For alternative 1, the only changes needed on the MLIR side are:

  • The new handle now forces the client to pass in a new field: --tuning_parameter <params>
  • If <params> is empty, MLIR does its internal heuristic initialization
  • Otherwise, MLIR uses the passed-in tuning parameters

For alternative 2, it additionally requires a couple of new APIs to implement the performance config interface:

extern "C" MiirStatus miirTuningParamsInit(MiirHandle handle, char *tuningParams, size_t *size);
extern "C" MiirStatus miirTuningParamsNext(MiirHandle handle, char *tuningParams, size_t *size, bool *isEndOfSequence);

Under the hood, the two interface functions on the MLIR side will invoke the corresponding static functions. Depending on whether this is an xdlops request (according to the handle), they will compute and return a new set of tuning parameters derived from the ones in the handle.

Summary

Considering we do plan to make MIOpen the only tuning driver for MLIR, I personally favor alternative 1 over 2 for a variety of reasons:

  • Fewer changes are necessary because it is easier to implement.
  • No need to apply architecture-level patches for the MLIR solvers only.
  • The performance config and its iterator are fairly straightforward interfaces to implement in MIOpen, whereas implementing them in MLIR requires a lot more plumbing/overhead.

atamazov commented Jun 29, 2021

> Especially in SetNextValue(), it doesn't carry any context information at all. This makes it very hard to create a handle and let MLIR know any information that help to create the nextValue based on the current set of performance config.

Please see #388 and raise urgency level if necessary.

> Considering we do plan on make MIOpen the only tuning driver for MLIR, I personally favor alternative 1 over 2 on a variety of factors...

I am for it.

@jerryyin

> Please see #388 and raise urgency level if necessary.

Thanks for bringing that up; I didn't know that had been documented. This is going to be a high-priority job for me because it's going to be the largest feature in the next milestone.

> I am for it.

Niiiiiice. Thanks for the review.

@JehandadKhan @whchung Any questions, concerns?

@JehandadKhan

@jerryyin We have consensus since Approach 1 makes the most sense to me as well. Thank you for formalizing the ideas.


carlushuang commented Jul 1, 2021

@jerryyin This is a good idea: passing ctx into SetNextValue. We have another scenario that can utilize this new API: we can more easily decide the gemm_k split value based on the convolution context passed in. Currently we iterate this value based on either the next power of 2 or the next multiple of the CU number, without knowledge of the current conv config.

Indeed, I think there should be no harm in simply adding ctx to SetNextValue's arguments. If a solver does not need the ctx value, it can just ignore it.

cc @shaojiewang


jerryyin commented Jul 1, 2021

@JehandadKhan Thank you, appreciate your time for the review.

@carlushuang Agreed; I'm sure knowing more context information can be beneficial. Now that I have majority votes for alternative 1, we should continue the SetNextValue(ctx) refactor in its original ticket.


As the next step, I'll do the following:

  • By next Friday, July 9th, if there are no objections, I will mark the 1st alternative as the conclusion of this proposal.
  • We can continue to engage in "Make ConvolutionContext accessible in SetNextValue()" #388 and set the corresponding urgency level if more developers agree with the SetNextValue refactor. I will not own that ticket, as it is not my immediate priority.


jerryyin commented Jul 23, 2021

Seeing no further feedback, I'm starting the implementation of tuning based on approach 1. I will start with the non-xdlops MLIR solvers.

@ppanchad-amd

@jerryyin, Is the implementation complete? If so, can you please close this ticket? Thanks!


atamazov commented Apr 8, 2024

@ppanchad-amd This ticket is intended to document the current design (and its motivation), so it should never be closed. I recommend removing the urgency_high and value_high labels.


JehandadKhan commented Apr 8, 2024 via email

@ppanchad-amd

@JehandadKhan Created internal ticket to assist with the documentation. Thanks!
