MLIR Solvers Tuning Proposal #1018
Comments
Please see #388 and raise urgency level if necessary.
I am for it.
Thanks for bringing that up; I didn't know it had been documented. This is going to be a high-priority job for me because it's going to be the largest feature in the next milestone.
Niiiiiice. Thanks for the review. @JehandadKhan @whchung Any questions, concerns?
@jerryyin We have consensus, since Approach 1 makes the most sense to me as well. Thank you for formalizing the ideas.
@jerryyin This is a good idea. Indeed, I think there should be no harm if we just add it. cc @shaojiewang
@JehandadKhan Thank you, I appreciate your time for the review. @carlushuang Agreed, I'm sure knowing more context information can be beneficial. Now that I have the majority vote for alternative 1, we should continue with it. As the next step, I'd do the following:
Seeing no further feedback, I'm starting the implementation of tuning based on approach 1. I will start with the non-xdlops MLIR solvers.
@jerryyin, is the implementation complete? If so, can you please close this ticket? Thanks!
@ppanchad-amd This ticket is intended to document the current design (and its motivation), so it should never be closed. I recommend removing the urgency_high and value_high labels. |
Perhaps we can move it to the wiki or documentation?
@JehandadKhan Created internal ticket to assist with the documentation. Thanks!
Background and Motivation
Tuning has been known to provide optimal performance for a variety of algorithms. According to the study in #939, this continues to be true for the cpp igemm solvers, which the MLIR solvers are prototyped from. To achieve optimal performance on important configs, we want to implement tuning for the MLIR backend solvers in MIOpen. This involves three different areas:

- An `AffixTuningParams` pass that can not only do heuristic initialization, but can also initialize according to externally passed-in tuning parameters

Use Cases
The implementation of tuning supports two high-level use cases:
Running mode
MIOpen queries the perfdb for an existing config.
Tuning mode
MIOpen repeatedly populates different tuning parameters. MLIR will generate the kernel according to the tuning parameters passed in.
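To make the two modes concrete, here is a minimal sketch of the flow. All type and function names (`TuningParams`, `QueryPerfDb`, `EnumerateTuningSpace`, `BenchmarkKernel`, ...) are placeholders of mine, not MIOpen or MLIR APIs.

```cpp
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Placeholder types and stubbed helpers; none of these names come from MIOpen itself.
struct TuningParams { std::string serialized; };

std::optional<TuningParams> QueryPerfDb(const std::string&) { return std::nullopt; } // stub
TuningParams HeuristicInit() { return {}; }                                          // stub
std::vector<TuningParams> EnumerateTuningSpace() { return {}; }                      // stub
double BenchmarkKernel(const TuningParams&) { return 0.0; }                          // stub

// Running mode: look the config up in the perfdb and fall back to heuristics on a miss.
TuningParams RunningMode(const std::string& config_key)
{
    if(auto hit = QueryPerfDb(config_key))
        return *hit;
    return HeuristicInit();
}

// Tuning mode: hand each candidate to the MLIR backend, time it, and keep the fastest.
TuningParams TuningMode()
{
    TuningParams best;
    double best_time = std::numeric_limits<double>::max();
    for(const auto& candidate : EnumerateTuningSpace())
    {
        const double t = BenchmarkKernel(candidate); // MLIR generates a kernel from `candidate`
        if(t < best_time)
        {
            best_time = t;
            best      = candidate;
        }
    }
    return best;
}
```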
Implementation
Tuning parameter passing mechanism
Tuning parameters will be passed as comma-separated C strings, in exactly the same format as the field stored in the perfdb. The string can be either `Serialize()`d or `Deserialize()`d, as in `MIOpen::Serializable{}`. This way, either MIOpen or MLIR is able to communicate the tuning parameters with one additional string field. The benefits of using this mechanism are listed below:
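As a rough illustration of the string round trip only, here is a minimal sketch. This is not the actual `MIOpen::Serializable` code, and the struct and field names (`MlirTuningSketch`, `block_size`, `m_per_block`, `n_per_block`) are made up.

```cpp
#include <sstream>
#include <string>

// Sketch: tuning fields round-trip through a single comma-separated string,
// which is also the format of the perfdb value field.
struct MlirTuningSketch
{
    int block_size  = 256; // illustrative fields only
    int m_per_block = 128;
    int n_per_block = 128;

    std::string Serialize() const
    {
        std::ostringstream ss;
        ss << block_size << ',' << m_per_block << ',' << n_per_block;
        return ss.str();
    }

    bool Deserialize(const std::string& s)
    {
        std::istringstream ss(s);
        char c1 = 0, c2 = 0;
        if(!(ss >> block_size >> c1 >> m_per_block >> c2 >> n_per_block))
            return false;
        return c1 == ',' && c2 == ',';
    }
};

// e.g. "256,128,128" can be handed to MLIR as one extra string argument
// and parsed back on the other side with Deserialize().
```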
Performance Config

To enable tuning, MLIR solvers need to implement their own performance config data structures. Each MLIR solver needs to implement 5 performance config functions, according to:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/120289fcb33496db05518b24e3a39db01d5adb5c/src/include/miopen/generic_search.hpp#L61-L73
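The authoritative list of required functions is the comment block linked above. Purely as an illustration, a skeleton could look roughly like the following; the member names are a typical guess (a sketch, not the final API), and `ConvolutionContext` stands in for MIOpen's problem/context type.

```cpp
#include <string>

struct ConvolutionContext; // placeholder for MIOpen's problem/context type (assumption)

// Rough skeleton only; the exact function set and signatures come from generic_search.hpp.
struct PerformanceConfigMlirSketch
{
    std::string serialized_params; // comma-separated tuning string handed to MLIR

    // Empty string => let MLIR fall back to its internal heuristics.
    void HeuristicInit(const ConvolutionContext&) { serialized_params.clear(); }

    // Advance to the next candidate; return false once the tuning space is exhausted.
    bool SetNextValue() { return false; }

    // Cheap structural check on the parameter values themselves.
    bool IsValidValue() const { return true; }

    // Full validity check against the concrete problem.
    bool IsValid(const ConvolutionContext&) const { return true; }

    bool operator==(const PerformanceConfigMlirSketch& o) const
    {
        return serialized_params == o.serialized_params;
    }
};
```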
Alternative 1: Implement performance config in MIOpen
All the performance config implementation can be done together with the MIOpen solvers. This follows existing practice. The implication is that MIOpen will be MLIR's only tuning driver, since only MIOpen's performance config has the knowledge of how to tune an MLIR solver. This implication looks acceptable because we do intend to use only the MIOpen/Tuna infrastructure for tuning the MLIR solvers. In any other scenario, the MLIR infrastructure works strictly as a compiler/kernel generator.
In this case, all we need to do is implement two performance config classes:
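The concrete class names are not preserved in this issue; purely as an illustration, one class could cover the non-xdlops solvers and one the xdlops solvers, both built on the skeleton sketched above (hypothetical names).

```cpp
// Hypothetical names only; both reuse the PerformanceConfigMlirSketch skeleton above.
struct PerformanceConfigMlirNonXdlops : PerformanceConfigMlirSketch {};
struct PerformanceConfigMlirXdlops : PerformanceConfigMlirSketch {};
```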
Alternative 2: Implement performance config in MLIR
With alternative 1, one might fear that the two components are coupled too tightly, because MIOpen's performance config has to contain implementation details of an MLIR solver. When an MLIR developer decides to change the way its kernels are tuned, they have to make changes in MIOpen.
A solution to this is to abstract the implementation details behind the Miir.h interface, leaving the MLIR component solely responsible for its tuning. This can be challenging because the performance config does not depend on any external information to make its decisions: performance configs are just data structures that encode the tuning parameters. In particular, `SetNextValue()` does not carry any context information at all. This makes it very hard to create a handle and give MLIR the information it would need to produce the next value from the current performance config.

We need a combination of changes to make it possible to place the performance config tuning logic in MLIR:
1. One generic performance config
In this generic implementation, all MLIR solvers can share one implementation of the performance config. Under the hood, the performance config will invoke the `Miir` interface to decide whether the config is valid and how to set the next config.
2. ComputedIteratorGeneric

The new `ComputedIteratorMLIRGeneric` will inherit from the original iterator. Based on whether the performance config is a `PerformanceMlirGeneric`, we have a different implementation of `SetNextValue()` that allows us to pass in the convolution context.

3. Dynamically create an MLIR container
Side effects

- `ComputedContainer`
- `SetNextValue(ConvolutionContext)` interface

MLIR Implementations
For alternative 1, the only changes needed on the MLIR side are:

- When `<params>` is empty, do its internal heuristic initialization

For alternative 2, in addition, it requires a couple of new APIs to implement the performance config interface:
Under the hood, the two interface functions on the MLIR side will invoke the corresponding static functions. Depending on whether or not this is an xdlops request, according to the handle, it will launch and return a new set of tuning parameters based on the one from the handle.
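The shape of that dispatch might look like the following sketch. The issue does not name the interface functions, the static helpers, or the handle fields, so every identifier here (`Handle`, `GetNextTuningParams`, `NextParamsXdlops`, `NextParamsNonXdlops`) is a placeholder.

```cpp
#include <string>

namespace mlir_sketch {

struct Handle
{
    bool is_xdlops = false;     // derived from the options recorded in the handle
    std::string current_params; // current comma-separated tuning string
};

// Placeholder static helpers, one per code path; real logic would advance the
// tuning parameters, here they just echo the input.
static std::string NextParamsNonXdlops(const std::string& current) { return current; }
static std::string NextParamsXdlops(const std::string& current) { return current; }

// The interface function inspects the handle and dispatches to the matching helper,
// returning a fresh set of tuning parameters.
inline std::string GetNextTuningParams(const Handle& h)
{
    return h.is_xdlops ? NextParamsXdlops(h.current_params)
                       : NextParamsNonXdlops(h.current_params);
}

} // namespace mlir_sketch
```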
Summary
Considering we do plan on making MIOpen the only tuning driver for MLIR, I personally favor alternative 1 over 2 for a variety of reasons: