Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast Hybrid mode of iGEMM #326

Merged
merged 16 commits into from
Aug 14, 2020
Merged

Fast Hybrid mode of iGEMM #326

merged 16 commits into from
Aug 14, 2020

Conversation

zjing14
Copy link
Contributor

@zjing14 zjing14 commented Jul 6, 2020

  • A implementation of issue Hybrid mode with iGEMM hit detection #299
  • Add the skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage attribut into ExecutionContext
  • Add IsOptimizedHybrid() to set the attribute to true. IsHybrid() to return true for both plain Hybrid and OptimizedHybrid modes
  • Modify IsApplicable of all hip-based implicitGemm solvers. If the attribute is true, skip the solver
  • Set OptimizedHybrid mode as default

@atamazov
Copy link
Contributor

atamazov commented Jul 6, 2020

OBSOLETE INFO
The proposed design has a drawback: It does not affect the result of GWSS for the Solvers that implement the `GetWorkspaceSize()` method, so the library might request an excessive workspace from the user's program. ~- (2) The design is algorithm-based which does not comply to the [proposal](https://github.com//issues/299#issuecomment-645692843)~

Also I dislike the idea of lists of solvers for any purposes. Basically, it is the Solver who must know everything about its applicability under certain circumstances.

I have different design in mind.


Low-level design

  • (A) Add a new boolean attribute to the ExecutionContext: skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage. This ugly name tells anything (and BTW denotes quirky architecture).
    • The new attribute is similar to the existing disable_perfdb_access or use_hip_kernels.
  • (B) Set this attribute to true when the new "Optimized Hybrid" Find mode is active.
  • (C) Each Solver that takes long time to build and has narrow coverage should be modified: its IsApplicable() should return false when skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage is true. Other solvers do not require modifications.

An attentive readers would note that “my” design does not literally match my own architectural proposal. However, if they look more closely, they could see that this design is actually a linear transformation of the proposal.

@atamazov
Copy link
Contributor

atamazov commented Jul 6, 2020

My recent comment edited.

@zjing14
Copy link
Contributor Author

zjing14 commented Jul 6, 2020

@atamazov Thanks for your feedback. Supporting GetWorkspaceSize should be simple to implement. Looks like the main problem is using lists or not. Both solutions are good. The key is as @JehandadKhan said how we can have one location with all the information instead of checking IsApplicable of each solver. We can discuss it during the meeting and then make a confirmation.

@atamazov
Copy link
Contributor

atamazov commented Jul 6, 2020

@zjing14 You are welcome, But please don't let words to fool yourself. One location is Solver. List is the second location. If we are going to add new lists for each new small feature, then where we'll end up?

@zjing14
Copy link
Contributor Author

zjing14 commented Jul 7, 2020

@atamazov Just double-check if I understand your proposal and confirm with the implementation details.

  1. Add the skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage attribut into SolverBase
  2. Add the "Optimized Hybrid" Find mode that goes through a list of to-disable solvers and set their attribute to true
  3. Modify the IsApplicable of corresponding solvers to check the attribute.

@atamazov
Copy link
Contributor

atamazov commented Jul 9, 2020

@zjing14 Good questions!

  1. Add the skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage attribut into SolverBase

No, to ExecutionContext.

  1. Add the "Optimized Hybrid" Find mode...

Yes, the support for OptimizedHybrid should be added to the FindMode class. I recommend updating existing IsHybrid() to return true for both plain Hybrid and OptimizedHybrid modes and introducing a new function, IsOptimizedHybrid(). That would simplify code changes necessary in other places.

...that goes through a list of to-disable solvers and set their attribute to true

There is no need to create a new list. "OptimizedHybrid" should differ from plain "Hybrid" only in case of find-db miss: it sets skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage in the context to true. This requires modifications in convolutionocl.cpp and in convolution.cpp. I recommend searching for IsHybrid in the sources to find places that require modifications. Examples:

Modified fragment of FindConvFwdAlgorithm()
(Comments removed to save space)
const miopen::FindMode fm;
bool use_immediate_solution = false;
miopenConvSolution_t sol;
if((fm.IsFast() || fm.IsHybrid()) && !use_winograd_only)
{
    size_t count;
    GetForwardSolutions(handle, wDesc, xDesc, yDesc, 1, &count, &sol);
    use_immediate_solution = (count > 0) && !(fm.IsHybrid() && sol.time < 0);
}

if(use_immediate_solution)
{
    CompileForwardSolution(handle, wDesc, xDesc, yDesc, sol.solution_id);
    const auto id = solver::Id(sol.solution_id);
    perf_db.push_back(
        {id.GetAlgo(conv::Direction::Forward), id.ToString(), sol.time, sol.workspace_size});
}
else
{
    ctx.skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage
        = fm.IsOptimizedHybrid(); /// <<<------------------------------------------ ADDED STATEMENT
    perf_db = UserFindDbRecord::TryLoad(handle, problem, [&](DbRecord& record) {
        DirConvFindCore(handle,
                        xDesc,
                        x,
                        wDesc,
                        w,
                        yDesc,
                        y,
                        workSpace,
                        workSpaceSize,
                        *this,
                        exhaustiveSearch,
                        record,
                        ctx, /// <<<---------------------- PASSING CONTEXT TO SearchForAllSolutions
                        use_winograd_only);
    });
}

Modified fragment of ForwardGetWorkSpaceSize()
(Comments removed to save space)
auto ctx = ConvolutionContext{xDesc, wDesc, yDesc, *this, conv::Direction::Forward};
...
const miopen::FindMode fm;
while(fm.IsFast() || fm.IsHybrid())
{
    size_t count;
    miopenConvSolution_t sol;
    GetForwardSolutions(handle, wDesc, xDesc, yDesc, 1, &count, &sol);
    if(count < 1 || (fm.IsHybrid() && sol.time < 0))
    {
        ctx.skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage
            = fm.IsOptimizedHybrid(); /// <<<-------------------------------------- ADDED STATEMENT
        break; // Fall down to Normal Find.
    }
    MIOPEN_LOG_I2(sol.workspace_size);
    return sol.workspace_size;
}

ctx.SetupFloats();
ctx.do_search             = false;
ctx.disable_perfdb_access = true;
const size_t direct_workspace = ForwardBackwardDataGetWorkSpaceSizeDirect(ctx);
const size_t implicit_gemm_workspace = ForwardBackwardGetWorkSpaceSizeImplicitGemm(ctx);
const size_t workspace_size_scgemm = ForwardBackwardDataGetWorkSpaceSizeSCGemm(handle, ctx);
...

As you can see, only two lines of code added.

Unfortunately, some of the required modifications may involve more complex changes, because there are still some irregularities in the code (not all Solvers/Invokers are ready yet (((

  1. Modify the IsApplicable of corresponding solvers to check the attribute.

Yes. Each Solver that “knows” about itself that it has narrow coverage and its build takes a lot of time should check the new attribute and tell the caller "I am not applicable".

Hope this helps.

@pfultz2
Copy link
Contributor

pfultz2 commented Jul 9, 2020

It does seem a little hackish to use context variable to traverse a different list

@atamazov
Copy link
Contributor

atamazov commented Jul 9, 2020

@pfultz2

It does seem a little hackish to use context variable to traverse a different list

Which list do you mean?

@zjing14
Copy link
Contributor Author

zjing14 commented Jul 10, 2020

@atamazov Thanks very much for the clarification. Now, I think I understand. Will finish a modification soon.

@zjing14 zjing14 force-pushed the igemm_hybrid_mode branch from 5f78ec3 to e49eb08 Compare July 11, 2020 02:25
@zjing14
Copy link
Contributor Author

zjing14 commented Jul 11, 2020

Hi all, I modified the code following @atamazov comments. Please take a look. If the implementation is OK, I will set the optimized_hybrid to default.

@zjing14 zjing14 changed the title [Prototype][WIP] Hybrid mode of iGEMM [WIP] Hybrid mode of iGEMM Jul 11, 2020
@@ -447,7 +447,11 @@ std::size_t ConvolutionDescriptor::ForwardGetWorkSpaceSize(Handle& handle,
miopenConvSolution_t sol;
GetForwardSolutions(handle, wDesc, xDesc, yDesc, 1, &count, &sol);
if(count < 1 || (fm.IsHybrid() && sol.time < 0))
{
ctx.skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please shorten this variable name and preferably make this an int to indicate cost estimate if @atamazov agrees. If not at least rename it to something simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost estimate (or a build time?) is a good idea that allows for more flexibility. But we need also indication of coverage to make a decision (skip or use). Also I am not sure that we need this flexibility right now. Let's keep this idea in mind for the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT naming, see #326 (comment):

...skip_solutions_that_take_long_time_to_build_and_have_narrow_coverage. This ugly name tells anything (and BTW denotes quirky architecture).

@@ -194,6 +194,7 @@ const char* ToCString(const FindMode::Values mode)
case FindMode::Values::Normal: return "NORMAL";
case FindMode::Values::Fast: return "FAST";
case FindMode::Values::Hybrid: return "HYBRID";
case FindMode::Values::OptimizedHybrid: return "OPTIMIZED_HYBRID";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add a new mode, this looks like a heuristic which should just be part of the "Hybrid" mode. @atamazov comments ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This heuristic has negative side effects and looks like a patch from the architectural POV. That is why I would like to keep "plain" Hybrid.

src/ocl/convolutionocl.cpp Outdated Show resolved Hide resolved
src/ocl/convolutionocl.cpp Outdated Show resolved Hide resolved
src/ocl/convolutionocl.cpp Show resolved Hide resolved
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also update documentation in doc/src/find_and_immediate.md.

@zjing14 zjing14 changed the title Optimized Hybrid mode of iGEMM Fast Hybrid mode of iGEMM Jul 22, 2020
@zjing14 zjing14 requested review from atamazov and pfultz2 July 23, 2020 02:14
@zjing14
Copy link
Contributor Author

zjing14 commented Jul 24, 2020

Here are updated numbers of ReNext101, bs=128 with tuning (MIOPEN_FIND_ENFORCE=3). @daniellowell @atamazov @JehandadKhan

  Wall time (sec) Throughput (img/sec)
Normal 379 153
Hybrid 214 153
Fast_Hybrid 217 154

@daniellowell
Copy link
Contributor

Adding onHold until we merge in #272 and #317

@zjing14
Copy link
Contributor Author

zjing14 commented Jul 31, 2020

Numbers of resnext101, bs128 with different FIND_MODE on Vega20 (MI60).

  Wall time Throughput (img/sec)
Normal 7m32.903s 77.25
Hybrid 2m37.938s 77.08
Fast_Hybrid 2m37.859s 77.12

@TejashShah
Copy link
Contributor

@zjing14 Merge conflict with develop.

@JehandadKhan JehandadKhan dismissed their stale review August 4, 2020 18:19

Review has gone stale

@zjing14
Copy link
Contributor Author

zjing14 commented Aug 6, 2020

Numbers of resnext101, bs128, fp32 for different MIOPEN_FIND_MODE on Vega20 (MI60) with empty find-db.

Find mode Wall time Throughput (img/sec)
Normal 7m37.145s 78.9
Fast 1m54.875s 35.9
Hybrid 7m38.975s 78.7
Fast_Hybrid 4m21.852s 78.0

@dagamayank
Copy link
Contributor

@pfultz2 @atamazov @daniellowell @JehandadKhan Pl review.

@TejashShah
Copy link
Contributor

@zjing14 Just retrigger the build.

@pfultz2 @atamazov @daniellowell @JehandadKhan Pl review.

Needs review

Copy link
Contributor

@JehandadKhan JehandadKhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitated to approve this PR since I prefer the previous approach used by @zjing14 where we setup a container of solvers to which we fell back.

  • That approach makes it easier to just look at the code and tell which solvers would be executed as a fall back. The implemented approach requires log collection per config to determine which configs would be using the hybrid approach.
  • A new variable is not required to be added to the context. The context variable based approach makes it necessary that each solver is updated ( which this PR does) and makes it difficult to assess the impact of a change and burden all future solvers to setup the variable properly.
  • Finally, the name of the newly added variable is almost the width of a line. Which is bad practice in my opinion.

I am approving this PR since this is an important issue, perhaps we can rework this later and update the overall mechanism.

@atamazov
Copy link
Contributor

@JehandadKhan Thanks for your opinion (although I have the opposite one).

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants