[Exhaustive Tuning] Search failed Error message when no solution is available #2253

junliume · 2023-07-11T17:08:15Z

This looks like a small issue that could be a good first one for new contributors to MIOpen:

MIOpen(HIP): Warning [GetAllConfigs] ConvHipImplicitGemmForwardV4R4Xdlops: Searching the best solution among 0 (spare)...
MIOpen(HIP): Warning [GenericSearch] Done: 0/0/0, best #0 3.40282e+38 4,4,1,4,4,1,0,0,1
MIOpen(HIP): Error [FindSolutionImpl] Search failed for: ConvHipImplicitGemmForwardV4R4Xdlops: /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/src/include/miopen/generic_search.hpp:555: Search failed

Normally we would expect something like this:

MIOpen(HIP): Warning [GetAllConfigs] ConvAsm1x1U: Searching the best solution among 390...
MIOpen(HIP): Warning [Monitor] 102/0/390 0.0129282, best within recent 103: 0.0129282 #92 1,8,2,32,1,2,2,2, ETA:8.4816 sec.
MIOpen(HIP): Warning [Monitor] 207/0/390 0.0129282, best within recent 105: 0.01584 #122 1,8,2,16,1,2,2,2, ETA:5.30935 sec.
MIOpen(HIP): Warning [Monitor] 309/0/390 0.0129282, best within recent 102: 0.01552 #259 1,8,2,32,1,2,2,4, ETA:2.36496 sec.
MIOpen(HIP): Warning [GenericSearch] Done: 390/0/390, best #92 0.0129282 1,8,2,32,1,2,2,2
MIOpen(HIP): Warning [GenericSearch] ...Score: 4.02221 (default time 0.052)

A few issues:

In the first case, since no solution is applicable, the search should break early instead of emitting an error.
In both cases, these messages should be printed at the info level rather than as warnings.

dmikushin · 2023-07-11T20:13:31Z

Hi @junliume could you please give a command line to reproduce this?

junliume · 2023-07-12T00:33:40Z

@dmikushin it might be too heavy to reproduce with MIGraphX online tuning. I suggest a static code check of the file src/include/miopen/generic_search.hpp and the function GenericSearch. For example, if n_runs_total is 0, there is no reason to continue the search.
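
To make the suggestion concrete, here is a minimal standalone sketch of an early break on an empty config list. It is not MIOpen code: `ToyConfig`, `ToySearch` and the `benchmark` callback are hypothetical names invented for illustration, and the real GenericSearch has a different signature and return type.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a solver's performance config; not an MIOpen type.
struct ToyConfig
{
    int id = 0;
};

// Toy search loop. `benchmark` returns the measured time for a config,
// or a negative value if that config failed to build or run.
std::optional<ToyConfig>
ToySearch(const std::vector<ToyConfig>& configs,
          const std::function<float(const ToyConfig&)>& benchmark)
{
    assert(static_cast<bool>(benchmark)); // a callable must be supplied

    // Early break: an empty search space is not an error.
    if(configs.empty())
        return std::nullopt;

    float best_time = std::numeric_limits<float>::max();
    std::optional<ToyConfig> best;
    for(const auto& config : configs)
    {
        const float time = benchmark(config);
        if(time >= 0.0f && time < best_time)
        {
            best_time = time;
            best      = config;
        }
    }

    // Configs existed but none of them ran: this is a genuine failure.
    if(!best)
        throw std::runtime_error("Search failed");
    return best;
}
```

In this sketch the caller can tell "nothing to tune" (an empty optional) apart from "everything failed" (an exception), which is the distinction the issue asks for.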

DrizztDoUrden · 2023-07-12T11:07:25Z

@dmikushin One can make a fake tunable solver that is applicable in every net config and has 0 perf configs. That would be a good start for a unit test.

junliume · 2023-07-17T00:19:15Z

@dmikushin the urgency of this issue has been raised because of a request to cherry-pick the fix into an existing release. Please let me know if you have a plan to fix it. Thanks! CC: @JehandadKhan

dmikushin · 2023-07-17T08:50:27Z

Hi @junliume yes, I'm working on it.

dmikushin · 2023-07-18T09:51:47Z

@junliume, something like this?

As I understand the logic, we could still let the function run to the end for simplicity, but not fail if there were no runs.

diff --git a/src/include/miopen/generic_search.hpp b/src/include/miopen/generic_search.hpp
index 5fb60235d..be6d89f58 100644
--- a/src/include/miopen/generic_search.hpp
+++ b/src/include/miopen/generic_search.hpp
@@ -212,7 +212,7 @@ public:
                 n_recent != 0u ? (static_cast<float>(n_total - n_recent) *
                                   (elapsed_cumulative / static_cast<float>(n_recent)) / 1000.0f)
                                : 0.0f; // paraniod
-            MIOPEN_LOG_W(n_recent << '/' << n_failed << '/' << n_total << ' ' << total_best
+            MIOPEN_LOG_I(n_recent << '/' << n_failed << '/' << n_total << ' ' << total_best
                                   << ", best within recent " << n_within_beat << ": " << best_time
                                   << " #" << n_best << ' ' << best_config << ", ETA:" << eta_sec
                                   << " sec.");
@@ -275,7 +275,7 @@ auto GetAllConfigs(const Solver s, const Context& context, const Problem& proble
 
     ComputedContainer<PerformanceConfig, Context, Problem> all_configs = useSpare ? spare : primary;
     const int n_runs_total = useSpare ? spare_size : primary_size;
-    MIOPEN_LOG_W(s.SolverDbId() << ": Searching the best solution among " << n_runs_total
+    MIOPEN_LOG_I(s.SolverDbId() << ": Searching the best solution among " << n_runs_total
                                 << (useSpare ? " (spare)" : "") << "...");
 
     return all_configs;
@@ -517,7 +517,7 @@ auto GenericSearch(const Solver s,
                 }
             }
 
-            // Banchmarked kernels will not be used anymore.
+            // Benchmarked kernels will not be used anymore.
             // Now we can delete Program objects that belong to OCL/HIP
             // runtime and free the associated resources (memory, file handles...)
             for(const auto& kernelInfo : current_solution.construction_params)
@@ -548,10 +548,10 @@ auto GenericSearch(const Solver s,
     for(auto& agent : compile_agents)
         agent.join();
 
-    MIOPEN_LOG_W("Done: " << n_runs_total << '/' << n_failed << '/' << n_runs_total << ", best #"
+    MIOPEN_LOG_I("Done: " << n_runs_total << '/' << n_failed << '/' << n_runs_total << ", best #"
                           << n_best << ' ' << best_time << ' ' << best_config);
 
-    if(!is_passed)
+    if(!is_passed && n_runs_total)
         MIOPEN_THROW("Search failed");
     // Run once with the default config and show score.
 
@@ -560,7 +560,7 @@ auto GenericSearch(const Solver s,
     invoker(profile_h, invoke_ctx);
     const auto default_time = profile_h.GetKernelTime();
     const auto score        = (best_time > 0.0f) ? default_time / best_time : 0.0f;
-    MIOPEN_LOG_W("...Score: " << score << " (default time " << default_time << ')');
+    MIOPEN_LOG_I("...Score: " << score << " (default time " << default_time << ')');
 
     return best_config;
 }
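
For readers who want to check the two expressions the patch relies on, here is a small standalone sketch. The helper names `ShouldThrowSearchFailed` and `Score` are hypothetical; only the expressions inside them are copied from the diff above.

```cpp
// Mirrors the patched condition `!is_passed && n_runs_total`:
// the search only counts as failed if there was something to run.
bool ShouldThrowSearchFailed(bool is_passed, int n_runs_total)
{
    return !is_passed && n_runs_total != 0;
}

// Mirrors the score expression from the last hunk of the diff.
float Score(float default_time, float best_time)
{
    return (best_time > 0.0f) ? default_time / best_time : 0.0f;
}
```

With the values from the ConvAsm1x1U log, `Score(0.052f, 0.0129282f)` comes out at roughly 4.02, in line with the `...Score: 4.02221` line. In the zero-config log the best time is printed as 3.40282e+38, which matches the largest finite `float`. If that value reaches the score expression unchanged, the score is close to zero rather than exactly zero, and the function still returns whatever `best_config` holds at that point.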

junliume · 2023-07-18T15:59:07Z

@JehandadKhan could you review the above proposal? It looks good to me. IMHO it would be better to return early from the function, but it also does no harm to run through it without failing, since these conditions occur rarely.

junliume · 2023-07-19T15:43:30Z

@dmikushin let's go with the proposal. Please open a PR and I can help push it through.

atamazov · 2023-07-20T07:31:41Z

> @dmikushin One can make a fake tunable solver that is applicable in every net config and has 0 perf configs. That would be a good start for a unit test.

This solver would be incorrect.

atamazov · 2023-07-20T07:32:56Z

Please look at #2266 (comment)

atamazov · 2023-07-20T09:56:37Z

Some info about restrictions related to tunable solvers: #866 (comment)

atamazov · 2023-07-24T21:10:16Z

Not actually fixed yet. Continued in #2270...

junliume added value_middle quality urgency_normal labels Jul 11, 2023

junliume assigned CAHEK7, DrizztDoUrden, JehandadKhan, dmikushin and averinevg Jul 11, 2023

junliume added urgency_blocker and removed urgency_normal labels Jul 17, 2023

dmikushin mentioned this issue Jul 19, 2023

Do not fail the generic search if n_runs_total is zero; turns warnings into infos #2266

Merged

junliume closed this as completed Jul 24, 2023

atamazov mentioned this issue Jul 24, 2023

Issue with #2266 and fix within mainline #2270

Closed
