[Exhaustive Tuning] Search failed Error message when no solution is available #2253

Closed
junliume opened this issue Jul 11, 2023 · 12 comments

@junliume
Contributor

This looks like a small issue that could be a good first one for newcomers to MIOpen:

MIOpen(HIP): Warning [GetAllConfigs] ConvHipImplicitGemmForwardV4R4Xdlops: Searching the best solution among 0 (spare)...
MIOpen(HIP): Warning [GenericSearch] Done: 0/0/0, best #0 3.40282e+38 4,4,1,4,4,1,0,0,1
MIOpen(HIP): Error [FindSolutionImpl] Search failed for: ConvHipImplicitGemmForwardV4R4Xdlops: /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/src/include/miopen/generic_search.hpp:555: Search failed

Normally we would expect something like this:

MIOpen(HIP): Warning [GetAllConfigs] ConvAsm1x1U: Searching the best solution among 390...
MIOpen(HIP): Warning [Monitor] 102/0/390 0.0129282, best within recent 103: 0.0129282 #92 1,8,2,32,1,2,2,2, ETA:8.4816 sec.
MIOpen(HIP): Warning [Monitor] 207/0/390 0.0129282, best within recent 105: 0.01584 #122 1,8,2,16,1,2,2,2, ETA:5.30935 sec.
MIOpen(HIP): Warning [Monitor] 309/0/390 0.0129282, best within recent 102: 0.01552 #259 1,8,2,32,1,2,2,4, ETA:2.36496 sec.
MIOpen(HIP): Warning [GenericSearch] Done: 390/0/390, best #92 0.0129282 1,8,2,32,1,2,2,2
MIOpen(HIP): Warning [GenericSearch] ...Score: 4.02221 (default time 0.052)

A few issues:

  1. In the first case, no solutions are applicable, so the search should exit early instead of emitting an error.
  2. In both cases, these messages should be logged at the info level rather than as warnings.
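
The first point can be illustrated with a minimal standalone sketch. It is not MIOpen code: `SearchResult`, `SearchBest`, and the convention that a negative time marks a failed run are all assumptions made for illustration. The idea is that an empty search space is reported as information and returns early, while "configs existed but every run failed" remains a real error.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <limits>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a tuning result; not an MIOpen type.
struct SearchResult
{
    std::size_t best_index;
    float best_time;
};

// Hypothetical search loop. `benchmark` returns the elapsed time for a config,
// or a negative value if the run failed (an assumption of this sketch).
template <class Config, class Benchmark>
std::optional<SearchResult> SearchBest(const std::vector<Config>& configs, Benchmark benchmark)
{
    if(configs.empty())
    {
        // Early exit: nothing to search is informational, not an error.
        std::clog << "Info: search space is empty, nothing to tune\n";
        return std::nullopt;
    }

    SearchResult best{0, std::numeric_limits<float>::max()};
    bool is_passed = false;
    for(std::size_t i = 0; i < configs.size(); ++i)
    {
        const float t = benchmark(configs[i]);
        if(t < 0.0f)
            continue; // failed run, try the next config
        is_passed = true;
        if(t < best.best_time)
            best = {i, t};
    }

    // Configs existed but none of them ran successfully: a genuine failure.
    if(!is_passed)
        throw std::runtime_error("Search failed");

    assert(best.best_index < configs.size());
    return best;
}
```

A caller would treat an empty `std::optional` as "keep the default configuration" rather than as a failed search.
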
@dmikushin
Contributor

Hi @junliume, could you please provide a command line that reproduces this?

@junliume
Contributor Author

@dmikushin it might be too heavy to reproduce, since that requires migraphx online tuning. I suggest a static review of the function GenericSearch in src/include/miopen/generic_search.hpp instead. For example, if n_runs_total is 0, there is no reason to continue the search.

@DrizztDoUrden
Contributor

@dmikushin One can make a fake tunable solver that is applicable in every net config and has 0 perf configs. That would be a good start for a unit test.
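
A rough sketch of that test idea follows, using standalone stand-ins rather than MIOpen's real solver interfaces. Every name here (`FakeProblem`, `FakePerfConfig`, `FakeTunableSolver`, `RunSearch`, `SearchOutcome`) is hypothetical. Note that a later comment in this thread argues such a solver would be incorrect, so this only shows the shape of the check, not a recommended solver.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not MIOpen's solver interfaces.
struct FakeProblem
{
};

struct FakePerfConfig
{
    int value = 0;
};

// Claims to be applicable to every problem but offers no perf configs.
struct FakeTunableSolver
{
    bool IsApplicable(const FakeProblem&) const { return true; }
    std::vector<FakePerfConfig> GetAllConfigs(const FakeProblem&) const { return {}; }
};

enum class SearchOutcome
{
    Found,
    NothingToSearch
};

// Hypothetical driver: reports an empty search space instead of failing.
template <class Solver, class Problem>
SearchOutcome RunSearch(const Solver& solver, const Problem& problem)
{
    assert(solver.IsApplicable(problem));
    const auto configs = solver.GetAllConfigs(problem);
    if(configs.empty())
        return SearchOutcome::NothingToSearch;
    // Benchmarking of each config is omitted in this sketch.
    return SearchOutcome::Found;
}
```

A unit test would then assert that searching with the fake solver yields `SearchOutcome::NothingToSearch` and does not throw.
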

@junliume
Contributor Author

@dmikushin the urgency of this issue has been raised because of a request to cherry-pick the fix into an existing release. Please let me know if you have a plan to fix it. Thanks! CC: @JehandadKhan

@dmikushin
Contributor

Hi @junliume yes, I'm working on it.

@dmikushin
Contributor

dmikushin commented Jul 18, 2023

@junliume, something like this?

As I understand the logic, we could still let the function run to the end for simplicity, but not fail it if there were no runs.

diff --git a/src/include/miopen/generic_search.hpp b/src/include/miopen/generic_search.hpp
index 5fb60235d..be6d89f58 100644
--- a/src/include/miopen/generic_search.hpp
+++ b/src/include/miopen/generic_search.hpp
@@ -212,7 +212,7 @@ public:
                 n_recent != 0u ? (static_cast<float>(n_total - n_recent) *
                                   (elapsed_cumulative / static_cast<float>(n_recent)) / 1000.0f)
                                : 0.0f; // paraniod
-            MIOPEN_LOG_W(n_recent << '/' << n_failed << '/' << n_total << ' ' << total_best
+            MIOPEN_LOG_I(n_recent << '/' << n_failed << '/' << n_total << ' ' << total_best
                                   << ", best within recent " << n_within_beat << ": " << best_time
                                   << " #" << n_best << ' ' << best_config << ", ETA:" << eta_sec
                                   << " sec.");
@@ -275,7 +275,7 @@ auto GetAllConfigs(const Solver s, const Context& context, const Problem& proble
 
     ComputedContainer<PerformanceConfig, Context, Problem> all_configs = useSpare ? spare : primary;
     const int n_runs_total = useSpare ? spare_size : primary_size;
-    MIOPEN_LOG_W(s.SolverDbId() << ": Searching the best solution among " << n_runs_total
+    MIOPEN_LOG_I(s.SolverDbId() << ": Searching the best solution among " << n_runs_total
                                 << (useSpare ? " (spare)" : "") << "...");
 
     return all_configs;
@@ -517,7 +517,7 @@ auto GenericSearch(const Solver s,
                 }
             }
 
-            // Banchmarked kernels will not be used anymore.
+            // Benchmarked kernels will not be used anymore.
             // Now we can delete Program objects that belong to OCL/HIP
             // runtime and free the associated resources (memory, file handles...)
             for(const auto& kernelInfo : current_solution.construction_params)
@@ -548,10 +548,10 @@ auto GenericSearch(const Solver s,
     for(auto& agent : compile_agents)
         agent.join();
 
-    MIOPEN_LOG_W("Done: " << n_runs_total << '/' << n_failed << '/' << n_runs_total << ", best #"
+    MIOPEN_LOG_I("Done: " << n_runs_total << '/' << n_failed << '/' << n_runs_total << ", best #"
                           << n_best << ' ' << best_time << ' ' << best_config);
 
-    if(!is_passed)
+    if(!is_passed && n_runs_total)
         MIOPEN_THROW("Search failed");
     // Run once with the default config and show score.
 
@@ -560,7 +560,7 @@ auto GenericSearch(const Solver s,
     invoker(profile_h, invoke_ctx);
     const auto default_time = profile_h.GetKernelTime();
     const auto score        = (best_time > 0.0f) ? default_time / best_time : 0.0f;
-    MIOPEN_LOG_W("...Score: " << score << " (default time " << default_time << ')');
+    MIOPEN_LOG_I("...Score: " << score << " (default time " << default_time << ')');
 
     return best_config;
 }
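
To make the effect of the changed condition easier to see, here is a small standalone model of it. The helper names `ShouldThrowSearchFailed` and `Score` are hypothetical; only the two expressions are taken from the diff above. The zero-run log earlier in the thread prints a best time of 3.40282e+38, which matches the largest finite `float`, so the sketch assumes that value is what reaches the score calculation when nothing was run.

```cpp
#include <cassert>
#include <limits>

// Models the patched condition: fail only if configs were actually searched
// and none of them passed.
inline bool ShouldThrowSearchFailed(bool is_passed, int n_runs_total)
{
    return !is_passed && n_runs_total != 0;
}

// Models the score expression visible in the diff context.
inline float Score(float default_time, float best_time)
{
    assert(default_time >= 0.0f);
    return (best_time > 0.0f) ? default_time / best_time : 0.0f;
}
```

With `n_runs_total == 0` the function no longer throws, but under the assumption above the reported score is effectively zero, since the default time is divided by the largest finite `float`.
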

@junliume
Contributor Author

@JehandadKhan could you review the above proposal? It looks good to me. IMHO it would be better to return early from the function, but it does no harm to run through it without failing, since these conditions rarely occur.

@junliume
Contributor Author

@dmikushin let's go with the proposal. Please open a PR and I can help push it through.

@atamazov
Contributor

> @dmikushin One can make a fake tunable solver that is applicable in every net config and has 0 perf configs. That would be a good start for a unit test.

This solver would be incorrect.

@atamazov
Contributor

Please look at #2266 (comment)

@atamazov
Contributor

Some info about restrictions related to tunable solvers: #866 (comment)

@atamazov
Contributor

Not actually fixed yet. Continued in #2270...
