Limit Tuning by Time #1997

Merged 13 commits on Mar 14, 2023

Conversation

@JehandadKhan JehandadKhan commented Feb 22, 2023

This PR updates the search algorithm in MIOpen:

  • The search sequence is randomized. This increases the chances of finding an optimum even when a time budget is set.
  • Add an environment variable (MIOPEN_TUNING_TIME_MS_MAX) that sets the number of milliseconds MIOpen can spend tuning each solver.
  • Update the selection of the multi-threading level based on the compilation back-end, in light of collected data.
  • Add a multi-threaded queue that lets multiple threads write to it while the main thread consumes work from it. The struct comes with a unit test.
  • Add an MIOpen environment variable (MIOPEN_DEBUG_PERFDB_OVERRIDE) to override tuning parameters for solvers.
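
For readers skimming the description, the first two bullets combine roughly as in the sketch below. This is a minimal illustration, not the code in this PR: `TimedRandomSearch`, `SearchResult`, and the `cost` callback are hypothetical names, and the budget is passed in as a parameter rather than read from MIOPEN_TUNING_TIME_MS_MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical result type: best candidate seen so far and how many were tried.
struct SearchResult
{
    std::size_t best_index = 0;
    double best_cost       = std::numeric_limits<double>::infinity();
    std::size_t evaluated  = 0;
};

// Visit candidates in shuffled order until the time budget runs out.
// `cost` stands in for "benchmark this configuration"; lower is better.
inline SearchResult TimedRandomSearch(std::size_t candidate_count,
                                      const std::function<double(std::size_t)>& cost,
                                      std::chrono::milliseconds budget,
                                      unsigned seed)
{
    assert(static_cast<bool>(cost)); // a callable must be supplied

    // Build the index sequence 0..n-1 and shuffle it.
    std::vector<std::size_t> order(candidate_count);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    const auto deadline = std::chrono::steady_clock::now() + budget;
    SearchResult result;
    for(const std::size_t idx : order)
    {
        // Always evaluate at least one candidate so the result is usable.
        if(result.evaluated > 0 && std::chrono::steady_clock::now() >= deadline)
            break;
        const double c = cost(idx);
        ++result.evaluated;
        if(c < result.best_cost)
        {
            result.best_cost  = c;
            result.best_index = idx;
        }
    }
    return result;
}
```

With a generous budget the loop degenerates into an exhaustive search; with a zero budget it still evaluates exactly one (random) candidate.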

@JehandadKhan JehandadKhan added this to the ROCm 5.6 milestone Feb 22, 2023
@JehandadKhan
Contributor Author

@atamazov As discussed, here is the PR which limits tuning by time.


std::size_t GetTuningIterationsMax()
{
    return Value(MIOPEN_DEBUG_TUNING_ITERATIONS_MAX{}, std::numeric_limits<std::size_t>::max());
}

std::chrono::milliseconds GetTuningTimeMax()
{
    const auto fallback =
Contributor

Shouldn't this be static as well? It's not used in a non-static context. Alternatively, you could wrap the calculation of the res value in a lambda.

Contributor Author

Fixed
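
The lambda variant suggested above could look like the sketch below. It is only an illustration under assumptions: `ParseMilliseconds` and `GetTuningTimeMaxSketch` are hypothetical names, `std::getenv` stands in for MIOpen's own environment helpers, and the fallback of `milliseconds::max()` is a guess, since the actual fallback is cut off in the snippet under review.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdlib>

// Hypothetical helper: parse a non-negative millisecond count from text.
// Returns `fallback` for null, empty, negative, out-of-range, or malformed input.
inline std::chrono::milliseconds ParseMilliseconds(const char* text,
                                                   std::chrono::milliseconds fallback)
{
    assert(fallback >= std::chrono::milliseconds::zero());
    if(text == nullptr || *text == '\0')
        return fallback;
    char* end = nullptr;
    errno     = 0;
    const long long value = std::strtoll(text, &end, 10);
    if(end == text || *end != '\0' || errno == ERANGE || value < 0)
        return fallback;
    return std::chrono::milliseconds{value};
}

inline std::chrono::milliseconds GetTuningTimeMaxSketch()
{
    // The immediately invoked lambda runs once, on first call, so the
    // fallback never outlives the initialization and needs no static of its own.
    static const std::chrono::milliseconds res = [] {
        const auto fallback = std::chrono::milliseconds::max(); // assumed default
        return ParseMilliseconds(std::getenv("MIOPEN_TUNING_TIME_MS_MAX"), fallback);
    }();
    return res;
}
```

Keeping the parsing in a separate function makes it testable without touching the process environment.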

{
    std::unique_lock<std::mutex> lock(mutex);
    cond_var.wait(lock, [&] { return !queue.empty(); });
    return queue.front();
Contributor

Can't the reference be invalidated by a push? This should probably return by value.

Contributor Author

Since it's a queue, the push happens at the other end of the underlying container. The object has only one consumer, so peeking at the front through a reference saves the copy that a return by value would entail. The semantics are therefore: get a reference to the front, use the object, and pop it off the queue once you are done.

Contributor

This approach inherently relies on the implementation of std::queue not reallocating its internal container and invalidating the reference, which it probably can do. The implementation may also change in the future. So even if this is safe right now, it is a liability. I still suggest returning by value. Combining this with pop in a single method would also remove one mutex lock.

Contributor

This is what I am talking about: https://stackoverflow.com/a/16075550
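
A minimal sketch of the suggestion above, assuming a hypothetical `BlockingQueue` (the name and interface are illustrative, not the struct added in this PR). `pop()` waits, copies the front element, and removes it under a single lock, so no reference escapes the mutex. For what it is worth, `std::deque`, the default container behind `std::queue`, keeps references to existing elements valid when elements are inserted at either end, so the concern mainly applies to other underlying containers; returning by value simply avoids depending on that guarantee.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical multi-producer, single-consumer queue.
template <class T>
class BlockingQueue
{
public:
    // Producers call push(); the lock is released before notifying the consumer.
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cond_var_.notify_one();
    }

    // Block until an element is available, then remove it and return it by value.
    // Front access and pop share one lock acquisition.
    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_var_.wait(lock, [&] { return !queue_.empty(); });
        assert(!queue_.empty()); // guaranteed by the wait predicate
        T ret = queue_.front();  // copy while the lock is held
        queue_.pop();
        return ret;
    }

    bool empty() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.empty();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cond_var_;
    std::queue<T> queue_;
};
```

In the tuning scenario described in the PR, worker threads would call `push()` and the main thread would loop on `pop()`.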

@averinevg
Contributor

> The search sequence is randomized. This increases the chances of finding an optimum even when a time budget is set.

@JehandadKhan How could randomization increase the chances of finding an optimum?

@JehandadKhan
Contributor Author

> @JehandadKhan How could randomization increase the chances of finding an optimum?

@averinevg When there is a limited time budget, randomizing the traversal of the search space is required. If the search space is traversed in order (as is currently the case), then a time budget would limit which parts of the space are searched.

@averinevg
Contributor

averinevg commented Feb 23, 2023

> @averinevg When there is a limited time budget, randomizing the traversal of the search space is required. If the search space is traversed in order (as is currently the case), then a time budget would limit which parts of the space are searched.

@JehandadKhan Let's imagine some list of numbers in an unknown order. Will the numbers with the highest values end up at the top of this list after randomization?

Randomization would make it possible to search among some elements from a limited part of the space, but this would not increase the probability of finding the optimal solution. I mean, there is no correlation here.

Discussed with @atamazov. At this stage I have no objection to randomization.
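
Both positions in this exchange can be seen in a small experiment. The sketch below (all names hypothetical, nothing here is MIOpen code) counts how many equally sized regions of the index space are touched by the first `budget` visits. In-order traversal stays inside the leading region, while a shuffled order spreads its visits across the space. It says nothing about where the optimum lies: if candidate quality is unrelated to enumeration order, spreading the visits does not by itself raise the odds of hitting the best one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <set>
#include <vector>

// Candidate indices 0..n-1 in enumeration order.
inline std::vector<std::size_t> InOrder(std::size_t n)
{
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    return order;
}

// The same indices in a pseudo-random order.
inline std::vector<std::size_t> Shuffled(std::size_t n, unsigned seed)
{
    auto order = InOrder(n);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Number of distinct regions (blocks of `region_size` consecutive indices)
// touched by the first `budget` entries of `order`.
inline std::size_t RegionsVisited(const std::vector<std::size_t>& order,
                                  std::size_t budget,
                                  std::size_t region_size)
{
    assert(region_size > 0);
    std::set<std::size_t> regions;
    const std::size_t visits = std::min(budget, order.size());
    for(std::size_t i = 0; i < visits; ++i)
        regions.insert(order[i] / region_size);
    return regions.size();
}
```

For example, with 1000 candidates, regions of 100, and a budget of 100 visits, `RegionsVisited(InOrder(1000), 100, 100)` is 1, whereas the shuffled order will almost certainly touch several regions.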

@JehandadKhan
Contributor Author

JehandadKhan commented Mar 9, 2023

@DrizztDoUrden and @averinevg I have addressed your reviews.

DrizztDoUrden previously approved these changes Mar 9, 2023

@DrizztDoUrden DrizztDoUrden left a comment

It would be nice to add one change, but this is safe to merge right now.

{
    std::unique_lock<std::mutex> lock(mutex);
    cond_var.wait(lock, [&] { return !queue.empty(); });
    T ret = queue.front();
Contributor

Suggested change
- T ret = queue.front();
+ T ret = std::move(queue.front());

This wouldn't save much time (relatively speaking) in generic search, but who knows where else this queue may get used.
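
One concrete case where the suggested `std::move` matters is a move-only element type, which cannot be copied out of the queue at all. The sketch below uses a hypothetical free function `TakeFront` on a plain `std::queue` and leaves out the locking shown in the diff, purely to isolate the move.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>

// Move the front element out of the queue, then pop the moved-from slot.
// Locking is omitted; in the PR this happens under the queue's mutex.
template <class T>
T TakeFront(std::queue<T>& queue)
{
    assert(!queue.empty());
    T ret = std::move(queue.front()); // steals the element's resources
    queue.pop();                      // discards the moved-from element
    return ret;
}
```

With `T = std::unique_ptr<int>`, the copy-initialized form `T ret = queue.front();` would not compile, while the moved form does; for cheap-to-copy types the two behave the same.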

averinevg previously approved these changes Mar 9, 2023
@junliume
Collaborator

@averinevg could you please resolve the conflict? It is caused by merging #2009 first.

@averinevg averinevg dismissed stale reviews from DrizztDoUrden and themself via 3577101 March 13, 2023 07:19
@averinevg
Contributor

> @averinevg could you please resolve the conflict? It is caused by merging #2009 first.

@junliume Merge conflict has been resolved

@junliume junliume merged commit b4e0a67 into develop Mar 14, 2023
@junliume junliume deleted the jd/tuning_override branch April 17, 2023 14:54