-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid rate limiter resource count update from iterating over all instances #208
Conversation
src/rate_limiter.cc
Outdated
resource_manager_->AddModelInstance(pair_it.first->second.get()); | ||
const auto& status = resource_manager_->UpdateResourceLimits(); | ||
const auto& status = | ||
resource_manager_->AddModelInstance(pair_it.first->second.get()); | ||
if (!status.IsOk()) { | ||
resource_manager_->RemoveModelInstance(pair_it.first->second.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kthui it looks like we throw away the status returned by RemoveModelInstance
, which in turn calls several other methods that can return detailed status errors.
Can we propagate the error statuses back up via LOG_ERROR
in the places where we call this method? It looks like we call it in a few places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Updated: Minor fix and return error on instance remove failure
64f80d6
to
6460553
Compare
88f7622
to
8fa3e9a
Compare
afb3c1b
to
b12ea16
Compare
b12ea16
to
e664d04
Compare
This PR optimize the logic on rate limiter instance max resource update, after an instance is added or removed. The previous logic will iterate over all instances once for each instance added or removed. For adding an instance, the new logic will only compare the resource limit between the new instance and the max resource, and update the max resource if the new instance has a higher limit. For removing an instance, the resource on the instance being removed is compared with the max resource. If the max resource is greater than the resource on the instance being removed, then no update will be performed. If the removed instance has higher or equal limit to the max resource, then the max resource is discarded and re-computed, which iterates over the instances once.
Server PR: triton-inference-server/server#5885