Commit

Add docs on instance count decrease
kthui committed Jul 19, 2023
1 parent f36030e commit 422da5b
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/user_guide/model_management.md
@@ -224,6 +224,12 @@ model rather then reloading it, when either a load request is received under
configuration, so its presence in the model directory may be detected as a new file
and cause the model to fully reload when only an update is expected.

* If a sequence model is *updated* (i.e. the instance count is decreased), Triton
will wait until the in-flight sequence is completed (or timed out) before the
instance behind the sequence is removed.
* If the instance count is decreased, arbitrary instance(s) are selected among
idle instances and instances with in-flight sequence(s) for removal.

* If a sequence model is *reloaded* with in-flight sequence(s) (i.e. changes to
the model file), Triton does not guarantee any remaining request(s) from the
in-flight sequence(s) will be routed to the same model instance for processing.
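
The instance count decrease behavior described in the added bullets can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Triton's implementation: `Instance`, `ToyInstancePool`, `decrease_to`, and `end_sequence` are hypothetical names, and taking the trailing instances merely stands in for the arbitrary selection the docs describe.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one model instance."""
    name: str
    in_flight_sequences: set = field(default_factory=set)
    pending_removal: bool = False


class ToyInstancePool:
    """Toy model of the documented behavior; not Triton's scheduler."""

    def __init__(self, count):
        self.instances = [Instance(f"inst_{i}") for i in range(count)]

    def decrease_to(self, new_count):
        # The docs say the choice is arbitrary among idle and busy instances.
        # Assumption: this sketch just marks the trailing ones.
        active = [i for i in self.instances if not i.pending_removal]
        excess = len(active) - new_count
        if excess > 0:
            for inst in active[len(active) - excess:]:
                inst.pending_removal = True
        self._reap()

    def end_sequence(self, name, seq_id):
        # Called when a sequence completes or times out.
        for inst in self.instances:
            if inst.name == name:
                inst.in_flight_sequences.discard(seq_id)
        self._reap()

    def _reap(self):
        # A marked instance is removed only once it has no in-flight sequence.
        self.instances = [
            i for i in self.instances
            if not (i.pending_removal and not i.in_flight_sequences)
        ]


pool = ToyInstancePool(3)
pool.instances[2].in_flight_sequences.add(42)  # inst_2 is serving sequence 42

pool.decrease_to(1)
# inst_1 was idle and is gone at once; inst_2 stays until its sequence ends.
before = [i.name for i in pool.instances]

pool.end_sequence("inst_2", 42)
after = [i.name for i in pool.instances]
```

In this model, an instance selected for removal while serving a sequence lingers until `end_sequence` is called for it, which mirrors the wait described for the updated case.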