Add BackendAttribute for parallel model instance loading #235

rmccorm4 · 2023-07-21T21:46:56Z

The attribute currently defaults to false, so no instances are loaded in parallel unless a backend implements support and calls the API to enable it. This PR only adds the scaffolding for that capability.
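A minimal C++ sketch of the default-off, opt-in pattern described above. Every name here (`BackendAttribute`, `SetParallelInstanceLoading`, the two `GetAttribute` hooks, `QueryParallelLoading`) is hypothetical and only illustrates the idea; the real declarations live in include/triton/core/tritonbackend.h and are not reproduced here. Core starts from a default of false, and only a backend that actively calls the setter changes it.

```cpp
#include <cassert>

// Hypothetical stand-in for the attribute object core hands to a backend.
// The new field defaults to false, so backends that never touch it keep
// loading their instances serially.
struct BackendAttribute {
  bool parallel_instance_loading = false;
};

// Hypothetical setter playing the role of the opt-in API.
void SetParallelInstanceLoading(BackendAttribute* attr, bool enabled) {
  assert(attr != nullptr);
  attr->parallel_instance_loading = enabled;
}

// Hypothetical per-backend hook that may override attribute defaults.
using GetAttributeFn = void (*)(BackendAttribute*);

// A backend that predates the attribute: it leaves the default in place.
void LegacyBackendGetAttribute(BackendAttribute*) {}

// A backend that opts in, as the Identity Backend change does.
void OptInBackendGetAttribute(BackendAttribute* attr) {
  SetParallelInstanceLoading(attr, true);
}

// Core-side query in this sketch: start from the defaults, then let the
// backend's hook (if it has one) override them.
bool QueryParallelLoading(GetAttributeFn hook) {
  BackendAttribute attr;  // parallel_instance_loading == false
  if (hook != nullptr) {
    hook(&attr);
  }
  return attr.parallel_instance_loading;
}
```

Because the default lives in core, existing backends need no changes and behave exactly as before.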

Identity Backend opt-in, to sanity-check this in existing pipelines: triton-inference-server/identity_backend#26

I'm seeing a ballpark 2-3x speedup on my machine with the Identity Backend loading 100 instances. I expect the speedup to matter more for model instances that take longer to initialize, as in more complex backends. I will do a more thorough performance analysis after enabling support in other backends.


Tabrizian

The API LGTM.


Tabrizian

LGTM.

rmccorm4 added 4 commits July 20, 2023 15:37

Add initial parallel instance loading BackendAttribute and corresponding API to opt-in

bcb2ab2

Add backend attribute check to decide whether to load instances serially or concurrently

5c61fe7

Add backend attribute check to decide whether to load instances serially or concurrently

5e7f977

Fix merge conflict

b47f0c7
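The serial-versus-concurrent decision added in these commits can be sketched as follows, assuming a hypothetical `LoadInstances` helper and a caller-supplied `init` function standing in for instance initialization. This is an illustration rather than the code in src/backend_model.cc; it uses one `std::async` task per instance for simplicity, and the real implementation may use a different threading scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical result of initializing one model instance.
struct InstanceStatus {
  std::string name;
  bool ok;
};

// Initializes every instance with `init`, either one after another or
// concurrently (one std::async task per instance), depending on the backend
// attribute. Results come back in the original order in both modes.
std::vector<InstanceStatus> LoadInstances(
    const std::vector<std::string>& names, bool parallel_instance_loading,
    const std::function<bool(const std::string&)>& init) {
  std::vector<InstanceStatus> results;
  results.reserve(names.size());

  if (!parallel_instance_loading) {
    // Default path: identical to the previous serial loop.
    for (const auto& name : names) {
      results.push_back({name, init(name)});
    }
    return results;
  }

  // Opt-in path: std::launch::async runs each initialization on its own thread.
  std::vector<std::future<bool>> futures;
  futures.reserve(names.size());
  for (const auto& name : names) {
    futures.push_back(std::async(std::launch::async, init, name));
  }
  // Wait for every instance and collect results in submission order.
  for (std::size_t i = 0; i < names.size(); ++i) {
    results.push_back({names[i], futures[i].get()});
  }
  return results;
}
```

With the attribute left at its default of false, the behavior is the old serial loop, which is what makes it safe to land the attribute before any backend opts in.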

rmccorm4 requested review from Tabrizian and tanmayv25 July 21, 2023 21:50

rmccorm4 commented Jul 21, 2023

src/backend_model_instance.cc (outdated, resolved)

rmccorm4 requested a review from GuanLuo July 21, 2023 21:59

rmccorm4 added 2 commits July 21, 2023 16:30

Protect payload_queue->specific_queues from concurrent instance initialization

6ba6308

formatting

3e7e12a
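Once instances can initialize concurrently, any container they all register into becomes shared mutable state. A minimal sketch of the idea behind 6ba6308, assuming a hypothetical `PayloadQueue` that keeps a map of per-instance queues (`specific_queues_` mirrors the name in the commit message; the class shape and the `int` payload type are made up): a mutex serializes insertions, since concurrent `std::unordered_map::emplace` calls would be a data race.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical payload queue owning one dedicated queue per model instance.
// Instances register during initialization, which may now run on several
// threads at once.
class PayloadQueue {
 public:
  // Creates the instance's dedicated queue. The lock serializes concurrent
  // registrations because unordered_map insertion is not thread-safe.
  void RegisterInstance(const std::string& instance) {
    std::lock_guard<std::mutex> lock(mu_);
    specific_queues_.emplace(instance, std::deque<int>{});
  }

  // Reads are locked too, so they never observe a half-finished insertion.
  std::size_t QueueCount() {
    std::lock_guard<std::mutex> lock(mu_);
    return specific_queues_.size();
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, std::deque<int>> specific_queues_;
};
```

Only the map mutation sits under the lock; the expensive part of instance initialization can still run in parallel outside it.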

Tabrizian reviewed Jul 24, 2023

include/triton/core/tritonbackend.h (outdated, resolved)

src/backend_model.cc (resolved)

GuanLuo reviewed Jul 24, 2023

src/backend_model.cc (resolved)

rmccorm4 mentioned this pull request Jul 24, 2023

Support parallel instance loading triton-inference-server/identity_backend#26

Merged

rmccorm4 added 2 commits July 24, 2023 17:00

Remove logging from critical section, and keep it to a single stream operator to avoid interweaving when multi-threaded

e708f20

Review feedback: Rename backend API

612cced
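The idea behind e708f20, sketched under assumptions: hold the lock only for the shared-state update, then format the log line into one string and write it with a single `<<`. `RegisterAndLog`, `FormatInstanceLog`, `registry_mu`, and the counter are hypothetical. Note that the C++ standard does not promise that even a single insertion into a shared stream is written atomically; one `<<` mainly reduces interleaving compared with chaining several, and C++20's `std::osyncstream` is the standard tool when a guarantee is needed.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Builds the whole log line up front so it can be written with one stream
// insertion. Chaining several << calls on a shared stream from multiple
// threads can interleave fragments of different lines.
std::string FormatInstanceLog(const std::string& model, int instance_id,
                              const std::string& device) {
  std::ostringstream line;
  line << "Created instance " << model << "_" << instance_id << " on "
       << device << "\n";
  return line.str();
}

std::mutex registry_mu;        // hypothetical lock guarding loader state
int registered_instances = 0;  // hypothetical shared state

void RegisterAndLog(std::ostream& out, const std::string& model,
                    int instance_id, const std::string& device) {
  {
    // Keep the critical section limited to the shared-state update.
    std::lock_guard<std::mutex> lock(registry_mu);
    ++registered_instances;
  }
  // Logging happens after the lock is released, as a single << call.
  out << FormatInstanceLog(model, instance_id, device);
}
```

Moving I/O out of the critical section also keeps slow log sinks from serializing otherwise parallel instance loads.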

rmccorm4 marked this pull request as ready for review July 25, 2023 00:06

Remove timing

844383e

rmccorm4 requested review from Tabrizian and GuanLuo July 25, 2023 19:00

Tabrizian approved these changes Jul 26, 2023


rmccorm4 merged commit 9714cd6 into main Jul 26, 2023

rmccorm4 deleted the rmccormick-optin branch July 26, 2023 19:27

rmccorm4 mentioned this pull request Jul 26, 2023

Adding the support tracing of child models invoked from a BLS model #234

Merged
