Fix the memory type of output tensor when warm-up is enabled #202
The issue was that runtime inference would always return a CPU tensor when warm-up was enabled. This happened because we recorded the output memory type from the first request to the model. Since warm-up is the first request and always allocates CPU memory for its output, it incorrectly pinned the recorded output memory type to CPU.
This PR fixes the logic so that warm-up inference does not affect the recorded memory type of the actual output tensor.
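A minimal sketch of the before/after logic, under stated assumptions: the names below (`ModelState`, `RecordOutputMemoryType`, `is_warmup_request`, `output_memory_type_`) are hypothetical illustrations, not the backend's actual identifiers.

```cpp
// Hypothetical sketch of the fix; all names here are illustrative,
// not the backend's real API.
enum class MemoryType { CPU, GPU };

struct ModelState {
  bool output_memory_type_recorded_ = false;
  MemoryType output_memory_type_ = MemoryType::CPU;

  // Called once per inference with the memory type actually used for
  // that request's output allocation.
  void RecordOutputMemoryType(MemoryType used, bool is_warmup_request) {
    // Before the fix: the first request always set the recorded type.
    // Warm-up runs first and always allocates CPU output buffers, so
    // every later inference was wrongly reported as returning CPU
    // tensors.
    //
    // After the fix: skip recording for warm-up requests so the first
    // *real* request determines the reported output memory type.
    if (is_warmup_request) {
      return;
    }
    if (!output_memory_type_recorded_) {
      output_memory_type_ = used;
      output_memory_type_recorded_ = true;
    }
  }
};
```

The key design choice is to special-case warm-up at the recording site rather than at allocation: warm-up can keep using CPU buffers, but its request no longer participates in determining the memory type reported for real inferences.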
Testing: triton-inference-server/server#6033