
EP context cache feature design #22142

Merged — 4 commits merged into gh-pages from ep_context_doc on Sep 24, 2024
Conversation

HectorSVC
Contributor

Description

EP context cache feature design

@jywu-msft
Member

+@chilo-ms fyi

@HectorSVC HectorSVC merged commit 9b71042 into gh-pages Sep 24, 2024
5 checks passed
@HectorSVC HectorSVC deleted the ep_context_doc branch September 24, 2024 16:22
@mrsabhar

@HectorSVC Thanks for coming up with a concept to improve first-inference latency; this has been requested by several SW vendors (even for non-GenAI workloads). I'm aware the feature design is closed, but how does it address the scenario where a developer ships an encrypted ONNX file and the context file is obfuscated or encrypted? Do we expect double processing/memory usage: first decrypting the ONNX model to determine whether a context file is available, and then reading the context file into memory (which can't be in human-readable form due to IP leakage concerns) to check its availability? The only time we know a context cache is available for the ONNX model is once it is in memory. Also, how does the context file take dynamic shapes into account?

"If the user loads the model from memory buffer, user needs to provide session option ep.context_file_path. EP gets the folder path from ep.context_file_path, and combines it with the relative path got from step a) as the context binary file full path."
