
EP context cache feature design #22142

Merged — 4 commits merged into gh-pages from ep_context_doc on Sep 24, 2024
Conversation

HectorSVC
Contributor

Description

EP context cache feature design

@jywu-msft
Member

+@chilo-ms fyi

@HectorSVC HectorSVC merged commit 9b71042 into gh-pages Sep 24, 2024
5 checks passed
@HectorSVC HectorSVC deleted the ep_context_doc branch September 24, 2024 16:22
@mrsabhar

@HectorSVC Thanks for coming up with a concept to improve first-inference latency; this has been requested by several SW vendors (even for non-GenAI workloads). I'm aware the feature design is closed, but how does it address the scenario where a developer ships an encrypted ONNX file and the context file is obfuscated or encrypted? Do we expect double processing/memory usage: first decrypting the ONNX model to determine whether a context file is available, and then reading the context file into memory (which can't be in human-readable form due to IP leakage concerns) to check its availability? The only time we know a context cache is available for the ONNX model is once it is in memory. Also, how does the context file take dynamic shapes into account?

"If the user loads the model from memory buffer, user needs to provide session option ep.context_file_path. EP gets the folder path from ep.context_file_path, and combines it with the relative path got from step a) as the context binary file full path."
