Android example long prompt cache? #1077

scsonic · 2024-11-19T18:34:22Z

I am currently using the Android Phi-3 example.
When my system prompt is very long, processing it takes more than 90 seconds.
Is there a way to cache the result of that 90-second processing?

llama.cpp on mobile remembers the processed prompt,
so on the next run there is no need to wait another 90 seconds.

Alternatively, can I do something with the classes in the Java API? import ai.onnxruntime.genai.[GeneratorParams, Tokenizer, Model, Sequences]
Is the KV cache working on Android?

RyanUnderhill (Maintainer) · 2024-11-22T23:39:19Z

@aciddelgado's latest PR adds 'rewind' functionality, so you can effectively cache the prompt by rewinding the generator back to the prompt position on every iteration. It was just checked in, so it's not in a release yet.
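
To make the rewind idea concrete, here is a minimal, self-contained Java sketch. It does not use the onnxruntime-genai API: `ToyGenerator`, `appendTokens`, and `rewindTo` are hypothetical names invented for illustration, and the "KV cache" is just a list with one entry per processed token. The point is only to show the pattern: process the long system prompt once, remember its length, and after each turn truncate the cache back to that length instead of reprocessing the prompt.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of prompt caching via rewind; NOT the onnxruntime-genai API. */
public class PromptRewindSketch {

    /** Hypothetical generator that keeps one cache entry per processed token. */
    static final class ToyGenerator {
        private final List<Integer> kvCache = new ArrayList<>();
        private long tokensProcessed = 0; // counts the expensive per-token work

        /** Processes tokens and extends the cache (the slow part for a long prompt). */
        void appendTokens(int[] tokens) {
            for (int t : tokens) {
                kvCache.add(t);
                tokensProcessed++;
            }
        }

        /** Drops every cache entry past newLength, keeping the prefix intact. */
        void rewindTo(int newLength) {
            if (newLength < 0 || newLength > kvCache.size()) {
                throw new IllegalArgumentException("newLength out of range: " + newLength);
            }
            kvCache.subList(newLength, kvCache.size()).clear();
        }

        int length() { return kvCache.size(); }
        long tokensProcessed() { return tokensProcessed; }
    }

    /** Processes the system prompt once, then serves every turn from that prefix. */
    static long run(int[] systemPrompt, int[][] userTurns) {
        ToyGenerator generator = new ToyGenerator();
        generator.appendTokens(systemPrompt);   // paid once
        int promptLength = generator.length();  // position to rewind to

        for (int[] turn : userTurns) {
            generator.appendTokens(turn);       // only the new tokens are processed
            // ... token generation for this turn would happen here ...
            generator.rewindTo(promptLength);   // discard the turn, keep the prompt
        }
        return generator.tokensProcessed();
    }

    public static void main(String[] args) {
        int[] systemPrompt = new int[1000];     // stands in for a long system prompt
        int[][] userTurns = { {1, 2, 3}, {4, 5}, {6, 7, 8, 9} };
        // 1000 prompt tokens + 9 turn tokens, instead of 3 * 1000 + 9 without rewind
        System.out.println("tokens processed: " + run(systemPrompt, userTurns)); // 1009
    }
}
```

In this sketch the rewind only helps while the same generator object stays alive; whether the cached state can be persisted across app sessions is a separate question that the sketch does not address.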

scsonic (Author) · 2024-11-26T15:01:47Z

This looks like the function I need.
I could back up the context after the long system prompt
and restore it every time I start a new session.
Is this understanding correct?
https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#qnn-context-binary-cache-feature
