epic: add cache_prompt to Engines Settings #1570
Labels: type: epic (A major feature or initiative)
Comments
@imtuyethan - For the specs.
This was referenced Aug 28, 2024
related: janhq/jan#3140
Latest related request: janhq/jan#3715
I am transferring this to Cortex, as part of our llama.cpp integration settings.
Describe the problem
Jan does not support setting cache_prompt in the HTTP request JSON for llama.cpp, resulting in slower processing times for long contexts (8000+ tokens).
Describe the solution
Jan should support setting the cache_prompt parameter in the HTTP request JSON to enable faster processing times with llama.cpp.
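A minimal sketch of what this could look like against a locally running llama.cpp server; the host, port, prompt, and n_predict values are assumptions for illustration, and cache_prompt is the field this request asks Jan to expose:

```python
import requests

# Sketch only: assumes a llama.cpp server on localhost:8080 exposing its
# standard /completion endpoint. Prompt and n_predict are placeholders.
url = "http://127.0.0.1:8080/completion"

payload = {
    "prompt": "Summarize the following long document: ...",
    "n_predict": 256,
    # cache_prompt asks the server to keep the evaluated prompt in the KV
    # cache, so a follow-up request sharing the same prefix can skip
    # re-processing those tokens.
    "cache_prompt": True,
}

response = requests.post(url, json=payload, timeout=600)
response.raise_for_status()
print(response.json()["content"])
```

In Jan/Cortex this would presumably be set when building the request to the llama.cpp engine, ideally surfaced as a toggle in the engine settings rather than hard-coded.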
For comparison, Anthropic's prompt caching makes long chats with a lot of context substantially cheaper:
https://www.anthropic.com/news/prompt-caching
What is the motivation / use case for changing the behavior?
Currently, the default setting for cache_prompt is off in llama.cpp, leading to significant delays. Manually enabling cache_prompt improves performance.