[BUG] ollama models context size not properly imported/reflected #309
Comments
Thanks @XReyRobert. Unfortunately Ollama does not usually provide the context size, so it's assumed to be 4k across the board. The /models API does not provide it, and neither does the models list. In your particular case the model name includes the context size, but that's a rarity. What's the best way to deal with this, or to get context sizes for all models?
Hi @enricoros, there's a "show" endpoint that returns additional parameters when they are available.
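To illustrate, here is a minimal Python sketch of querying that endpoint and extracting the context size. It assumes the default Ollama host (localhost:11434), that the endpoint is `POST /api/show`, and that the response's `parameters` field is a plain-text block containing a line like `num_ctx 131072`; the helper names (`parse_num_ctx`, `get_context_size`) are mine, not from the thread.

```python
import json
import re
import urllib.request
from typing import Optional

# Assumption: default Ollama host; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def parse_num_ctx(parameters: Optional[str]) -> Optional[int]:
    """Extract num_ctx from the plain-text 'parameters' block of /api/show.

    The block contains lines such as 'num_ctx 131072'. Many models have no
    parameters at all, or no num_ctx line; return None in those cases.
    """
    if not parameters:
        return None
    match = re.search(r"^num_ctx\s+(\d+)", parameters, re.MULTILINE)
    return int(match.group(1)) if match else None


def get_context_size(model: str, default: int = 4096) -> int:
    """Query /api/show for a model; fall back to 4096 when unknown."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/show",
        data=json.dumps({"name": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.load(resp)
    return parse_num_ctx(info.get("parameters")) or default
```

The fallback to 4096 mirrors the assumption described above, so models that don't advertise a context size keep the previous behavior.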
I confirm the bug. Also, for what it's worth, this Ollama release changelog describes how to pass a 32k context window to Mixtral (and presumably other models as well): https://github.com/jmorganca/ollama/releases/tag/v0.1.19
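Based on that changelog, the context window can be overridden per request via the `options` object. A hedged Python sketch of building such a request body for `/api/chat` follows; the helper names are mine, and the exact option shape (`options.num_ctx`) is assumed from the changelog rather than confirmed here.

```python
import json
import urllib.request
from typing import Optional

# Assumption: default Ollama host; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def build_chat_payload(model: str, messages: list, num_ctx: Optional[int] = None) -> dict:
    """Build an /api/chat request body; num_ctx overrides the context window."""
    payload = {"model": model, "messages": messages, "stream": False}
    if num_ctx is not None:
        # e.g. 32768 for the 32k Mixtral case discussed above
        payload["options"] = {"num_ctx": num_ctx}
    return payload


def chat(model: str, messages: list, num_ctx: Optional[int] = None) -> dict:
    """Send a chat request to Ollama and return the decoded JSON response."""
    body = json.dumps(build_chat_payload(model, messages, num_ctx)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

As noted in the next comment, this is backwards from most APIs: the client should not have to tell the server what the model's context window is.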
Thanks! I'll prioritize this issue. I can quickly fix it as far as knowing the context size. For the "32k Mixtral" case, the odd part is that the developer should not be telling the API what the context window is; it should be the other way around. APIs commonly take a "max_tokens" parameter as a hard limit on response length, and I'm sure the Ollama folks will make the API more standard. Their recent /chat endpoint shows that they're on a good path. Prioritized.
@XReyRobert implemented, releasing in 3 hours in |
Note that in testing, only yarn-mistral reports a num_ctx other than 4096; the other models either have no parameters at all, have no 'num_ctx' value to parse, or have it set to 4096.
Describe the bug
Ollama models' context size is not properly imported/reflected in the UI.
Where is it happening?
To Reproduce
1. Import a 128K Ollama model (e.g. yarn-mistral:7b-128k).
2. Show the model details / max model tokens in the UI.
Expected behavior
Screenshots / context