bug: remove max context length size (currently defaults to 4096) #3796
Model contexts are getting larger now. We should remove the max context size constraint in the UI.
Hi @0xSage, it is not limited by the UI; it is retrieved and restricted by the model. As you can observe, it can extend to 128K. However, the issue arises when the model cannot be loaded with such a large context size, as there may be an OOM problem. @thonore75 Could you kindly share the log file here so we can see what the problem is? Your device specs would also be great to have.
ScreenHunter.52.mp4 · app - CPU.log
Device specs:
Jan is installed (models too) on a Samsung SSD 980 PRO 2TB. When doing my tests I forgot to close MySQL Workbench, which uses the GPU, and the context length limit was 4096 with a correct answer. After closing it, I was able to reach a 64000 context length. I also tried disabling hardware acceleration (CPU mode): the model always loads whatever context length is used, but the answer is only correct at 4096; above that, the answer is just a period ".". Stranger still, with a context length of 64000 it was working; I changed it to 96000 and it failed, but when I changed it back to 64000 it no longer worked. I closed Jan, restarted it, and it worked again!
Hi @thonore75, it's due to VRAM OOM: it couldn't allocate resources within your remaining VRAM. The model size plus the context length must fit in the VRAM left over, so outcomes vary between runs, especially when other applications are also using the GPU.
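For intuition on why this happens: the KV cache alone grows linearly with context length. Here is a rough back-of-the-envelope sketch in Python; the layer/head counts below are illustrative assumptions, not the exact figures for any model in this thread:

```python
# Rough KV-cache VRAM estimate: the cache holds keys and values for
# every layer and every token of context, on top of the model weights.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values; an fp16 cache is 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 8B-class model: 36 layers, 8 KV heads, head dim 128
for ctx in (4096, 64000, 96000, 131072):
    gib = kv_cache_bytes(36, 8, 128, ctx) / 2**30
    print(f"context {ctx:>6}: ~{gib:.1f} GiB of KV cache")
```

With roughly 5-6 GB of weights for a Q5 8B model already resident, it's easy to see how 64000 can fit on one run while 96000 does not.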
Hi @louis-jan, ScreenHunter.53.mp4
Oh wow, thanks @thonore75. I didn't know that.
The response seems like a prompt template issue too, @louis-jan.
Cannot reproduce this bug on my side (using Llama 3.2 1B Instruct Q8); maybe it's because of the model @thonore75 uses? Screen.Recording.2024-10-17.at.7.00.25.PM.mov
Hi @imtuyethan, here is the link where I downloaded the model: https://huggingface.co/maxwellb-hf/granite-8b-code-instruct-128k-Q5_K_M-GGUF. I will try the same model with a different quantization to see if there is a difference.
When I reach the context length limit, even if I then set a correct value that worked before, the answer after the model restarts is always a period ".". As long as I stay on the same thread, the issue occurs; if I start a new thread, the model is reloaded and the answer is correct. A little workaround to avoid restarting Jan each time the issue occurs.
Thanks @thonore75. I can reproduce the part where the model replies nonsense when the context length is at max: Screen.Recording.2024-10-18.at.2.12.05.PM.mov
However, I cannot reproduce the other part; it still works when I set the context length back to 4096. Screen.Recording.2024-10-18.at.2.13.01.PM.mov
High chances are this is not a bug in Jan but in the model itself; the default prompt template is definitely not correct, and the user's device can't handle a large context length, which led to the nonsense response. Need @louis-jan to investigate further.
ScreenHunter.57.mp4
But it was working with the model "Phi-3-medium-128k-instruct-Q5_0".
Hey @thonore75, from the logs it seems like an OOM (out of memory) issue:
I recommend:
This isn't exactly a bug - it's more of a hardware limitation: the initially requested configuration exceeds available GPU memory, but the system recovers by falling back to a more conservative configuration.
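Not Jan-specific, but a quick way to see this for yourself: check free VRAM before picking a context length. A minimal sketch using the pynvml bindings; it assumes an NVIDIA GPU and `pip install nvidia-ml-py`:

```python
# Query free GPU memory so you can size the context length to fit.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"free VRAM: {mem.free / 2**30:.1f} GiB of {mem.total / 2**30:.1f} GiB")
pynvml.nvmlShutdown()
```

Other applications using the GPU (the MySQL Workbench case above) show up here immediately, which explains why the same context length can work in one session and fail in the next.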
However, on the other hand, when you import your own models to use:
It looks like the model you used in the video has the wrong prompt template.
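For anyone importing their own GGUF: the template usually lives in the model.json next to the weights. Here is a sketch of patching it in Python; the folder path, the `settings.prompt_template` key, and the Question/Answer format are all assumptions based on Jan's model layout and the Granite model card, so verify them against your install before applying:

```python
# Patch the prompt template of an imported model (paths and schema are
# assumptions -- adjust to your Jan data folder and the model card).
import json
from pathlib import Path

model_json = Path.home() / "jan" / "models" / "granite-8b-code-instruct-128k" / "model.json"
cfg = json.loads(model_json.read_text())
# {prompt} is assumed to be Jan's placeholder for the user message.
cfg.setdefault("settings", {})["prompt_template"] = "Question:\n{prompt}\n\nAnswer:\n"
model_json.write_text(json.dumps(cfg, indent=2))
```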
Hi @imtuyethan, I understand the hardware limitation, but the message could be more explicit.
Jan version
0.5.6
Describe the Bug
When using a model with 128k context, it's not possible to increase the context size.
Steps to Reproduce
Screenshots / Logs
ScreenHunter.49.mp4
What is your OS?