When I execute a Local GPT action, Ollama frequently performs unnecessary model reloads before processing the action. This occurs even at intervals shorter than the model's unload timeout.
This issue occurs with all models regardless of type or size, although it seems to happen more frequently with larger models. As for impact: with a 2B model the reload time is short enough to tolerate, but with a 32B model the wait before generation starts is significant and cannot be ignored.
Has anyone else encountered this issue?
I apologize for not being clear. I wasn't referring to any specific operation - this is something I noticed during normal usage, which is why I wanted to ask about it.
Since I'm not sure what information would be helpful, I've recorded a video showing my usage pattern. I'm using the gemma2:9b-instruct-q4_K_M model in the video, though this behavior occurs with all models regardless of which one I use.
When executing an action, sometimes it starts generating immediately, while other times there's about a 6-second delay. During these delays, I notice the dedicated GPU memory temporarily decreases before increasing again. This appears to be model reloading.
With the 9B model, the weights fit in the file cache, so there is no disk access; but when I use a 27B model in the same situation, disk access occurs, which led me to conclude the model is being reloaded. (This is what the screenshot in my original post shows.)
I am using Ollama on Windows 11.
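For anyone trying to reproduce this, here is how I have been checking whether the model actually gets evicted between actions, and a way to rule out the idle timeout as the cause. This is a sketch assuming a default Ollama install listening on `localhost:11434`; the model name is the one from my video.

```shell
# List models currently loaded in memory along with their expiry ("UNTIL") times.
# If the model disappears from this list between actions, it was unloaded.
ollama ps

# Pin the model in memory by sending a request with keep_alive set.
# -1 means "never unload"; a duration string like "30m" also works.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma2:9b-instruct-q4_K_M",
  "keep_alive": -1
}'
```

Note that `keep_alive` is per-request, so a later request without it (or with a different value) resets the timeout; the server-wide default can be set via the `OLLAMA_KEEP_ALIVE` environment variable instead. In my case the reloads happen even within the timeout window, so this only narrows down the cause rather than fixing it.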