
Frequent model reloading on Ollama #54

Closed
7shi opened this issue Nov 27, 2024 · 3 comments


7shi commented Nov 27, 2024

I am using Ollama on Windows 11.

When I execute a Local GPT action, Ollama frequently performs unnecessary model reloads before processing the action. This occurs even at intervals shorter than the model's unload timeout.

This issue occurs with all models regardless of type or size, although it seems to happen more frequently with larger models. In terms of impact, with a 2B model, the reload time is short enough to be tolerable, but with a 32B model, the waiting time before generation becomes significant and cannot be ignored.
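
For reference, one way to see whether the model actually stays resident is to send the same request twice inside the keep-alive window and compare the load_duration field Ollama reports. This is a minimal sketch, assuming the default endpoint at localhost:11434; the model name is only an example:

```python
# Minimal sketch: send the same request twice inside the keep_alive window and
# compare the load_duration Ollama reports. A large load_duration on the second
# call means the model was reloaded even though it should still be resident.
import time
import requests

URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
payload = {
    "model": "gemma2:2b",     # example model; any locally pulled model works
    "prompt": "Say hi.",
    "stream": False,
    "keep_alive": "5m",       # Ollama's default unload timeout
}

for i in range(2):
    r = requests.post(URL, json=payload, timeout=300).json()
    # load_duration is reported in nanoseconds
    print(f"request {i + 1}: load_duration = {r.get('load_duration', 0) / 1e9:.2f}s")
    time.sleep(10)            # well within the 5-minute keep_alive window
```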

Has anyone else encountered this issue?

(screenshot attached)


pfrankov (Owner) commented Dec 1, 2024

I'm afraid that your description gives me nothing to investigate.


7shi commented Dec 1, 2024

I apologize for not being clear. I wasn't referring to any specific operation - this is something I noticed during normal usage, which is why I wanted to ask about it.

Since I'm not sure what information would be helpful, I've recorded a video showing my usage pattern. I'm using the gemma2:9b-instruct-q4_K_M model in the video, though this behavior occurs with all models regardless of which one I use.

https://www.youtube.com/watch?v=cT1qd-1YrJ4

When executing an action, sometimes it starts generating immediately, while other times there's about a 6-second delay. During these delays, I notice the dedicated GPU memory temporarily decreases before increasing again. This appears to be model reloading.

(screenshot attached)

With the 9B model, it fits in the cache so there is no disk access; but when I use a 27B model in the same situation, disk access occurs, which led me to conclude the model is being reloaded (this is what the screenshot in my original post shows).
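
For what it's worth, one thing I know can force this kind of reload is a request whose options change the loaded instance, for example a different num_ctx, even while keep_alive has not expired. I don't know whether that's what the plugin is doing, so treat this only as a way to reproduce the symptom:

```python
# Sketch: two back-to-back requests that differ only in num_ctx force Ollama to
# drop and reload the model, even though keep_alive has not expired.
import requests

URL = "http://localhost:11434/api/generate"

def generate(num_ctx):
    r = requests.post(URL, json={
        "model": "gemma2:9b-instruct-q4_K_M",   # the model from the video
        "prompt": "Say hi.",
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }, timeout=300).json()
    print(f"num_ctx={num_ctx}: load_duration = {r.get('load_duration', 0) / 1e9:.2f}s")

generate(2048)   # first call loads the model
generate(2048)   # identical options: reuses the resident model
generate(8192)   # changed context size: typically triggers a reload
```

If an action sometimes sends different options than the previous request, that would match the intermittent delays I'm seeing.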

pfrankov removed the "need info" label on Dec 2, 2024
pfrankov self-assigned this on Dec 2, 2024
pfrankov added the "bug" label on Dec 2, 2024

7shi commented Dec 3, 2024

Thank you for fixing this issue! The unnecessary model reloading has been eliminated.
