LLM is shut down when the context is full, instead of clearing the context to 0 or to n_keep to keep it running.
When the context gets full, the LLM is shut down. This can be seen by checking the VRAM used by the system (it stops using any VRAM at all), and by checking "gd_llama.is_running()" and "gd_llama.is_waiting_input()", both of which return false.
I tried multiple combinations of the following parameters, and it doesn't seem to work with any of them:
gd_llama.n_keep, gd_llama.instruct, gd_llama.interactive, gd_llama.context_size, gd_llama.n_predict.
The existence of the n_keep parameter suggests that keeping the LLM running when the context gets full is a feature that should already be implemented, so I suppose this is a bug.
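For reference, the behavior I would expect from n_keep is what llama.cpp's context shifting does: when the context fills up, keep the first n_keep tokens and discard roughly half of the rest, then continue generating instead of stopping. A minimal sketch of that idea (function names are my own illustration, not the plugin's API):

```python
def shift_context(tokens, n_ctx, n_keep):
    """Sketch of llama.cpp-style context shifting.

    When the token list reaches the context limit n_ctx, keep the
    first n_keep tokens (e.g. the system prompt), discard the first
    half of the remaining tokens, and keep the most recent half so
    generation can continue.
    """
    if len(tokens) < n_ctx:
        return tokens  # still room: nothing to shift
    n_left = len(tokens) - n_keep      # tokens eligible for discarding
    n_discard = n_left // 2            # drop the older half of them
    return tokens[:n_keep] + tokens[n_keep + n_discard:]
```

Instead of this, gd_llama simply stops when the limit is reached.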
Thank you very much.
Hello :)