bug: Inference with NVIDIA GPU stop working after resuming from sleep on Linux #1459

Open
Tracked by #1568
barbicane opened this issue Mar 31, 2024 · 9 comments
Labels
os: Linux type: bug Something isn't working

Comments

@barbicane

Describe the bug
After the system resumes from sleep, inference on the NVIDIA GPU does not work until the system is restarted.

Steps to reproduce
Steps to reproduce the behavior:

  1. Set up GPU acceleration.
  2. Chat with a model.
  3. Suspend the system.
  4. Resume the system.
  5. The model (Trinity in my case) is no longer loaded in GPU VRAM, even if Jan is closed and reopened, until the system is restarted.

Expected behavior
The model is loaded into GPU VRAM again and the GPU is used for inference.

Environment details

  • Operating System: Linux - ZorinOS 17
  • Jan Version: 0.4.9
  • Processor: AMD Ryzen 7 7745HX with Radeon Graphics
  • RAM: 16GB
  • Graphics Card: NVIDIA RTX 4070m
@barbicane barbicane added the type: bug Something isn't working label Mar 31, 2024
@hiro-v hiro-v changed the title bug: [DESCRIPTION] bug: Inference with NVIDIA GPU stop working after resuming from sleep on Linux Apr 1, 2024
@Van-QA
Contributor

Van-QA commented Apr 2, 2024

It seems that if the user does not stop the model before the system goes to sleep, the nitro process has been killed by the time the system wakes up.
However, the app still indicates that the model is starting, so the model ends up stuck in a starting state.
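
One way to avoid showing a stale "starting" state would be to reconcile the displayed status with whether the engine process still exists. The following is only a rough sketch for a Node.js environment: `ModelStatus`, `isProcessAlive`, and `reconcileStatus` are hypothetical names for illustration, not part of Jan or nitro, and it assumes the app keeps the PID of the engine process it spawned.

```typescript
// Hypothetical status values for illustration; not taken from the Jan codebase.
type ModelStatus = "stopped" | "starting" | "running" | "failed";

// Returns true if a process with this PID still exists.
// Signal 0 sends nothing; Node.js only performs the existence/permission check.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we are not allowed to signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Downgrades an optimistic status to "failed" when the engine process is gone,
// so the UI does not stay in "starting" forever.
function reconcileStatus(
  shown: ModelStatus,
  enginePid: number | undefined
): ModelStatus {
  if (shown === "stopped" || shown === "failed") return shown;
  if (enginePid === undefined || !isProcessAlive(enginePid)) return "failed";
  return shown;
}
```

A check like this could run when the app window regains focus or on a timer. Note that PIDs can be reused by the OS, so a real implementation would probably also probe the engine itself rather than rely on the PID alone.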

@imtuyethan
Contributor

Hmm what is the decision for this case? @Van-QA

It seems like there is an issue where if a user does not stop the model, the system goes into a "sleep" state. When the system is turned back on, the nitro process is killed. However, the app still indicates that the model is starting. This leads to the error where the model is stuck in a starting state.

@hiento09
Contributor

I think this bug needs to be handled in code; it cannot be resolved through CI configuration. As I understand it, the app needs a mechanism to detect the machine's power state, save the app's state to local storage, and restore that state when the machine's state changes, for example when it resumes from sleep. To check the machine's state, you can use Electron's powerMonitor API: https://www.electronjs.org/docs/latest/api/power-monitor. cc @louis-jan
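
The idea above could look roughly like the sketch below. It relies only on the `suspend` and `resume` events that Electron's `powerMonitor` emits. Everything else (`SleepAwareSupervisor`, `EngineHooks`, `isEngineAlive`, `restartEngine`) is a hypothetical name made up for illustration, not an API of Jan or cortex.cpp, and persisting the state to local storage is left out.

```typescript
// Minimal shape of the event source this sketch needs. Electron's
// `powerMonitor` emits "suspend" and "resume" events of this kind.
interface PowerEvents {
  on(event: "suspend" | "resume", listener: () => void): unknown;
}

type ModelState = "stopped" | "starting" | "running";

// Hypothetical hooks into the app; these names are assumptions.
interface EngineHooks {
  isEngineAlive(): Promise<boolean>;
  restartEngine(): Promise<void>;
}

class SleepAwareSupervisor {
  state: ModelState = "stopped";
  private stateBeforeSleep: ModelState = "stopped";
  private hooks: EngineHooks;

  constructor(hooks: EngineHooks) {
    this.hooks = hooks;
  }

  // Subscribe to power events; remember the model state when the system suspends.
  attach(power: PowerEvents): void {
    power.on("suspend", () => {
      this.stateBeforeSleep = this.state;
    });
    power.on("resume", () => {
      this.recover().catch((err) => console.error("recover failed", err));
    });
  }

  // After resume: if a model was active before sleep and the engine is gone,
  // restart it instead of leaving the UI stuck in "starting".
  async recover(): Promise<void> {
    if (this.stateBeforeSleep === "stopped") return;
    if (await this.hooks.isEngineAlive()) return;
    this.state = "starting";
    await this.hooks.restartEngine();
    this.state = "running";
  }
}
```

In an Electron main process this would presumably be wired up with `import { powerMonitor } from "electron"` followed by `supervisor.attach(powerMonitor)` once the app is ready. Keeping the event source behind a small interface means the recovery logic can be exercised without a real suspend/resume cycle.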

@hiento09
Contributor

Added @namchuai and @vansangpfiev for state handling in cortex.cpp

@hiento09
Contributor

cc @dan-homebrew @0xSage

@dan-menlo dan-menlo assigned louis-jan and unassigned hiento09 Sep 27, 2024
@dan-menlo
Contributor

Ok - I am tagging this to @louis-jan @namchuai @vansangpfiev.

  • We should be able to handle the system sleep state.
  • Does this happen on Windows or macOS?

@freelerobot freelerobot transferred this issue from janhq/jan Oct 13, 2024
@namchuai
Collaborator

Hmm, I haven't experienced this issue with cortex.cpp. However, I will retest.

@freelerobot freelerobot moved this from Investigating to Planning in Menlo Oct 15, 2024
@freelerobot freelerobot moved this from Planning to Investigating in Menlo Oct 15, 2024
@freelerobot freelerobot moved this from Investigating to Planning in Menlo Oct 15, 2024
@dan-menlo
Contributor

I am marking this for Sprint 25, as I would like to stabilize our Hardware and Engine APIs first.

@gabrielle-ong
Contributor

May be linked to #1741

Projects
Status: Planning