Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) #11400
Minimum code:
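(The snippet itself did not survive here; below is a hedged sketch of what a minimal reproducer along the lines discussed in this thread might look like, using the ipex-llm transformers-style API. The model path, prompt, and generation settings are placeholders.)

```python
# Hedged sketch of a minimal reproducer, assuming the ipex-llm
# transformers-style API; MODEL_PATH and the prompt are placeholders.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel

MODEL_PATH = "THUDM/glm-4-9b-chat"  # placeholder: any supported model

# 4-bit weight quantization; keep embeddings on the CPU to reduce
# iGPU memory pressure (same flags as used later in this thread).
model = AutoModel.from_pretrained(
    MODEL_PATH,
    load_in_4bit=True,
    cpu_embedding=True,
    trust_remote_code=True,
    optimize_model=False,
)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

with torch.inference_mode():
    inputs = tokenizer("Hello", return_tensors="pt").to("xpu")
    out = model.generate(inputs.input_ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```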
Yeah, I definitely have to improve my coding skills, but at least this works for now.
Hi,
Hi, `model = AutoModel.from_pretrained(Res, load_in_4bit=True, cpu_embedding=True, trust_remote_code=True, optimize_model=False)`
Hi @lzivan,
Still crashed after a longer wait; I left the program idle.
We ran your minimum example successfully yesterday. (delete the
Another method you can try to save more memory is using an fp16 model; change
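(The suggestion is cut off above; presumably something along these lines, using ipex-llm's `load_in_low_bit` option. The exact value and model path are assumptions, not the author's confirmed instruction.)

```python
# Hedged sketch: load the model in fp16 instead of 4-bit, via
# ipex-llm's load_in_low_bit option (assumed to be what was meant).
from ipex_llm.transformers import AutoModel

model = AutoModel.from_pretrained(
    "THUDM/glm-4-9b-chat",   # placeholder model path
    load_in_low_bit="fp16",  # replaces load_in_4bit=True
    cpu_embedding=True,
    trust_remote_code=True,
    optimize_model=False,
)
model = model.to("xpu")
```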
Hi @qiuxin2012,
@TriDefender Got it, we will try to reproduce your error.
Hello @qiuxin2012! I am running IPEX-LLM-built ollama on Windows 11.
Hi @bibekyess,
Memory usage will increase after a few inferences because of all the cached tokens; try clearing the history and running again. The same thing happened here too. Perhaps wait a few minutes before your inference? It might be the same issue that we are both facing.
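(For illustration, a sketch of the history-clearing idea; the turn limit, prompt format, and the `torch.xpu.empty_cache()` call are assumptions, not a confirmed fix.)

```python
# Hedged sketch of the history-clearing idea: drop old turns so cached
# tokens don't accumulate across inferences in shared iGPU memory.
import gc
import torch
import intel_extension_for_pytorch  # noqa: F401  (provides torch.xpu)

MAX_TURNS = 4   # assumption: tune to your memory budget
history = []    # list of (prompt, reply) pairs

def generate_reply(model, tokenizer, prompt):
    global history
    if len(history) >= MAX_TURNS:
        history = []               # clear the cached conversation
        gc.collect()
        torch.xpu.empty_cache()    # release cached device memory
    context = "".join(f"User: {p}\nAssistant: {r}\n" for p, r in history)
    inputs = tokenizer(f"{context}User: {prompt}\nAssistant:",
                       return_tensors="pt").to("xpu")
    with torch.inference_mode():
        out = model.generate(inputs.input_ids, max_new_tokens=128)
    reply = tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)
    history.append((prompt, reply))
    return reply
```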
Iris can use only half of your 16GB RAM, about 7.8GB. If you use more than 7.8GB of memory, you will get this error.
@TriDefender Thank you for your response. Yeah, looks like we are in the same boat. 😄
Actually, my waiting time (time-to-first-token) is not that long, not minutes as you said; it's somewhere between 10 and 20 seconds.
@qiuxin2012 Thank you for your response. It makes sense. I thought Iris had a separate 8GB of memory, so I assumed I had 16+8 GB in total. But based on your explanation, Iris uses half of the RAM, so basically everything is loaded into that 16GB of RAM only. Is my understanding correct?
Yes, you are right. The iGPU memory is also shared with other programs. As you can see from my Task Manager page, my laptop is connected to a 4K screen, so the usable memory is only 5.4GB.
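(To check what the driver actually exposes, something like the following sketch should work; on an iGPU, `total_memory` reports the shared system-RAM allocation, not dedicated VRAM.)

```python
# Hedged sketch: query how much memory the XPU driver exposes.
# On an iGPU this is shared system RAM (roughly half of it), and it
# is further reduced by whatever other programs are using.
import torch
import intel_extension_for_pytorch  # noqa: F401  (provides torch.xpu)

props = torch.xpu.get_device_properties(0)
print(props.name)
print(f"total_memory: {props.total_memory / 1024**3:.1f} GiB")
```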
@TriDefender We have reproduced your error after I left the program idle for 5 minutes. I suspect it is caused by garbage collection.
Hi @qiuxin2012,
This is the output the warmup would normally produce:
Then I waited five minutes before continuing, and I got the same error.
It seems that after the first inference the program only stays stable for a short while (around 5 minutes); then, no matter what you do, the same error appears.
This shouldn't be an OOM issue, because I have enough physical RAM and swap file.
Hi @qiuxin2012!
Can you open a new issue for this? I will find another colleague to follow up on your issue.
Yes, it won't be an OOM issue. I think it is caused by garbage collection, but I don't know why.
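(One untested workaround sketch, assuming the idle crash is tied to the runtime releasing the device after a few minutes of inactivity: touch the XPU periodically from a background thread. This is speculation, not a confirmed fix from the thread.)

```python
# Hedged workaround sketch (not a confirmed fix): keep the XPU context
# warm with a tiny periodic op, in case the runtime tears the device
# down after a few minutes of inactivity.
import threading
import time
import torch
import intel_extension_for_pytorch  # noqa: F401  (provides torch.xpu)

def keep_alive(interval_s: float = 60.0) -> threading.Thread:
    def loop():
        while True:
            x = torch.ones(1, device="xpu")
            _ = (x + x).cpu()      # trivial op that touches the device
            time.sleep(interval_s)
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t

keep_alive()  # start before leaving the model idle
```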
Is there any way to resolve this issue?
I have no idea for now; I will inform you if I find a solution.
I have the same problem. Is there any way to resolve this issue now?
Not resolved.
Same issue for me on an Iris Xe GPU with 32GB RAM, running qwen2.5-3B and 1.5B.
The traceback is as follows; I was running ChatGLM4-9b-chat on my laptop.
Device configurations
OS: Win 11 23H2 (22631.3737)
The traceback is:
This seems to happen when I load the model and leave it idle for some time.