RuntimeError: PyTorch is not linked with support for xpu devices #9768
Comments
It seems the installed PyTorch does not support XPU. Could you share the specific PyTorch version installed, and check whether it works with the Arc GPU (even without using BigDL)?
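For reference, a quick environment check (not from the original thread; it assumes the IPEX XPU build is what is intended) is to print the installed versions and whether the XPU backend is visible:

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

print(torch.__version__)            # should match the torch build the IPEX XPU wheel expects
print(ipex.__version__)
print(torch.xpu.is_available())     # False here usually means a CPU-only install
print(torch.xpu.device_count())     # number of visible Intel GPUs
```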
Yes, it supports ChatGLM3-6B on Arc GPU.

Add … and also add … (see the sketch below).
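The exact snippets were not captured in this copy of the thread; judging from the working code in the next comment, the suggested additions were most likely the IPEX import plus moving the model and inputs to the XPU device:

```python
# Likely additions (inferred from the modified code below, not captured verbatim here)
import intel_extension_for_pytorch as ipex  # needed so PyTorch is linked with XPU support

model = model.to('xpu')            # move the INT4 model to the Intel GPU
input_ids = input_ids.to('xpu')    # move the encoded prompt to the same device
```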
Thank you, @MeouSker77, it works and solves the "RuntimeError: PyTorch is not linked with support for xpu devices". The code is modified as below:

```python
import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# Run the optimized model on the Intel GPU
model = model.to('xpu')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Build the ChatGLM3-format prompt
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")
# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu')

st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids, max_new_tokens=32)
end = time.time()

# Decode the generated tokens and print them
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)
```

However, another runtime error happened. Could you tell me how to solve it so that ChatGLM3-6B can run on the A770? Thank you very much in advance!
Can you try this code to get the device name of 'xpu:0'? I'm afraid the default xpu device is not the A770.
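The snippet itself was not captured here; with IPEX installed, querying the name of 'xpu:0' would look roughly like this:

```python
import torch
import intel_extension_for_pytorch as ipex

print(torch.xpu.device_count())       # number of visible XPU devices
print(torch.xpu.get_device_name(0))   # name of 'xpu:0', the default device
```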
Add …
How do I set the device to the Iris Xe or the Arc A770?

Change all … (the modified code below uses 'xpu:1' instead of 'xpu').
```python
import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# Run the optimized model on the Intel GPU
model = model.to('xpu:1')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Build the ChatGLM3-format prompt
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")
# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu:1')

st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids, max_new_tokens=32)
end = time.time()

# Decode the generated tokens and print them
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)
```

This raises:

RuntimeError: could not create a primitive
Sorry, on our Windows A770 machines the A770 is always the default xpu device, so we cannot reproduce this error. You can change … Or you can change … (see the sketch below).
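The exact suggestions were truncated above; based on the follow-up below, one option is to keep 'xpu' in the code and restrict which device the runtime sees via the ONEAPI_DEVICE_SELECTOR environment variable. A sketch follows; setting the variable in the shell before launching Python is the safer option, since it must be set before the XPU runtime initializes:

```python
import os
# Hypothetical in-script variant; the follow-up below sets this in the shell instead.
os.environ["ONEAPI_DEVICE_SELECTOR"] = "level_zero:1"  # expose only the second Level Zero device

import torch
import intel_extension_for_pytorch as ipex  # the runtime enumerates devices after this import
print(torch.xpu.device_count(), torch.xpu.get_device_name(0))
```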
Maybe we should test on a laptop, because the A770M is a laptop GPU. I'll try to reproduce this error on a laptop.
My machine is the NUC12 Serpent Canyon (i7-12700H + Arc A770M). I changed 'xpu:1' back to 'xpu' and set ONEAPI_DEVICE_SELECTOR=level_zero:1 -- it works!!! Thank you very much!!! The code I ran:

```python
import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import torch

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# Run the optimized model on the Intel GPU
model = model.to('xpu')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Build the ChatGLM3-format prompt
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")
# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu')

st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids, max_new_tokens=32)
end = time.time()

# Decode the generated tokens and print them
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)
```
@JinBridger Could I ask one more question? I want to run chatglm3-6b on the A770 with Streamlit. The complete code is attached below:

```python
import streamlit as st
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import torch

# Set the page title, icon, and layout
st.set_page_config(
    page_title="ChatGLM3-6B+BigDL-LLM演示",
    page_icon=":robot:",
    layout="wide"
)

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

@st.cache_resource
def get_model():
    # Load the ChatGLM3-6B model with INT4 quantization
    model = AutoModel.from_pretrained(model_path,
                                      load_in_4bit=True,
                                      trust_remote_code=True)
    model = model.to('xpu')
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              trust_remote_code=True)
    return tokenizer, model

# Load the ChatGLM3 model and tokenizer
tokenizer, model = get_model()

# Initialize the chat history and past key values
if "history" not in st.session_state:
    st.session_state.history = []
if "past_key_values" not in st.session_state:
    st.session_state.past_key_values = None

# Sliders for max_length, top_p, and temperature
max_length = st.sidebar.slider("max_length", 0, 32768, 8192, step=1)
top_p = st.sidebar.slider("top_p", 0.0, 1.0, 0.8, step=0.01)
temperature = st.sidebar.slider("temperature", 0.0, 1.0, 0.6, step=0.01)

# Button to clear the chat history
buttonClean = st.sidebar.button("清理会话历史", key="clean")
if buttonClean:
    st.session_state.history = []
    st.session_state.past_key_values = None
    st.rerun()

# Render the chat history
for i, message in enumerate(st.session_state.history):
    if message["role"] == "user":
        with st.chat_message(name="user", avatar="user"):
            st.markdown(message["content"])
    else:
        with st.chat_message(name="assistant", avatar="assistant"):
            st.markdown(message["content"])

# Input and output placeholders
with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

# Get the user input
prompt_text = st.chat_input("请输入您的问题")

# If the user entered something, generate a reply
if prompt_text:
    input_placeholder.markdown(prompt_text)
    history = st.session_state.history
    past_key_values = st.session_state.past_key_values
    for response, history, past_key_values in model.stream_chat(
        tokenizer,
        prompt_text,
        history,
        past_key_values=past_key_values,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_past_key_values=True,
    ):
        message_placeholder.markdown(response)
    # Update the history and past key values
    st.session_state.history = history
    st.session_state.past_key_values = past_key_values
```
Don't worry, the …
Yes!! Thank you very much for the guidance! It works!!! The tested sample code is identical to the Streamlit code above.
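For completeness, the same stream_chat API used in the Streamlit app can also be exercised from a plain console script; a minimal sketch on XPU, assuming the same local model path and that ChatGLM3's stream_chat yields (response, history) pairs as the code above relies on:

```python
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

model_path = "d:/chatglm3-6b"
model = AutoModel.from_pretrained(model_path, load_in_4bit=True,
                                  trust_remote_code=True).to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

history = []
last_response = ""
# stream_chat yields the full response generated so far on every step
for last_response, history in model.stream_chat(tokenizer, "What is Intel?", history=history):
    pass
print(last_response)
```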
Original issue:

RuntimeError: PyTorch is not linked with support for xpu devices

Installed the BigDL GPU version on Windows 11 following https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html
(screenshot of the installation)

When executing the code below with the chatglm3-6b model, the error occurs:
(screenshot of the error)

Does BigDL support running ChatGLM3-6B on an Arc GPU right now?