RuntimeError: PyTorch is not linked with support for xpu devices #9768

Open
openvino-book opened this issue Dec 23, 2023 · 18 comments

@openvino-book
RuntimeError: PyTorch is not linked with support for xpu devices

I installed the BigDL GPU version on Windows 11 following https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html

When executing the code below (the model is chatglm3-6b):

import torch
import time
import argparse
import numpy as np

from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

# you could tune the prompt based on your own model,
# here the prompt tuning refers to https://github.com/THUDM/ChatGLM3/blob/main/PROMPT.md
CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for ChatGLM3 model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="d:/chatglm3-6b",
                        help='The huggingface repo id for the ChatGLM3 model to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--prompt', type=str, default="AI是什么?",
                        help='Prompt to infer')
    parser.add_argument('--n-predict', type=int, default=32,
                        help='Max tokens to predict')

    args = parser.parse_args()
    model_path = args.repo_id_or_model_path

    # Load the model in 4-bit,
    # which converts the relevant layers in the model into INT4 format
    model = AutoModel.from_pretrained(model_path,
                                      load_in_4bit=True,
                                      trust_remote_code=True)
    
    model.save_low_bit("bigdl_chatglm3-6b-q4_0.bin")
    #run the optimized model on Intel GPU
    model = model.to('xpu')

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              trust_remote_code=True)
    
    # Generate predicted tokens
    with torch.inference_mode():
        prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt=args.prompt)
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        st = time.time()
        # if your selected model is capable of utilizing previous key/value attentions
        # to enhance decoding speed, but has `"use_cache": false` in its model config,
        # it is important to set `use_cache=True` explicitly in the `generate` function
        # to obtain optimal performance with BigDL-LLM INT4 optimizations
        output = model.generate(input_ids,
                                max_new_tokens=args.n_predict)
        end = time.time()
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        print(f'Inference time: {end-st} s')
        print('-'*20, 'Prompt', '-'*20)
        print(prompt)
        print('-'*20, 'Output', '-'*20)
        print(output_str)

the following error occurs:
RuntimeError: PyTorch is not linked with support for xpu devices

Does BigDL support running ChatGLM3-6B on Arc GPU right now?

@jason-dai
Contributor

RuntimeError: PyTorch is not linked with support for xpu devices

It seems the installed PyTorch does not support XPU. Can you share the specific PyTorch version installed, and check whether it works with the Arc GPU (even without using BigDL)?

Does BigDL support running ChatGLM3-6B on Arc GPU right now?

Yes, it supports ChatGLM3-6B on Arc GPU
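
A quick way to check is something like the following (a minimal sketch on my side, not from the install guide; the comments describe what the XPU builds usually look like):

import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)          # the XPU build of PyTorch usually carries an extra suffix, e.g. "a0+cxx11.abi"
print(ipex.__version__)           # the GPU build of IPEX usually ends with "+xpu"
print(torch.xpu.is_available())   # True means the Arc GPU is visible to PyTorch/IPEX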

@MeouSker77
Contributor

MeouSker77 commented Dec 25, 2023

Add import intel_extension_for_pytorch as ipex before .to('xpu'). Although we don't use ipex directly, it is still needed to run on the GPU.

@MeouSker77
Contributor

and also add input_ids = input_ids.to('xpu')
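
Putting the two suggestions together, the minimal change to the original script is roughly (a sketch; the rest of the script stays the same):

import intel_extension_for_pytorch as ipex  # required for XPU support even though it is not called directly

model = model.to('xpu')
input_ids = input_ids.to('xpu')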

@openvino-book
Author

and also add input_ids = input_ids.to('xpu')

Thank you, @MeouSker77, it works and solves the RuntimeError: PyTorch is not linked with support for xpu devices.

The modified code is below:

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# run the optimized model on Intel GPU
model = model.to('xpu')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
# Build the ChatGLM3-format prompt
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")

# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu')
st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids,max_new_tokens=32)
end = time.time()
# Decode and print the generated tokens
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)

However, another runtime error occurs:
RuntimeError: The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)

(llm_gpu) D:>python chatglm3_infer_gpu.py
C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\OV\anaconda3\envs\llm_gpu\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.57it/s]
2023-12-26 09:55:56,907 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
File "D:\chatglm3_infer_gpu.py", line 29, in
output = model.generate(input_ids,max_new_tokens=32)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\transformers\generation\utils.py", line 1538, in generate
return self.greedy_search(
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\transformers\generation\utils.py", line 2362, in greedy_search
outputs = self(
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 937, in forward
transformer_outputs = self.transformer(
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 152, in chatglm2_model_forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 640, in forward
layer_ret = layer(
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 542, in forward
layernorm_output = self.input_layernorm(hidden_states)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\llm_gpu\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 83, in chatglm_rms_norm_forward
result = linear_q4_0.fused_rms_norm(hidden_states,
RuntimeError: The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)

Could you tell me how to solve this so that ChatGLM3-6B can run on the A770? Thank you very much in advance!

@MeouSker77
Contributor

Can you try this code to get the device name of 'xpu:0'?

name = torch.xpu.get_device_name(0)
print(name)

I'm afraid the default xpu device is not A770
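
If it helps, a small sketch to list every XPU device that IPEX exposes (assuming intel_extension_for_pytorch has already been imported):

import torch
import intel_extension_for_pytorch as ipex

# Each GPU visible to the XPU runtime gets an index, e.g. the iGPU and the Arc card
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))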

@openvino-book
Author

Can you try this code to get the device name of 'xpu:0'?

name = torch.xpu.get_device_name(0)
print(name)

I'm afraid the default xpu device is not A770

When I run the code, I get an AttributeError:
module 'torch' has no attribute 'xpu'

@jason-dai
Contributor

Add import intel_extension_for_pytorch as ipex?

@openvino-book
Author

print(name)

(screenshot of the print(name) output)

@openvino-book
Author

How do I set the device to Iris Xe or Arc A770?

@MeouSker77
Contributor

How do I set the device to Iris Xe or Arc A770?

Change all to('xpu') to to('xpu:1') to use the A770.

@openvino-book
Author

After changing to 'xpu:1', I ran the code below:

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# run the optimized model on Intel GPU
model = model.to('xpu:1')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
# Build the ChatGLM3-format prompt
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")

# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu:1')
st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids,max_new_tokens=32)
end = time.time()
# Decode and print the generated tokens
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)

However, a new error occurred:
RuntimeError: could not create a primitive

(bigdl) D:>python chatglm3_infer_gpu.py
C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\OV\anaconda3\envs\bigdl\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Loading checkpoint shards: 100%|███████████████████████████████| 7/7 [00:04<00:00, 1.53it/s]
2023-12-27 10:32:52,956 - INFO - Converting the current model to sym_int4 format......
onednn_verbose,info,oneDNN v3.3.0 (commit 887fb044ccd6308ed1780a3863c2c6f5772c94b3)
onednn_verbose,info,cpu,runtime:threadpool,nthr:10
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Iris(R) Xe Graphics,driver_version:1.3.26957,binary_kernels:enabled
onednn_verbose,info,gpu,engine,1,backend:Level Zero,name:Intel(R) Arc(TM) A770M Graphics,driver_version:1.3.26957,binary_kernels:enabled
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,info,experimental features are enabled
onednn_verbose,info,use batch_normalization stats one pass is enabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,common,error,level_zero,errcode 1879048196
Traceback (most recent call last):
File "D:\chatglm3_infer_gpu.py", line 29, in
output = model.generate(input_ids,max_new_tokens=32)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\transformers\generation\utils.py", line 1538, in generate
return self.greedy_search(
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\transformers\generation\utils.py", line 2362, in greedy_search
outputs = self(
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 937, in forward
transformer_outputs = self.transformer(
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 152, in chatglm2_model_forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 640, in forward
layer_ret = layer(
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 544, in forward
attention_output, kv_cache = self.self_attention(
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 353, in chatglm2_attention_forward_8eb45c
context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\OV\anaconda3\envs\bigdl\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 369, in core_attn_forward_8eb45c
context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer,
RuntimeError: could not create a primitive

@MeouSker77
Contributor

Sorry, on our Windows A770 machines the A770 is always the default xpu device, so we cannot reproduce this error.

You can change 'xpu:1' back to 'xpu' and add optimize_model=False in from_pretrained to run it on the iGPU.

Or you can change 'xpu:1' back to 'xpu' and run set ONEAPI_DEVICE_SELECTOR=level_zero:1 before running this example; setting ONEAPI_DEVICE_SELECTOR=level_zero:1 should make the A770 the default device.
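
As an alternative to setting the variable in the shell, a sketch of doing it from Python (my own assumption, not from the BigDL docs) is to set it before intel_extension_for_pytorch is imported, since the Level Zero runtime reads it when it first enumerates devices:

import os
# Must be set before importing intel_extension_for_pytorch / touching any xpu device
os.environ["ONEAPI_DEVICE_SELECTOR"] = "level_zero:1"

import torch
import intel_extension_for_pytorch as ipex

print(torch.xpu.get_device_name(0))  # expected to show the Arc A770M if the selector took effect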

@JinBridger
Contributor

Sorry, on our Windows A770 machines the A770 is always the default xpu device, so we cannot reproduce this error.

You can change 'xpu:1' back to 'xpu' and add optimize_model=False in from_pretrained to run it on the iGPU.

Or you can change 'xpu:1' back to 'xpu' and run set ONEAPI_DEVICE_SELECTOR=level_zero:1 before running this example; setting ONEAPI_DEVICE_SELECTOR=level_zero:1 should make the A770 the default device.

Maybe we should test on a laptop, because the A770M is a laptop GPU. I'll see if I can reproduce this error on a laptop.

@openvino-book
Author

Sorry, on our Windows A770 machines the A770 is always the default xpu device, so we cannot reproduce this error.
You can change 'xpu:1' back to 'xpu' and add optimize_model=False in from_pretrained to run it on the iGPU.
Or you can change 'xpu:1' back to 'xpu' and run set ONEAPI_DEVICE_SELECTOR=level_zero:1 before running this example; setting ONEAPI_DEVICE_SELECTOR=level_zero:1 should make the A770 the default device.

Maybe we should test on a laptop, because the A770M is a laptop GPU. I'll see if I can reproduce this error on a laptop.

My machine is a NUC12 Serpent Canyon (蝰蛇峡谷): i7-12700H + Arc A770M.

I changed 'xpu:1' back to 'xpu' and set ONEAPI_DEVICE_SELECTOR=level_zero:1 -- it works!!! Thank you very much!!!

Running the code:

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import torch

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# run the optimized model on Intel GPU
model = model.to('xpu')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
# Build the ChatGLM3-format prompt
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")

# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu')
st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids,max_new_tokens=32)
end = time.time()
# Decode and print the generated tokens
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)


@openvino-book
Author

@JinBridger Could I ask one more question? I want to run chatglm3-6b on the A770 with Streamlit.
model = model.to("xpu") can be added in get_model(),
but how do I add input_ids = input_ids.to('xpu')?

The complete code is attached below:

import streamlit as st
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import torch

# Set the page title, icon and layout
st.set_page_config(
    page_title="ChatGLM3-6B+BigDL-LLM演示",
    page_icon=":robot:",
    layout="wide"
)
# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

@st.cache_resource
def get_model():
    # Load the ChatGLM3-6B model with INT4 quantization
    model = AutoModel.from_pretrained(model_path,
                                    load_in_4bit=True,
                                    trust_remote_code=True)
    model = model.to('xpu')
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                            trust_remote_code=True)
    return tokenizer, model

# Load the ChatGLM3 model and tokenizer
tokenizer, model = get_model()

# Initialize chat history and past key values
if "history" not in st.session_state:
    st.session_state.history = []
if "past_key_values" not in st.session_state:
    st.session_state.past_key_values = None

# Set max_length, top_p and temperature
max_length = st.sidebar.slider("max_length", 0, 32768, 8192, step=1)
top_p = st.sidebar.slider("top_p", 0.0, 1.0, 0.8, step=0.01)
temperature = st.sidebar.slider("temperature", 0.0, 1.0, 0.6, step=0.01)

# Button to clear the chat history
buttonClean = st.sidebar.button("清理会话历史", key="clean")
if buttonClean:
    st.session_state.history = []
    st.session_state.past_key_values = None
    st.rerun()

# Render the chat history
for i, message in enumerate(st.session_state.history):
    if message["role"] == "user":
        with st.chat_message(name="user", avatar="user"):
            st.markdown(message["content"])
    else:
        with st.chat_message(name="assistant", avatar="assistant"):
            st.markdown(message["content"])

# Input and output placeholders
with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

# Get user input
prompt_text = st.chat_input("请输入您的问题")

# If the user entered something, generate a reply
if prompt_text:

    input_placeholder.markdown(prompt_text)
    history = st.session_state.history
    past_key_values = st.session_state.past_key_values
    for response, history, past_key_values in model.stream_chat(
        tokenizer,
        prompt_text,
        history,
        past_key_values=past_key_values,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_past_key_values=True,
    ):
        message_placeholder.markdown(response)

    # Update chat history and past key values
    st.session_state.history = history
    st.session_state.past_key_values = past_key_values

@MeouSker77
Contributor

Don't worry, the stream_chat API will move input tokens to the model's device automatically (here), so you just need to move the model to xpu.

@openvino-book
Author

Don't worry, the stream_chat API will move input tokens to the model's device automatically (here), so you just need to move the model to xpu.

Yes!! Thank you very much for the guidance! It works!!!

Tested sample code

import streamlit as st
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import torch

# Set the page title, icon and layout
st.set_page_config(
    page_title="ChatGLM3-6B+BigDL-LLM演示",
    page_icon=":robot:",
    layout="wide"
)
# Specify the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"

@st.cache_resource
def get_model():
    # Load the ChatGLM3-6B model with INT4 quantization
    model = AutoModel.from_pretrained(model_path,
                                    load_in_4bit=True,
                                    trust_remote_code=True)
    model = model.to('xpu')
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                            trust_remote_code=True)
    return tokenizer, model

# Load the ChatGLM3 model and tokenizer
tokenizer, model = get_model()

# Initialize chat history and past key values
if "history" not in st.session_state:
    st.session_state.history = []
if "past_key_values" not in st.session_state:
    st.session_state.past_key_values = None

# Set max_length, top_p and temperature
max_length = st.sidebar.slider("max_length", 0, 32768, 8192, step=1)
top_p = st.sidebar.slider("top_p", 0.0, 1.0, 0.8, step=0.01)
temperature = st.sidebar.slider("temperature", 0.0, 1.0, 0.6, step=0.01)

# Button to clear the chat history
buttonClean = st.sidebar.button("清理会话历史", key="clean")
if buttonClean:
    st.session_state.history = []
    st.session_state.past_key_values = None
    st.rerun()

# Render the chat history
for i, message in enumerate(st.session_state.history):
    if message["role"] == "user":
        with st.chat_message(name="user", avatar="user"):
            st.markdown(message["content"])
    else:
        with st.chat_message(name="assistant", avatar="assistant"):
            st.markdown(message["content"])

# Input and output placeholders
with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

# Get user input
prompt_text = st.chat_input("请输入您的问题")

# If the user entered something, generate a reply
if prompt_text:

    input_placeholder.markdown(prompt_text)
    history = st.session_state.history
    past_key_values = st.session_state.past_key_values
    for response, history, past_key_values in model.stream_chat(
        tokenizer,
        prompt_text,
        history,
        past_key_values=past_key_values,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_past_key_values=True,
    ):
        message_placeholder.markdown(response)

    # Update chat history and past key values
    st.session_state.history = history
    st.session_state.past_key_values = past_key_values


@rnwang04 rnwang04 mentioned this issue Jan 3, 2024
@openvino-book
Author

openvino-book commented Jan 11, 2024
