
Inference problem with Baichuan 13B #9894

Open
K-Alex13 opened this issue Jan 12, 2024 · 22 comments

@K-Alex13 (Author)

[screenshot]
I am not sure what is going wrong here.

@K-Alex13 (Author)

Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU?
[screenshot]

@K-Alex13 (Author)

The machine I am using is an Arc A770, and the GPU memory should be sufficient. I hope you can provide some guidance.

@hkvision (Contributor)

Could you provide more details?

  • Are you running Baichuan1 or Baichuan2?
  • What input and output sequence lengths are you using when this memory issue occurs?

@hkvision (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]

If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
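
A minimal sketch of what that looks like in code (assuming two Arc cards are visible to the XPU runtime; `input_ids` here stands for whatever tensor you pass to `generate`):

```python
# Use an explicit device index instead of plain "xpu":
model = model.to("xpu:0")          # first Intel GPU
input_ids = input_ids.to("xpu:0")  # inputs must live on the same device as the model
output = model.generate(input_ids)
```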

@K-Alex13 (Author)

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?

I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page:
https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.

Can you please give me a sample of where and how to place the model on a different XPU?

@hkvision (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?

https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50
https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59
For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
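
A hedged sketch of that modification, following the linked example (the model path and prompt are placeholders, not from this thread):

```python
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "baichuan-inc/Baichuan2-13B-Chat"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# The linked lines move the model and the input IDs to "xpu";
# to target the second card, use an explicit index on both:
model = model.to("xpu:1")
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu:1")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```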

@hkvision (Contributor)

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?
>
> I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main

Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one

@K-Alex13 (Author)

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?
>
> I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main
>
> Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one

I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.

@jason-dai (Contributor)

> I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.

Does model.chat use BigDL?

@K-Alex13 (Author)

[screenshot]
model = model.to('xpu:1') is not working.

@K-Alex13 (Author)

> I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.
>
> Does model.chat use BigDL?

Yes, I do use BigDL.

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?
>
> https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50 https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59 For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.

I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is an error that the device_id is out of range, and with xpu:0 it is the original state. What can I do next?

@hkvision (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?
>
> https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50 https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59 For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
>
> I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is an error that the device_id is out of range, and with xpu:0 it is the original state. What can I do next?

So are there multiple GPU cards on your machine? After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine:
[screenshot]
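
As a cross-check from Python, you can also count the devices the XPU runtime sees (a sketch, assuming intel_extension_for_pytorch is installed; the torch.xpu calls mirror the usual torch.cuda ones):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

# One entry is expected per usable Intel GPU:
print(torch.xpu.device_count())
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))
```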

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?
>
> https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50 https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59 For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
>
> I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is an error that the device_id is out of range, and with xpu:0 it is the original state. What can I do next?
>
> So are there multiple GPU cards on your machine? After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine: [screenshot]

[screenshot]
These are the results of sycl-ls.

@hkvision (Contributor)

It seems only one GPU is detected... Are the other GPUs properly set up?

@K-Alex13 (Author)

I am not sure why only one GPU is detected; I can see GPU 2 in this figure?

@hkvision (Contributor)

[screenshot]
You mean gpu:2 here? These two lines refer to the same GPU (sycl-ls lists each device once per backend), so there is only one.

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]

Then what does this figure mean? It seems to show a 32G GPU.

@qiuxin2012 (Contributor)

It looks like your driver (released in 2023.7) is a little old. Please update your driver to the latest version and try again.
You can download it from https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html

@WeiguangHan (Contributor) commented Jan 17, 2024

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?
>
> I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main
>
> Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one
>
> I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.

Hi, I have tested it on my side using BigDL and model.chat from HF, and it worked fine. But I am a bit curious about the Thread log output in your screenshot, which seemed strange.

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# import intel_extension_for_pytorch as ipex
from transformers.generation.utils import GenerationConfig

# Load the model with BigDL-LLM optimizations in 4-bit (sym_int4)
model = AutoModelForCausalLM.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", optimize_model=True, load_in_low_bit="sym_int4",
                                             trust_remote_code=True, use_cache=True, cpu_embedding=False).eval()
tokenizer = AutoTokenizer.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", trust_remote_code=True)
model.to("xpu")  # move the low-bit model to the Intel GPU
model.generation_config = GenerationConfig.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", revision="v2.0")
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})  # "Explain 'review the old to learn the new'"
response = model.chat(tokenizer, messages)
print(response)
```

[screenshot of output]

@hkvision removed their assignment Jan 17, 2024
@shane-huang (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> Then what does this figure mean? It seems to show a 32G GPU.

The GPU memory the Arc A770 can actually use is only 16G, as shown in your device screenshot.

[screenshot]
