
Inference problem with Baichuan 13B #9894

Open
K-Alex13 opened this issue Jan 12, 2024 · 22 comments

@K-Alex13 (Author)

[screenshot]
I am not sure what is going wrong here.

@K-Alex13 (Author)

Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU?
[screenshot]

@K-Alex13 (Author)

The machine I am using is an Arc A770, and the GPU memory should be sufficient. I hope you can provide some guidance.

@hkvision (Contributor)

Could you provide more details?

  • Are you running Baichuan1 or Baichuan2?
  • What input and output sequence lengths are you using when this memory issue occurs?

@hkvision (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]

If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
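
A minimal sketch of what that looks like in code (assuming two Arc cards are visible to the XPU runtime; `input_ids` here stands for whatever tensor you pass to `generate`):

```python
# Use an explicit device index instead of plain "xpu":
model = model.to("xpu:0")          # first Intel GPU
input_ids = input_ids.to("xpu:0")  # inputs must live on the same device as the model
output = model.generate(input_ids)
```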

@K-Alex13 (Author)

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?

I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page:
https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.

Can you please give me a sample of where and how to place the model on a different XPU?

@hkvision (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?

https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50
https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59
For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
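
A hedged sketch of that modification, following the linked example (the model path and prompt are placeholders, not from this thread):

```python
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "baichuan-inc/Baichuan2-13B-Chat"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# The linked lines move the model and the input IDs to "xpu";
# to target the second card, use an explicit index on both:
model = model.to("xpu:1")
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu:1")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```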

@hkvision (Contributor)

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?
>
> I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main

Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one

@K-Alex13 (Author)

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?
>
> I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main
>
> Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one

I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.

@jason-dai (Contributor)

> I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.

Does model.chat use BigDL?

@K-Alex13 (Author)

[screenshot]
model = model.to('xpu:1') is not working.

@K-Alex13 (Author)

> I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.
>
> Does model.chat use BigDL?

Yes, I do use BigDL.

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?
>
> https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50 https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59 For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.

I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is an error that the device_id is out of range, and with xpu:0 it is the original state. What can I do next?

@hkvision (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?
>
> https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50 https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59 For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
>
> I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is an error that the device_id is out of range, and with xpu:0 it is the original state. What can I do next?

So are there multiple GPU cards on your machine? After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine:
[screenshot]
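
As a cross-check from Python, you can also count the devices the XPU runtime sees (a sketch, assuming intel_extension_for_pytorch is installed; the torch.xpu calls mirror the usual torch.cuda ones):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

# One entry is expected per usable Intel GPU:
print(torch.xpu.device_count())
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))
```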

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> If you have multiple GPUs, you can use xpu:0, xpu:1 to specify the device.
>
> Can you please give me a sample of where and how to place the model on a different XPU?
>
> https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50 https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59 For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
>
> I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is an error that the device_id is out of range, and with xpu:0 it is the original state. What can I do next?
>
> So are there multiple GPU cards on your machine? After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine: [screenshot]

[screenshot]
These are the results of sycl-ls.

@hkvision (Contributor)

It seems only one GPU is detected... Are the other GPUs properly set up?

@K-Alex13 (Author)

I am not sure why only one GPU is detected; I can see GPU 2 in this figure?

@hkvision (Contributor)

[screenshot]
You mean gpu:2 here? These two lines refer to the same GPU (sycl-ls lists each device once per backend), so there is only one.

@K-Alex13 (Author)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]

Then what does this figure mean? It seems to show a 32G GPU.

@qiuxin2012 (Contributor)

It looks like your driver (released in 2023.7) is a little old. Please update your driver to the latest version and try again.
You can download it from https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html

@WeiguangHan (Contributor) commented Jan 17, 2024

> Could you provide more details?
>
> • Are you running Baichuan1 or Baichuan2?
> • What input and output sequence lengths are you using when this memory issue occurs?
>
> I am using Baichuan2, and the sequence length should be the default. The model is downloaded from the following page: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main
>
> Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one
>
> I just downloaded the Baichuan2-13B model from HF and run model.chat. That is what I mean by default.

Hi, I have tested it on my side using BigDL and model.chat from HF, and it worked fine. But I am a bit curious about the Thread log output in your screenshot, which seemed strange.

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# import intel_extension_for_pytorch as ipex
from transformers.generation.utils import GenerationConfig

# Load the model with BigDL-LLM optimizations in 4-bit (sym_int4)
model = AutoModelForCausalLM.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", optimize_model=True, load_in_low_bit="sym_int4",
                                             trust_remote_code=True, use_cache=True, cpu_embedding=False).eval()
tokenizer = AutoTokenizer.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", trust_remote_code=True)
model.to("xpu")  # move the low-bit model to the Intel GPU
model.generation_config = GenerationConfig.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", revision="v2.0")
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})  # "Explain 'review the old to learn the new'"
response = model.chat(tokenizer, messages)
print(response)
```

[screenshot of output]

@hkvision removed their assignment Jan 17, 2024
@shane-huang (Contributor)

> Update: a new question. When I run inference with the following GPU, how can I put the input IDs on another GPU? [screenshot]
>
> Then what does this figure mean? It seems to show a 32G GPU.

The GPU memory the Arc A770 can actually use is only 16G, as shown in your device screenshot.

[screenshot]
