This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

AutoModelForCausalLM model.generate gives a wrong response when running the same chatglm3 int4 model bin file via Docker #1680

Open
ahlwjnj opened this issue Jul 28, 2024 · 0 comments

Comments


ahlwjnj commented Jul 28, 2024

from transformers import TextIteratorStreamer, AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig

model_name = "./models/chatglm3-6b"

# Load the model with RTN int4 quantization on the Neural Speed backend.
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=RtnConfig(bits=4,
                                                                           compute_dtype="int8",
                                                                           weight_dtype="int4_fullrange",
                                                                           use_neural_speed=True),
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# System prompt (in Chinese): "Your name is 'XX Chat'."
history = [{"role": "system", "content": "你的名字是'XX Chat'."}]
prompt = {"role": "user", "content": "Hi, please introduce yourself in Chinese."}
messages = history + [prompt]

## Start to chat
max_length = 512  # assumed value: max_length was not defined in the snippet as posted
model_inputs = tokenizer.apply_chat_template(messages,
                                             add_generation_prompt=True,
                                             tokenize=True,
                                             return_tensors="pt")
output = model.generate(input_ids=model_inputs,
                        max_new_tokens=max_length)
print("output=", output)
response = tokenizer.decode(output[0], skip_special_tokens=False)
print("origin response=\n", response)

Running the above code on the development PC (Ubuntu) gives the correct response, but running the Docker image on another PC gives a wrong response. The wrong response looks like the following regardless of whether the host environment is Ubuntu or Windows.
2024-07-23 22:26:38 output=
2024-07-23 22:26:38 [[64790, 64792, 906, 31007, 13361, 31007, 30994, 13, 30910, 31822, 32873, 54532, 30953, 11214, 22011, 6263, 64795, 30910, 13, 8686, 30932, 2929, 9571, 3040, 291, 4463, 30930, 64796, 4033, 37204, 37204, 37204, 37204, 37204, 37204, 37204,
...
37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 37204, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
2024-07-23 22:26:38 origin response=
2024-07-23 22:26:38 [gMASK] sop <|system|>
2024-07-23 22:26:38 你的名字是'XX Chat'. <|user|>
2024-07-23 22:26:38 Hi, please introduce yourself in Chinese. <|assistant|> Gold负面负面负面负面负面负面负面负面负面负面负面负面 ...

Q1: Is it a quantization_config problem? I have changed weight_dtype to "int4_fullrange", "int4_clip", and "int4", and got the same wrong response.
Q2: Can anything go wrong when copying the Docker image with the quantized model bin file from one PC to another?
Q3: How can I debug this problem after copying the Docker image with the quantized model bin file?
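
For Q2/Q3, a first debugging step might be to rule out copy corruption and CPU differences between the two machines: hash the model file on both hosts, and compare CPU instruction-set flags, since low-bit kernels commonly dispatch on ISA extensions such as AVX2/AVX512, so a CPU lacking them is a plausible source of garbage output. The sketch below uses only the Python standard library; the model file path and the ISA hypothesis are assumptions, not a confirmed diagnosis.

import hashlib
import platform

# Hypothetical path to the quantized model file; adjust to the real location.
MODEL_BIN = "./models/chatglm3-6b/pytorch_model.bin"

def sha256sum(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large model files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Run on both machines: differing digests mean the copied file is corrupted.
print("sha256:", sha256sum(MODEL_BIN))

# On Linux, list the CPU flags the quantized kernels may depend on.
if platform.system() == "Linux":
    with open("/proc/cpuinfo") as f:
        flags = next(line for line in f if line.startswith("flags")).split()
    for isa in ("avx2", "avx512f", "avx_vnni", "amx_int8"):
        print(isa, "supported" if isa in flags else "MISSING")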

Thanks in advance.
