
[Bug]: MiniCPM3 errors out during GPTQ quantization #252

Open
vokkko opened this issue Oct 11, 2024 · 2 comments
Labels: bug (Something isn't working), triage

Comments

vokkko commented Oct 11, 2024

Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

After installing the auto_gptq dependency from the MiniCPM branch of AutoGPTQ and updating the weight path, running the script fails with:
```
(base) root@autodl-container-aaf946aa7c-bb3ebebc:~/autodl-tmp/MiniCPM/quantize# python gptq_quantize.py
/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:411: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  def forward(ctx, input, qweight, scales, qzeros, g_idx, bits, maxq):
/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:419: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  def backward(ctx, grad_output):
/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  @custom_fwd(cast_inputs=torch.float16)
Generating train split: 256 examples [00:00, 69732.55 examples/s]
Parameter 'function'=<function load_data.<locals>.tokenize at 0x785813624ca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
WARNING:datasets.fingerprint:Parameter 'function'=<function load_data.<locals>.tokenize at 0x785813624ca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Map: 100%|██████████| 256/256 [00:00<00:00, 803.98 examples/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/MiniCPM/quantize/gptq_quantize.py", line 244, in <module>
    main()
  File "/root/autodl-tmp/MiniCPM/quantize/gptq_quantize.py", line 196, in main
    model.quantize(
  File "/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 209, in quantize
    examples = self._prepare_examples_for_quantization(examples, batch_size)
  File "/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 165, in _prepare_examples_for_quantization
    new_examples = [
  File "/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 166, in <listcomp>
    collate_data(new_examples[start : start + batch_size], pad_token_id)
  File "/root/miniconda3/envs/LLMQ/lib/python3.10/site-packages/auto_gptq/utils/data_utils.py", line 161, in collate_data
    input_ids_blocks[i] = pad_block(input_ids_blocks[i], torch.ones((block_bsz, pad_num)) * pad_token_id)
TypeError: only integer tensors of a single element can be converted to an index
```
Is this a problem with the calibration dataset format?
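
A hedged reading of the trace: auto_gptq's `collate_data` appears to build its padding block as `torch.ones((block_bsz, pad_num)) * pad_token_id` after unpacking `input_ids.shape` into exactly two values, so a `pad_token_id` that is `None` (or not a plain int), or examples with an unexpected extra nesting level, could both end in this `TypeError`. A minimal pre-flight sketch; the names `tokenizer` and `examples` are assumptions, not taken from gptq_quantize.py:

```python
# Hypothetical pre-flight check before model.quantize(examples); the names
# `tokenizer` and `examples` are assumptions about the script, not its code.
import torch

# collate_data pads with torch.ones((block_bsz, pad_num)) * pad_token_id,
# so pad_token_id should be a plain Python int. If the MiniCPM tokenizer
# does not define one, falling back to eos_token_id is a common choice.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
assert isinstance(tokenizer.pad_token_id, int), tokenizer.pad_token_id

for ex in examples:
    ids = torch.as_tensor(ex["input_ids"])
    # collate_data unpacks input_ids.shape into (block_bsz, block_inp_len),
    # so anything beyond a 1-D or 2-D tensor per example will break.
    assert ids.dim() <= 2, f"unexpected input_ids shape {tuple(ids.shape)}"
```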

To Reproduce

Printing the contents of new_examples:

```
{'input_ids': [[73441, 1345, 43686, 5, 9468, 1358, 2800, 16694, 1385, 3382, 5498, 1592, 1384, 2144, 1476, 6380, 1753, 5069, 59358, 1496, 4221, 1502, 4757, 1536, 1421, 1807, 2701, 2471, 59354, 73440, 1345, 5, 73441, 1345, 1836, 5, 4221, 59361, 1352, 4757, 1536, 1421, 1807, 2701, 2471, 72, 73440, 1345, 5]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'labels': [[73441, 1345, 43686, 5, 9468, 1358, 2800, 16694, 1385, 3382, 5498, 1592, 1384, 2144, 1476, 6380, 1753, 5069, 59358, 1496, 4221, 1502, 4757, 1536, 1421, 1807, 2701, 2471, 59354, 73440, 1345, 5, 73441, 1345, 1836, 5, 4221, 59361, 1352, 4757, 1536, 1421, 1807, 2701, 2471, 72, 73440, 1345, 5]]}
```
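
For comparison, auto_gptq's `quantize()` takes `examples` as a list of dicts, one per calibration sample, with `input_ids`/`attention_mask` as flat lists of ints or (1, seq_len) tensors. A sketch of restructuring the data printed above into that layout; `dataset` and `model` are assumed names, not taken from the script:

```python
import torch

# Illustrative restructuring: `dataset` stands in for the tokenized
# calibration set whose first row is printed above, where each field is
# already nested as [[...]], so torch.tensor(...) yields shape (1, seq_len).
examples = [
    {
        "input_ids": torch.tensor(row["input_ids"], dtype=torch.long),
        "attention_mask": torch.tensor(row["attention_mask"], dtype=torch.long),
    }
    for row in dataset
]

# With batch_size=1 each batch holds a single block, so collate_data should
# never need to pad across samples and pad_token_id is never multiplied in.
model.quantize(examples, batch_size=1)
```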

Expected behavior

No response

Screenshots

No response

Environment

- OS: [e.g. Ubuntu 22.04]
- Pytorch: [e.g. torch 2.4.1]
- CUDA: [e.g. CUDA 12.1]
- Device: [e.g. RTX4090]

Additional context

No response

vokkko added the bug (Something isn't working) and triage labels on Oct 11, 2024
LDLINGLINGLING (Collaborator) commented:
Hi, you can use our pre-quantized model instead; it has already been accuracy-calibrated.
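
For reference, a hedged sketch of loading such a pre-quantized checkpoint with auto_gptq; the repo id used here is an assumption, not given in this thread:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# The repo id below is an assumption about where the calibrated GPTQ
# weights live; substitute the checkpoint the maintainers point to.
repo = "openbmb/MiniCPM3-4B-GPTQ-Int4"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
# from_quantized loads the already-quantized weights directly, skipping
# the calibration step that fails above.
model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", trust_remote_code=True)
```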

vokkko (Author) commented Oct 11, 2024:

> Hi, you can use our pre-quantized model instead; it has already been accuracy-calibrated.

I want to fine-tune the 4B model at full precision and then quantize it with GPTQ, but I'm currently running into this issue.
