Quantizing a model with AutoAWQ while loading it across multiple GPUs with device_map="auto" fails with a cross-device RuntimeError. Reproduction:
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoConfig, AutoTokenizer, AwqConfig

# model_path, quant_path and quant_config are defined earlier in the script (not shown here)

# Shard the model across all visible GPUs, capping each device at 48 GiB
model_init_kwargs = {
    "device_map": "auto",
    "max_memory": {i: "48GiB" for i in range(torch.cuda.device_count())},
}

model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    use_cache=False,
    **model_init_kwargs,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Define quantization configuration
quantization_config = AwqConfig(
bits=quant_config["w_bit"],
group_size=quant_config["q_group_size"],
zero_point=quant_config["zero_point"],
version=quant_config["version"].lower(),
)
config = AutoConfig.from_pretrained(model_path)
config.quantization_config = quantization_config
model.quantize(tokenizer, quant_config=quant_config)
# Save the quantized model
print("Saving quantized model...")
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
Running this fails with:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:4! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
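For reference, here is a minimal single-device sketch of the same flow. It assumes the model fits on one GPU and that pinning every weight to cuda:0 avoids the cuda:0 / cuda:4 mismatch seen above; the paths and quant_config values below are illustrative placeholders, not the ones from the report.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Hypothetical placeholder paths and a typical AWQ config, for illustration only
model_path = "path/to/source-model"
quant_path = "path/to/quantized-model"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Assumption: mapping the root module "" to device 0 keeps the whole model on
# cuda:0, so every matmul during calibration sees tensors on the same device.
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    use_cache=False,
    device_map={"": 0},
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

If the model does not fit on a single GPU, this sketch does not apply and the multi-GPU device_map="auto" path from the report is still needed.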