Named Parameters changing for same quantized model if loaded from HuggingFace or AutoAWQ #380

imohitmayank · 2024-02-29T19:36:07Z


While experimenting with Llama-2, I noticed that the loaded model size and the number of named parameters differ. They depend on whether I load the base model or the AWQ-quantized model, and on whether I load it with Hugging Face transformers or with AutoAWQ. I'd like to understand why. The code and outputs are below.

Case 1: Base model for reference.

```python
import torch
from transformers import AutoModelForCausalLM

# load the model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="cuda:0", torch_dtype=torch.float16)
# get the named params details
print(f"Total number of named params: {len(list(model.named_parameters()))}")
print("They are - ")
for name, param in model.named_parameters():
    print(name, param.dtype)
""" Output:
Total number of named params: 291
They are - 
model.embed_tokens.weight torch.float16
model.layers.0.self_attn.q_proj.weight torch.float16
model.layers.0.self_attn.k_proj.weight torch.float16
model.layers.0.self_attn.v_proj.weight torch.float16
model.layers.0.self_attn.o_proj.weight torch.float16
model.layers.0.mlp.gate_proj.weight torch.float16
model.layers.0.mlp.up_proj.weight torch.float16
model.layers.0.mlp.down_proj.weight torch.float16
model.layers.0.input_layernorm.weight torch.float16
model.layers.0.post_attention_layernorm.weight torch.float16
model.layers.1.self_attn.q_proj.weight torch.float16
model.layers.1.self_attn.k_proj.weight torch.float16
model.layers.1.self_attn.v_proj.weight torch.float16
...
"""
```

Case 2: Loading the AWQ model with Hugging Face transformers

```python
import torch
from transformers import AutoModelForCausalLM

# load the model
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-Chat-AWQ", device_map="cuda:0", torch_dtype=torch.float16)
# get the named params details
print(f"Total number of named params: {len(list(model.named_parameters()))}")
print("They are - ")
for name, param in model.named_parameters():
    print(name, param.dtype)
""" Output:
Total number of named params: 67
They are - 
model.embed_tokens.weight torch.float16
model.layers.0.input_layernorm.weight torch.float16
model.layers.0.post_attention_layernorm.weight torch.float16
model.layers.1.input_layernorm.weight torch.float16
model.layers.1.post_attention_layernorm.weight torch.float16
model.layers.2.input_layernorm.weight torch.float16
model.layers.2.post_attention_layernorm.weight torch.float16
model.layers.3.input_layernorm.weight torch.float16
...
"""
```

Case 3: Loading the AWQ model with AutoAWQ

```python
from awq import AutoAWQForCausalLM

# load the model
model = AutoAWQForCausalLM.from_quantized("TheBloke/Llama-2-7B-Chat-AWQ", device_map="cuda:0")
# get the named params details
print(f"Total number of named params: {len(list(model.named_parameters()))}")
print("They are - ")
for name, param in model.named_parameters():
    print(name, param.dtype)
""" Output:
Total number of named params: 99
They are - 
model.model.embedding.weight torch.float16
model.model.blocks.0.norm_1.weight torch.float16
model.model.blocks.0.attn.rope.freqs_cis torch.complex64
model.model.blocks.0.norm_2.weight torch.float16
model.model.blocks.1.norm_1.weight torch.float16
model.model.blocks.1.attn.rope.freqs_cis torch.complex64
model.model.blocks.1.norm_2.weight torch.float16
model.model.blocks.2.norm_1.weight torch.float16
model.model.blocks.2.attn.rope.freqs_cis torch.complex64
model.model.blocks.2.norm_2.weight torch.float16
model.model.blocks.3.norm_1.weight torch.float16
model.model.blocks.3.attn.rope.freqs_cis torch.complex64
model.model.blocks.3.norm_2.weight torch.float16
...
"""
```

My questions are:

Does AWQ quantization also remove (or fuse) model layers, in addition to the usual quantization process?
Why does the same quantized model expose different named parameters depending on whether it is loaded with transformers or with AutoAWQ?
Which package is faster for inference, and is speed correlated with the number of named parameters?

Thanks!
