
Mamba2 Codestral generation example fails to load mismatching state dict #32561

Closed
SamPruden opened this issue Aug 9, 2024 · 5 comments · Fixed by #32566

SamPruden commented Aug 9, 2024

System Info

Google Colab, transformers 4.42.4 (default Colab version) and 4.44.0 (after --upgrade)

  • transformers version: 4.42.4
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.5
  • Safetensors version: 0.4.4
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (False)
  • Tensorflow version (GPU?): 2.17.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (cpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @molbap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Copied directly from the documentation.

from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch
model_id = 'mistralai/Mamba-Codestral-7B-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_id, revision='refs/pr/9', from_slow=True, legacy=False)
model = MambaForCausalLM.from_pretrained(model_id, revision='refs/pr/9')
input_ids = tokenizer("Hey how are you doing?", return_tensors= "pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))

It first warns: "You are using a model of type mamba2 to instantiate a model of type mamba. This is not supported for all configurations of models and can yield errors."

It then errors with

RuntimeError: Error(s) in loading state_dict for MambaForCausalLM:
	size mismatch for backbone.layers.0.mixer.A_log: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([8192, 128]).
	size mismatch for backbone.layers.0.mixer.D: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([8192]).
	size mismatch for backbone.layers.0.mixer.conv1d.weight: copying a param with shape torch.Size([10240, 1, 4]) from checkpoint, the shape in current model is torch.Size([8192, 1, 4]).
	size mismatch for backbone.layers.0.mixer.conv1d.bias: copying a param with shape torch.Size([10240]) from checkpoint, the shape in current model is torch.Size([8192]).
	size mismatch for backbone.layers.0.mixer.in_proj.weight: copying a param with shape torch.Size([18560, 4096]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).

etc. for all layers.

Expected behavior

It should work as documented.

@SamPruden SamPruden added the bug label Aug 9, 2024
@SamPruden SamPruden changed the title Mamba2 generation example fails to load mismatching state dict Mamba2 Codestral generation example fails to load mismatching state dict Aug 9, 2024

molbap commented Aug 9, 2024

Thanks for the issue, taking a look!


molbap commented Aug 9, 2024

You are using MambaForCausalLM - it should be Mamba2ForCausalLM there! Codestral is based on Mamba-2, not Mamba.

SamPruden (Author) commented

You are using MambaForCausalLM - it should be Mamba2ForCausalLM there! Codestral is based on Mamba-2, not Mamba.

It's copied directly from the docs on the site, which I suppose makes this a documentation error. I suspected it would be something this simple, but I was just doing a very quick test out of curiosity and didn't have time to dig into it immediately.

SamPruden (Author) commented

I thought I remembered checking that, actually, so I just took another look at my test notebook. It appears that Mamba2 isn't available on Colab yet without a pip install transformers --upgrade, which is what confused my quick, hacky check.

@molbap
Copy link
Contributor

molbap commented Aug 9, 2024

No worries, you're right, I'll update the docs right away! As for Colab, yes, it should be available there soon :)
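[Editor's note] For reference, the corrected version of the reproduction snippet simply swaps in the Mamba-2 class, as molbap points out above. A sketch, assuming transformers >= 4.44.0 (a release in which Mamba2ForCausalLM is available); the model weights are large, so this is not something to run casually:

```python
# Codestral is a Mamba-2 model, so it must be loaded with
# Mamba2ForCausalLM rather than MambaForCausalLM, which expects
# Mamba-1 parameter shapes and fails with the state-dict size
# mismatches shown in the report above.
from transformers import Mamba2ForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(
    model_id, revision="refs/pr/9", from_slow=True, legacy=False
)
model = Mamba2ForCausalLM.from_pretrained(model_id, revision="refs/pr/9")

input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```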
