
Mamba2 Codestral generation example fails to load mismatching state dict #32561

Closed
SamPruden opened this issue Aug 9, 2024 · 5 comments · Fixed by #32566

SamPruden commented Aug 9, 2024

System Info

Google Colab, transformers 4.42.4 (default Colab version) and 4.44.0 (after --upgrade)

  • transformers version: 4.42.4
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.5
  • Safetensors version: 0.4.4
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (False)
  • Tensorflow version (GPU?): 2.17.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (cpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @molbap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Copied directly from the documentation.

from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch
model_id = 'mistralai/Mamba-Codestral-7B-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_id, revision='refs/pr/9', from_slow=True, legacy=False)
model = MambaForCausalLM.from_pretrained(model_id, revision='refs/pr/9')
input_ids = tokenizer("Hey how are you doing?", return_tensors= "pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))

It first warns: "You are using a model of type mamba2 to instantiate a model of type mamba. This is not supported for all configurations of models and can yield errors."

It then errors with

RuntimeError: Error(s) in loading state_dict for MambaForCausalLM:
	size mismatch for backbone.layers.0.mixer.A_log: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([8192, 128]).
	size mismatch for backbone.layers.0.mixer.D: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([8192]).
	size mismatch for backbone.layers.0.mixer.conv1d.weight: copying a param with shape torch.Size([10240, 1, 4]) from checkpoint, the shape in current model is torch.Size([8192, 1, 4]).
	size mismatch for backbone.layers.0.mixer.conv1d.bias: copying a param with shape torch.Size([10240]) from checkpoint, the shape in current model is torch.Size([8192]).
	size mismatch for backbone.layers.0.mixer.in_proj.weight: copying a param with shape torch.Size([18560, 4096]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).

etc. for all layers.

Expected behavior

It should work as documented.

@SamPruden SamPruden added the bug label Aug 9, 2024
@SamPruden SamPruden changed the title Mamba2 generation example fails to load mismatching state dict Mamba2 Codestral generation example fails to load mismatching state dict Aug 9, 2024

molbap commented Aug 9, 2024

Thanks for the issue, taking a look!


molbap commented Aug 9, 2024

You are using MambaForCausalLM - it should be Mamba2ForCausalLM there! Codestral is based on Mamba-2, not Mamba.

SamPruden (Author) commented

You are using MambaForCausalLM - it should be Mamba2ForCausalLM there! Codestral is based on Mamba-2, not Mamba.

It's copied directly from the docs on the site, which I suppose makes this a documentation error. I suspected it would be something this simple, but I was just doing a very quick test out of curiosity and didn't have time to dig into it immediately.

SamPruden (Author) commented

I thought I remembered checking that, actually, so I just took another look at my test notebook. It appears that Mamba2 isn't available on Colab yet without a pip install transformers --upgrade, which is what confused my quick, hacky check.

@molbap
Copy link
Contributor

molbap commented Aug 9, 2024

No worries, you're right, I'll update the docs right away! As for Colab, yes, it should be available there soon :)
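[Editor's note] For reference, the corrected version of the reproduction snippet simply swaps in the Mamba-2 class, as molbap points out above. A sketch, assuming transformers >= 4.44.0 (a release in which Mamba2ForCausalLM is available); the model weights are large, so this is not something to run casually:

```python
# Codestral is a Mamba-2 model, so it must be loaded with
# Mamba2ForCausalLM rather than MambaForCausalLM, which expects
# Mamba-1 parameter shapes and fails with the state-dict size
# mismatches shown in the report above.
from transformers import Mamba2ForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(
    model_id, revision="refs/pr/9", from_slow=True, legacy=False
)
model = Mamba2ForCausalLM.from_pretrained(model_id, revision="refs/pr/9")

input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```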
