
Add [Mamba] model #28086

Closed · 2 tasks done
JLTastet opened this issue Dec 15, 2023 · 4 comments · Fixed by #28094
@JLTastet

Model description

Mamba is a new architecture proposed in arXiv:2312.00752 by Albert Gu (CMU) and Tri Dao (Princeton).

It is inspired by structured state space models (SSMs), but with the addition of a selection mechanism that allows it to combine the ability of transformers to perform content-based reasoning with the performance of SSMs on long sequences. Mamba can be efficiently trained in parallel while also enjoying efficient inference by running recurrently.
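
For intuition, here is a heavily simplified, non-optimised sketch of the selective scan at the heart of the architecture, with input-dependent Δ, B and C discretising a diagonal state matrix A. This is only an illustration based on the paper; the names and shapes below are mine, and the real kernels fuse this scan and parallelise it rather than looping over timesteps.

```python
import torch

def selective_scan_reference(x, delta, A, B, C, D):
    """
    Naive selective scan over a full sequence (illustrative only, not the fused kernel).

    x:     (batch, length, d_inner)   input sequence
    delta: (batch, length, d_inner)   input-dependent step sizes (the "selection")
    A:     (d_inner, d_state)         diagonal state matrix, one row per channel
    B:     (batch, length, d_state)   input-dependent input projection
    C:     (batch, length, d_state)   input-dependent output projection
    D:     (d_inner,)                 skip connection
    """
    batch, length, d_inner = x.shape
    d_state = A.shape[-1]
    h = x.new_zeros(batch, d_inner, d_state)           # hidden SSM state
    ys = []
    for t in range(length):
        # Zero-order-hold discretisation with an input-dependent step size delta.
        dA = torch.exp(delta[:, t, :, None] * A)        # (batch, d_inner, d_state)
        dB = delta[:, t, :, None] * B[:, t, None, :]    # (batch, d_inner, d_state)
        h = dA * h + dB * x[:, t, :, None]              # recurrent state update
        y = (h * C[:, t, None, :]).sum(-1) + D * x[:, t]
        ys.append(y)
    return torch.stack(ys, dim=1)                       # (batch, length, d_inner)
```

In the actual model, delta, B and C are produced from x by learned projections, which is what makes the scan "selective".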

The paper claims SoTA performance on various modalities, with performance tested up to 2.8B parameters. Crucially, the model cannot be implemented efficiently using only PyTorch operations; instead, it relies on optimised CUDA and Triton kernels.

The original implementation by the authors is available at https://github.com/state-spaces/mamba/tree/main under an Apache 2.0 license.

Starting from their implementation, I have started porting the model to 🤗 Transformers. This is work in progress 🚧, and can be found in my fork at https://github.com/JLTastet/transformers/tree/mamba.

I can open a PR, but in its current state my branch is not ready to be merged. I will also open an issue in the original repo to let the authors know about this, in case they want to chime in.

What I got working:

  • Forward and backward passes.
  • Loading checkpoints from the Hub using AutoModel.
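
As an illustration of the second point, loading and running a forward pass would look roughly like the sketch below. The checkpoint and tokenizer names are taken from the upstream state-spaces/mamba examples and are assumptions here, not the final interface of the port.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint and tokenizer names, following the upstream examples;
# the exact interface of the work-in-progress port may differ.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-2.8b").to("cuda")

inputs = tokenizer("Mamba is a selective state space model that", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits   # forward pass
print(logits.shape)                   # (batch, seq_len, vocab_size)
```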

What still needs some work:

  • Even though backprop itself works, I get some CUDA errors when using Trainer, and I still don’t understand what causes them.
  • Compiling the CUDA kernels takes ~1 hour. This does not happen with the original package, so I think they ship prebuilt binaries. I haven’t managed to port that part so far.
  • I don’t think there is any non-CUDA fallback path, so this model probably cannot run without CUDA in its current form (a possible pure-PyTorch fallback is sketched after this list).
  • When using generate, we should check that the optimised recurrent inference path (which carries the SSM state between tokens) is used, rather than the slower approach of re-running the full forward pass for every generated token (see the sketch after this list).
  • Tests, tests and moar tests.
  • Most of the documentation needs to be written.
  • Add the relevant dependencies.
  • The code could certainly benefit from some cleanup (removing dead code, addressing the many TODOs, updating copyright notices, ...).
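
Regarding the non-CUDA fallback and recurrent generation points above: in principle, a pure-PyTorch fallback could reuse a step-wise recurrence like the one below, which advances the SSM state one token at a time instead of recomputing the whole prefix. This is only an illustrative sketch under my own naming (ssm_state etc.), not code from the fork or the upstream kernels.

```python
import torch

def mamba_recurrent_step(x_t, ssm_state, delta_t, A, B_t, C_t, D):
    """
    Single-token recurrent update with a cached SSM state (illustrative sketch).

    x_t:       (batch, d_inner)           current input
    ssm_state: (batch, d_inner, d_state)  state carried over from previous tokens
    delta_t:   (batch, d_inner)           input-dependent step size for this token
    A:         (d_inner, d_state)         diagonal state matrix
    B_t, C_t:  (batch, d_state)           input-dependent projections for this token
    D:         (d_inner,)                 skip connection
    """
    dA = torch.exp(delta_t[..., None] * A)             # (batch, d_inner, d_state)
    dB = delta_t[..., None] * B_t[:, None, :]          # (batch, d_inner, d_state)
    ssm_state = dA * ssm_state + dB * x_t[..., None]   # constant work per generated token
    y_t = (ssm_state * C_t[:, None, :]).sum(-1) + D * x_t
    return y_t, ssm_state
```

During generation, generate would carry ssm_state (plus a cache for the block’s short causal convolution) between tokens, so each new token costs a constant amount of work in the sequence length; the same loop also runs on CPU, albeit much more slowly than the fused kernels.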

I am opening this issue to avoid duplicating work, since I saw some mention of Mamba today by @ArthurZucker.

My main motivation for porting this model is to learn a bit more about it (and about the internals of 🤗 Transformers) and to run more evals. Some of you probably know this library much better than me, so feel free to write your own implementation if you can do it better or quicker. Otherwise, don’t hesitate to build on top of my fork.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

@ArthurZucker
Collaborator

Thanks for opening this issue! Given the sensitivity of this model, the HF team will take it over; we'll have a look at your fork and add you as a co-author 🤗

@JLTastet
Author

Thanks a lot!

My fork is largely inspired by the original Mamba repo, with the differences mostly consisting of boilerplate code. So don’t hesitate to start from the upstream repo.

I (and the linter) have noticed a couple of bugs or pieces of dead code in the upstream repo (some of which remain in my fork). So keep an eye out for them!

@LegallyCoder

I did similar work: https://github.com/LegallyCoder/mamba-hf.
I'm working on this too.

@ankhzet
commented Jan 16, 2024

I've seen a CPU-only implementation fork mentioned somewhere in the source repo issues. The author of the fork removed the Triton and CUDA dependencies.

Found it: https://github.com/kroggen/mamba-cpu
Training is not working there, though. Maybe you can get in touch with the author.
