Add support for mistral type Model to use Mistral and Zephyr #1553
Comments
I looked through the related open issues as well, and it seems this has gone unaddressed for more than a month. I was thinking of finally adding support for the Mistral architecture myself, even though I don't know much about it.
Thank you very much! Mistral 7B is a top model that outperformed Llama 13B in a few cases. Zephyr-7b-beta from Hugging Face, which is fine-tuned from Mistral, is the best one and even beats Llama 70B in a few cases. Adding support for Mistral will open up both the Mistral and Zephyr models. Thanks for the contribution link.
By the way, I don't think pursuing performance improvements with airllm is worth it. I tried it with a 34B-parameter model and it is really slow on my 8GB card; the bottleneck is going to be processing power. A quantized model loaded straight onto the card is better, in my opinion.
Thanks for the update! I think an update may not be required until the model becomes faster!
Hi @manjunathshiva, in the Transformers 4.36 release we started adding native SDPA support. As for decoder models, we do not use nested tensors and simply rely on SDPA, so let's add this directly in Transformers. I opened the issue huggingface/transformers#28005 in Transformers to track the support. Please continue the discussion there!
Hi, does BetterTransformer support Mistral? Or Solar Mistral? Regards
Any updates on this? Does BetterTransformer support Mistral?
Hi @jesulo @pradeepdev-1995, BetterTransformer optimization for Mistral (which in our case is simply calling PyTorch's SDPA op instead of manual attention) has been integrated in Transformers natively, see https://huggingface.co/docs/transformers/v4.37.0/en/perf_infer_gpu_one#bettertransformer and https://huggingface.co/docs/transformers/v4.37.0/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention, as long as you use torch>=2.1.1.
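For anyone landing here, a minimal sketch of what the native path described above might look like. It assumes Transformers 4.36 or later, where `from_pretrained` accepts `attn_implementation="sdpa"`. The helper names `sdpa_supported` and `load_mistral_with_sdpa` are made up for illustration, and the default model id is only an example checkpoint, not something prescribed in this thread.

```python
import re


def sdpa_supported(torch_version: str) -> bool:
    """Return True if a torch version string meets the torch>=2.1.1 requirement."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    return tuple(int(part) for part in match.groups()) >= (2, 1, 1)


def load_mistral_with_sdpa(model_id: str = "HuggingFaceH4/zephyr-7b-beta"):
    """Hypothetical helper: load a Mistral-type model with the SDPA attention path."""
    # Heavy imports stay local so the version check can be used without them.
    import torch
    from transformers import AutoModelForCausalLM

    if not sdpa_supported(torch.__version__):
        raise RuntimeError("torch>=2.1.1 is required for the SDPA attention path")

    # No BetterTransformer.transform() call is involved; SDPA is requested at load time.
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        attn_implementation="sdpa",
    )
```

Calling `load_mistral_with_sdpa()` downloads the checkpoint, so only the version check is cheap to try on its own, for example `sdpa_supported(torch.__version__)`.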
Feature request
Using airllm on a 4GB GPU with a Mistral-type model gives me the error below:
File "C:\model.py", line 5, in <module>
model = AirLLMLlama2("./modles/zephyr-7b-beta")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LLM\venv\Lib\site-packages\airllm\airllm.py", line 184, in __init__
self.init_model()
File "C:\LLM\venv\Lib\site-packages\airllm\airllm.py", line 197, in init_model
self.model = BetterTransformer.transform(self.model) # enable flash attention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\305031856\AppData\Local\Programs\Python\Python311\Lib\contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "C:\LLM\venv\Lib\site-packages\optimum\bettertransformer\transformation.py", line 228, in transform
raise NotImplementedError(
NotImplementedError: The model type mistral is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'falcon', 'gpt2', 'gpt_bigcode', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'llama', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'whisper', 'xlm-roberta', 'yolos']).
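As a stopgap, the caller could treat the `NotImplementedError` shown in the traceback as non-fatal and keep the untransformed model. The sketch below is not airllm's actual code; `transform_or_fallback` is a hypothetical helper, and the transform function is passed in as an argument so the snippet runs without optimum installed.

```python
def transform_or_fallback(model, transform):
    """Try an optional model transform; keep the original model if it is unsupported.

    Returns a (model, transformed) pair, where `transformed` says whether the
    transform was actually applied.
    """
    try:
        return transform(model), True
    except NotImplementedError as exc:
        # Raised for model types missing from the supported list, e.g. "mistral".
        print(f"Transform skipped: {exc}")
        return model, False
```

With optimum installed, this would presumably be used as `model, ok = transform_or_fallback(model, BetterTransformer.transform)` after `from optimum.bettertransformer import BetterTransformer`. The model then runs with its default attention implementation when `ok` is `False`.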
Motivation
Zephyr is currently the leading model on Hugging Face, so support is very much needed!
Your contribution
Yes, I can help if any help is needed! I am a Senior Software Engineer with 17 years of industry experience.