
Feature request: discussions around new features within HF ecosystem with unsloth #34

Closed
younesbelkada opened this issue Dec 14, 2023 · 4 comments
Labels: Discussion (Questions or discussions)

Comments

younesbelkada (Contributor) commented Dec 14, 2023

Hi @danielhanchen

Thank you very much for this great project and for pushing it forward for the community!

With the TRL / PEFT team we've seen that your example scripts rely heavily on the PEFT / TRL libraries, and we wanted to see if you need any help or have any feature requests around the HF ecosystem. We would be happy to collaborate and see what we can do together.

Note also that SDPA has recently been integrated into transformers core (huggingface/transformers#26572); we were also wondering whether you have run any comparisons of unsloth against transformers 4.36.0.
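
For reference, a minimal sketch of how the SDPA implementation can be requested in transformers 4.36+; the checkpoint name here is only an example:

```python
# Minimal sketch: selecting the SDPA attention implementation in transformers >= 4.36.
# The model checkpoint is only an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # routes attention through torch.nn.functional.scaled_dot_product_attention
)
```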

cc @pacman100 @lvwerra

younesbelkada changed the title from "Feature request: discussions around new features within HF ecosystem" to "Feature request: discussions around new features within HF ecosystem with unsloth" on Dec 14, 2023
danielhanchen (Contributor) commented

@younesbelkada Hey there! I've seen many of your PRs for HF, so great work again! I actually saw the SDPA support, and I think I wrote a note about it in my benchmarks.

For example, Alpaca with SDPA on a Tesla T4:

```python
%%capture
# scaled_dot_product_attention was added to transformers on 9th December 2023.
# It now supports Xformers / FA on older GPUs (e.g. the T4),
# but only for PyTorch 2.1.1+, so we shall patch it ourselves for now.
!pip install transformers bitsandbytes datasets sentencepiece accelerate trl peft
```

I manually patched the models for SDPA, so on Tesla T4s I did in fact benchmark SDPA (not native transformers SDPA, just SDPA itself).
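
Roughly, the manual patch amounts to routing the attention computation through PyTorch's SDPA kernel; the sketch below is illustrative only and is not the actual unsloth patch:

```python
# Illustrative only: route attention through PyTorch's SDPA kernel.
# The real patch replaces the attention forward of the model itself.
import torch.nn.functional as F

def sdpa_attention(query, key, value, attention_mask=None, causal=True):
    # query / key / value: (batch, n_heads, seq_len, head_dim)
    return F.scaled_dot_product_attention(
        query, key, value,
        attn_mask=attention_mask,
        # use the fused causal path when no explicit mask is supplied
        is_causal=causal and attention_mask is None,
    )
```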

E.g. (the Flash Attention column in the table below is in fact SDPA):

| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
| --- | --- | --- | --- | --- | --- | --- |
| Alpaca | 1x | 1.09x | 1.69x | 1.79x | 2.93x | 8.3x |
| seconds | 1599 | 1468 | 942 | 894 | 545 | 193 |
| memory (MB) | 7199 | 7059 | 6459 | 5443 | | |
| memory saved (%) | | 1.94 | 10.28 | 24.39 | | |

So vs SDPA, Unsloth is 1.56x faster on a Tesla T4.
I actually wanted to use the latest transformers branch, but Colab's PyTorch is 2.1.0, and upgrading it to 2.1.1 would be quite slow. I started benchmarking around Dec 8, and then HF released SDPA support on, I think, Dec 9?

danielhanchen (Contributor) commented

But I'm more than happy to collaborate on anything!! Again, great work with TRL and PEFT! I'm actively following huggingface/transformers#26037 :) so that'll be massive for the next HF release!

I'm also investigating LoftQ via PEFT, as suggested by someone I was chatting with. I haven't tried it yet, but hopefully VRAM doesn't explode!
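
For reference, a minimal sketch of LoftQ-style initialization via PEFT's documented LoftQConfig / LoraConfig API; the model name and hyperparameters are placeholders:

```python
# Minimal sketch of LoftQ-initialized LoRA via PEFT; values are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint
loftq_config = LoftQConfig(loftq_bits=4)  # 4-bit LoftQ initialization
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="loftq",  # initialize the LoRA weights with LoftQ
    loftq_config=loftq_config,
)
model = get_peft_model(base_model, lora_config)  # the base model should be unquantized here
```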

younesbelkada (Contributor, Author) commented

Thanks very much for your positive reply @danielhanchen!
We can collaborate on many things; one thing I had in mind is to integrate an API that can leverage unsloth as a backend for PEFT. It would be easier for us to discuss this on Slack, so could you send me an email address I can reach you at?

danielhanchen (Contributor) commented

@younesbelkada Email is on my profile! :)
