Support for LLaMA models #147
The models are not public yet, unfortunately. You have to request access.
Psst. Somebody leaked them https://twitter.com/Teknium1/status/1631322496388722689
Of course, the weights themselves are closed. But the code from the repository should be enough to add support. And where the end users will download the weights from is their problem.
Done ea5c5eb
Getting a CUDA out-of-memory error. I assume low-memory support isn't included yet?
This isn't going through Hugging Face yet, so it doesn't have access to 8-bit and CPU offloading. The 7B model uses 14963 MiB of VRAM on my machine. Reducing the max_seq_len parameter from 2048 to 512 brings this down to 13843 MiB.
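For anyone wondering where that knob lives: in Meta's reference code the context length is passed into ModelArgs when the checkpoint is loaded, and the webui's modules/LLaMA.py wraps the same loader. A minimal sketch, closely following the upstream example.py (the exact function in modules/LLaMA.py may differ):

```python
import json
from pathlib import Path

import torch
from llama import LLaMA, ModelArgs, Tokenizer, Transformer  # facebookresearch/llama


def load_llama(ckpt_dir: str, tokenizer_path: str,
               max_seq_len: int = 512, max_batch_size: int = 1) -> LLaMA:
    """Load a single-shard LLaMA checkpoint; assumes model parallelism is already initialized."""
    # A smaller max_seq_len shrinks the preallocated KV cache, which is where
    # the ~1 GiB VRAM saving mentioned above comes from.
    ckpt_path = sorted(Path(ckpt_dir).glob("*.pth"))[0]
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    params = json.loads((Path(ckpt_dir) / "params.json").read_text())

    model_args = ModelArgs(max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params)
    tokenizer = Tokenizer(model_path=tokenizer_path)
    model_args.vocab_size = tokenizer.n_words

    torch.set_default_tensor_type(torch.cuda.HalfTensor)  # fp16 weights on the GPU
    model = Transformer(model_args)
    torch.set_default_tensor_type(torch.FloatTensor)
    model.load_state_dict(checkpoint, strict=False)
    return LLaMA(model, tokenizer)
```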
I get a bunch of dependency errors when launching despite setting up LLaMA beforehand (definitely my own fault and probably because of a messed-up conda environment)
etc. Any chance you could include these in the default webui requirements, assuming they aren't too heavy?
@musicurgy did you try
Yeah, after a bit of a struggle I ended up getting it working by just copying all the dependencies into the webui folder. So far the model is really interesting. Thanks for supporting it.
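For what it's worth, the dependencies in question are most likely the ones from the upstream facebookresearch/llama requirements.txt, roughly torch, fairscale, fire, and sentencepiece (an assumption, since the actual error list isn't shown above), so `pip install fairscale fire sentencepiece` inside the webui's conda environment should cover them.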
Awesome stuff. I'm able to load LLaMA-7b but trying to load LLaMA-13b crashes with the error:
Anyone reading this: you can get past the issue above by changing the world_size variable in setup_model_parallel() inside modules/LLaMA.py (see the sketch below). My issue now is that I'm running out of VRAM. I'm running dual 3090s and should be able to load the model if it's split among the cards...
Is there a parameter I need to pass to oobabooga to tell it to split the model among my two 3090 GPUs?
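The change being referred to is presumably along these lines: the reference setup_model_parallel() reads LOCAL_RANK and WORLD_SIZE from environment variables that torchrun would normally set, so when the webui launches a single plain Python process they come back as -1 and initialization fails. A hedged sketch of the workaround, assuming the function in modules/LLaMA.py mirrors the upstream example:

```python
import os
from typing import Tuple

import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel


def setup_model_parallel() -> Tuple[int, int]:
    # Fall back to a single-process setup when the torchrun env vars are absent.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    torch.distributed.init_process_group("nccl", rank=local_rank, world_size=world_size)
    initialize_model_parallel(world_size)
    torch.cuda.set_device(local_rank)
    torch.manual_seed(1)  # the seed must match across ranks
    return local_rank, world_size
```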
Try
Sorry, super dumb question, but do I pass this to start-webui.sh? Like
Ah, that should work, but if not, edit the file and add this at the end of
Thanks friend! I was able to get it with
For LLaMA, the correct way is to change the global variables inside LLaMA.py like @generic-username0718 did, but I am not very familiar with the parameters yet.
I was starting to question my sanity... I think I was accidentally loading opt-13b instead... Sorry if I got people's hopes up. I'm still trying to split the model. Edit: Looks like they've already asked this here: meta-llama/llama#88
Bad news for the guys hoping to run 13B:
LLaMA-7B can be run on CPU instead of GPU using this fork of the LLaMA repo: https://github.com/markasoftware/llama-cpu. To quote the author: "On a Ryzen 7900X, the 7B model is able to infer several words per second, quite a lot better than you'd expect!"
I sure did. Also, those os.environ settings don't seem to work.
I also get the same error with 13B.
Anyone else getting really poor results on 7B? I've tried many prompts and parameter variations and it generally ends up as mostly nonsense with lots of repetition. It might just be the model, but I saw some 7B output examples posted online that seemed way better than anything I was getting.
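One thing worth checking before blaming the model: LLaMA is a raw base model, so greedy or near-greedy decoding tends to loop. A hedged illustration of decoding settings that usually tame the repetition; the parameter names follow the Hugging Face generate() API (which the webui's preset sliders mirror), and the model path is hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b-hf")  # hypothetical local path
model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf")

inputs = tokenizer("Building a website can be done in 10 simple steps:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,           # greedy decoding is what usually produces the loops
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.15,  # discourages verbatim repetition
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```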
Is it possible to reduce computation precision on CPU? Down to 8-bit?
Someone made a fork of the LLaMA repo that apparently runs in 8-bit: Zero idea if it works or anything.
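On the CPU question specifically: PyTorch's dynamic quantization can drop the Linear layers to int8 at load time, which is the usual way to cut CPU memory use and bandwidth roughly in half versus fp16/fp32. A rough sketch, not specific to any of the forks mentioned here:

```python
import torch
import torch.nn as nn


def quantize_linears_to_int8(model: nn.Module) -> nn.Module:
    # Dynamic quantization stores nn.Linear weights as int8 and dequantizes
    # activations on the fly; it only applies to CPU inference.
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


# Usage (hypothetical): model = quantize_linears_to_int8(load_llama_on_cpu(...))
```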
I'm getting the following error when trying to run the 7B model on my RTX 3090, can someone help?
@hopto-dot Go here and run the pip command for the CUDA 11.7 build on your OS: https://pytorch.org/get-started/locally/
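For reference, the selector on that page currently produces something along the lines of `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117` for the CUDA 11.7 build; check the page itself, since the exact command changes with PyTorch releases.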
I'm getting the below error. I'm fairly certain I have the latest weights.
===================== CUDA SETUP: Loading binary C:\Users\PC\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
@gadaeus
This helped, thank you
Does anyone have a guess as to what may be causing this error?
See #445 (comment)
Does anybody know where I should start to go about fixing the below errors? Running on an M1 Max with 64 GB. Attempted with the HFv2 model weights for LLaMA-7B.
It looks like you don't have enough memory to load the model. Try adding
(adjust the value based on your GPU size)
@oobabooga there is a wrapper for llama.cpp now
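The comment doesn't say which wrapper, but llama-cpp-python is one such binding, and its interface looks roughly like this (treat the package choice and model path as assumptions; llama.cpp-converted ggml weights are required):

```python
# pip install llama-cpp-python   (one of the available llama.cpp bindings)
from llama_cpp import Llama

# Path to a llama.cpp-converted 7B checkpoint; hypothetical location.
llm = Llama(model_path="./models/llama-7b-ggml.bin", n_ctx=512)

output = llm(
    "Question: What is the capital of France? Answer:",
    max_tokens=32,
    stop=["Question:"],  # stop generating when the model starts a new question
)
print(output["choices"][0]["text"])
```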
I've been following this tutorial trying to set up LLaMA 7B and eventually 13B, and I'm running into an issue I can't find referenced anywhere online regarding the process, pasted below. I'm running Windows 10, and I've already attempted the fix detailed HERE. I've tried running the command in Anaconda Prompt (miniconda3) both normally and as an administrator, and I get the same error. I even tried running it in normal CMD, which didn't lead anywhere. Let me know if I'm posting this in the wrong place, formatting incorrectly, or committing some other faux pas; I can't figure out how to do the clean-looking 'snippet clipboard content' formatting that people are using here.
(base) D:\AI Image Things\text-generation-webui>
I recommend using the new one-click installer for native Windows installation: https://github.com/oobabooga/text-generation-webui#one-click-installers
I'll give that a shot! Thanks for the work you put in on all this - the accessibility of this wave of AI tech is seriously appreciated.
@guruace HOW? I don't understand what I am doing wrong. I get the following error.
When running
@tillhanke this is an out-of-memory error. It means that not enough RAM and/or VRAM was found in the system to load the model.
So, LLaMA works, I guess this issue can be closed 👏🏼
File "/home/yinan/text-generation-webui/modules/text_generation.py", line 31, in encode
@Modplay you haven't downloaded or set the model yet
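In other words, the traceback comes from encode() being called while no model has been loaded. A minimal illustration of the failure mode and the obvious guard, assuming the webui's usual modules.shared globals (shared.model and shared.tokenizer stay None until a model is selected):

```python
from modules import shared  # text-generation-webui keeps the loaded model/tokenizer here


def encode(prompt: str, max_length: int):
    # If no model has been selected in the UI (or via --model), the tokenizer
    # is still None and any call on it raises a NoneType error.
    if shared.tokenizer is None:
        raise RuntimeError("No model is loaded - pick one in the Model tab or pass --model.")
    return shared.tokenizer.encode(prompt, return_tensors="pt",
                                   truncation=True, max_length=max_length)
```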
Meta just released their LLaMA model family:
https://github.com/facebookresearch/llama
Can we get support for that?
They claim that the 13B model is better than the 175B GPT-3 model.
This is how to use the model in the web UI:
LLaMA TUTORIAL
-- oobabooga, March 9th, 2023