Add recommendation for model in readme for newcomers #410

Closed
security-companion opened this issue Jun 10, 2023 · 13 comments
Comments

@security-companion
Contributor

Hi,
I think it would be good to have a recommendation in the readme for a model to start with.
Otherwise newcomers to the project might not know which model is best to start with.
What do you think?

@gaby
Member

gaby commented Jun 10, 2023

@security-companion Sounds like a great idea, can you do a PR? Another option could be to split the models list in the README.

@security-companion
Contributor Author

Sure, I can make a PR.
Which model would you recommend to start with?

@raezor117

I would personally go with Vicuna-v1.1-7B for lower-end instances (5 GB RAM) or Vicuna-v1.1-13B for better-specced instances (12 GB RAM).

@noproto
Contributor

noproto commented Jun 11, 2023

@raezor117 Separating the recommendation based on the system specs is a good idea, but Vicuna has been surpassed for some time now: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

@gaby
Member

gaby commented Jun 11, 2023

@noproto Agree, and that gave me an idea: make the Models list filter out any model that requires more RAM than the amount reported by the system.

For example, if your host has 8 GB, there's no point in listing models that require 20, 30, or 40 GB of RAM.
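
A minimal sketch of that filtering idea in Python, assuming a hypothetical hard-coded catalog with approximate RAM requirements (the names and numbers below are illustrative, taken from this thread, not from Serge's real model metadata), and using psutil only to read the host's total memory:

```python
# Minimal sketch of RAM-based model filtering, not Serge's actual implementation.
# MODELS is a hypothetical catalog; the names and RAM figures are illustrative only.
import psutil  # third-party: pip install psutil

MODELS = [
    ("Vicuna-v1.1-7B", 5),    # (name, approx. RAM needed in GB)
    ("Vicuna-v1.1-13B", 12),
    ("Lazarus-30B", 24),
]

def models_for_this_host(catalog=MODELS):
    """Keep only the models whose RAM requirement fits the host's total memory."""
    total_gb = psutil.virtual_memory().total / 1024**3
    return [(name, ram) for name, ram in catalog if ram <= total_gb]

if __name__ == "__main__":
    for name, ram in models_for_this_host():
        print(f"{name} (~{ram} GB RAM)")
```

On an 8 GB host this would list only the 7B entry, which is the behaviour described above.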

@security-companion
Contributor Author

@gaby: That's a good idea. Perhaps we could even add an on/off toggle that says "Show only models that are supported on my machine".

@oktaborg

@gaby: That's a good idea. Perhaps we could even add an on/off toggle that says "Show only models that are supported on my machine".

I agree with this. And, as the previous person said, I also think it would be good to have a "speed" ranking, since this is the major problem for me.

A question: I can't find a sweet spot for the number of threads, is there any "best" amount? I tried up to 32.

Last but not least, is there GPU support yet?

@raezor117

@gaby: That's a good idea. Perhaps we could even add an on/off toggle that says "Show only models that are supported on my machine".

I agree with this. And, as the previous person said, I also think it would be good to have a "speed" ranking, since this is the major problem for me.

A question: I can't find a sweet spot for the number of threads, is there any "best" amount? I tried up to 32.

Last but not least, is there GPU support yet?

I also have an instance spun up on my virtual server, with the following specs:
CPU: 20 vCores (AMD EPYC™ Milan 7763 Base Clocks 2.45GHz with Max. 3.5GHz)
RAM: 16 GB (3200MHz)

with the following prompt:

Provide only the string that would best describe the Invoice Number in the following piece of text:
sample Invoice (4 Click to edit Billed To Your Client 1234 Clients Street City, California 90210 United States 1-888-123-8910
Date Issued 26/3/2021 Due Date 25/4/2021 YOUR COMPANY 1234 Your Street City, California 90210 United States 1-888-123-4567 Invoice Number Amount Due INV-00456 $1,699.48

With the above specs, the response times for each model were:

  • Vicuna-v1.1-7B-q6_K = 6.54 seconds
  • Vicuna-v1.1-13B = 12.68 seconds
  • Lazarus-30B = timed out after 1m30s (PS: I know my specs do not meet the recommended requirements)
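
For anyone wanting to reproduce this kind of per-model timing, here is a rough Python sketch; the endpoint, port, and payload shape are assumptions for illustration, not Serge's documented API:

```python
# Rough sketch for timing per-model responses. The URL and JSON payload below
# are hypothetical placeholders, not Serge's documented API.
import time
import requests  # third-party HTTP client

PROMPT = "Provide only the string that would best describe the Invoice Number ..."  # full prompt from above
MODELS = ["Vicuna-v1.1-7B-q6_K", "Vicuna-v1.1-13B", "Lazarus-30B"]

for model in MODELS:
    start = time.monotonic()
    try:
        requests.post(
            "http://localhost:8008/api/chat",    # assumed endpoint
            json={"model": model, "prompt": PROMPT},
            timeout=90,                          # match the 1m30s cutoff above
        )
        print(f"{model}: {time.monotonic() - start:.2f} seconds")
    except requests.Timeout:
        print(f"{model}: timed out after 90 seconds")
```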

@security-companion
Contributor Author

In order to finish this issue: what models should we add?
Should we go with the following?

I would personally go with Vicuna-v1.1-7B for lower-end instances (5 GB RAM) or Vicuna-v1.1-13B for better-specced instances (12 GB RAM).

@hassansf

I have a Synology NAS RS1619xs+ and I downloaded GPT4All-13B. Then I typed a simple question, which took over 11 minutes and did not complete the answer. I had to close the chat and restart the container to let the NAS breathe.

I agree, there should be a readme for newcomers with recommended models.

@security-companion
Contributor Author

Any news about suggested models?
What do you suggest so that we can proceed with that pull request?

@raezor117

I personally had the best experience with the following models:

  • Vicuna
  • Wizard
  • Alpaca

As for which specific one, I would say start with the minimum of 7B. Then, if your hardware is capable of more, you can try the larger models. However, don't jump straight to the 65B (my opinion). Work your way up until you find an acceptable response time for the number of tokens.

I will personally be looking into Orca next, as I have read that it's quite a good competitor.

@gaby
Member

gaby commented Nov 14, 2023

#866 introduces support for GGUF models. By default, Serge will include the top downloaded ones. These are: LLaMA2, CodeLLaMA, Zephyr, and Mistral.

gaby closed this as completed Nov 14, 2023