
added SVG for Groq model providers #1470

Merged
2 commits merged into infiniflow:main on Jul 12, 2024

Conversation

@paresh2806 (Contributor)

#1432 #1447

This PR adds support for the Groq LLM provider.

Groq is an AI solutions company delivering ultra-low-latency inference with the first-ever LPU™ Inference Engine. The Groq API enables developers to integrate state-of-the-art LLMs, such as Llama 2 and llama3-70b-8192, into low-latency applications with the request limits listed below. Learn more at [groq.com](https://groq.com/).
Supported Models

| ID                 | Requests per Minute | Requests per Day | Tokens per Minute |
|--------------------|---------------------|------------------|-------------------|
| gemma-7b-it        | 30                  | 14,400           | 15,000            |
| gemma2-9b-it       | 30                  | 14,400           | 15,000            |
| llama3-70b-8192    | 30                  | 14,400           | 6,000             |
| llama3-8b-8192     | 30                  | 14,400           | 30,000            |
| mixtral-8x7b-32768 | 30                  | 14,400           | 5,000             |
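
For reference, here is a minimal sketch of a chat call against one of the models in the table above. It assumes the official `groq` Python SDK (which mirrors the OpenAI client interface) and an API key in the `GROQ_API_KEY` environment variable; it is an illustration of the Groq API, not the exact code added in this PR.

```python
import os

from groq import Groq  # pip install groq

# Assumes GROQ_API_KEY is set in the environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# One of the model IDs from the table above.
response = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what an LPU is in one sentence."},
    ],
    temperature=0.3,
    max_tokens=256,
)

print(response.choices[0].message.content)
```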

Review comments (all resolved):

- rag/llm/__init__.py
- rag/llm/chat_model.py
- api/db/init_data.py (5 threads)
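
The api/db/init_data.py changes presumably register Groq as a model provider and enumerate its chat models. A hypothetical sketch of what such seed data could look like; the field names here are illustrative assumptions, not RAGFlow's actual schema:

```python
# Hypothetical seed data for the Groq provider; field names are
# illustrative, not necessarily RAGFlow's actual schema.
GROQ_FACTORY = {
    "name": "Groq",
    "logo": "",          # the SVG added by this PR would be referenced by the UI
    "tags": "LLM,TEXT",
    "status": "1",
}

# Chat models from the supported-models table above.
GROQ_MODELS = [
    {"llm_name": "gemma-7b-it",        "max_tokens": 8192,  "model_type": "chat"},
    {"llm_name": "llama3-70b-8192",    "max_tokens": 8192,  "model_type": "chat"},
    {"llm_name": "llama3-8b-8192",     "max_tokens": 8192,  "model_type": "chat"},
    {"llm_name": "mixtral-8x7b-32768", "max_tokens": 32768, "model_type": "chat"},
]
```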
Added suggested changes

Co-authored-by: Kevin Hu <[email protected]>
@KevinHuSh KevinHuSh merged commit ddeac9a into infiniflow:main Jul 12, 2024
1 check passed
@lionkingc

Please add the newest models:

- Model ID: llama-3.1-70b-versatile
- Model ID: llama-3.1-8b-instant

Halfknow pushed a commit to Halfknow/ragflow that referenced this pull request Nov 11, 2024