doc: add ChatQnA deploy on xeon example #104
Conversation
LGTM
Let's see the final effect. Thanks.
Show how to merge the vllm and TGI example into one with tabbed content for the differences. Signed-off-by: David B. Kinder <[email protected]>
@mkbhanda @hshen14 @preethivenkatesh I still need another "green" checkmark reviewer...
More suggestions ... sorry if it feels like nit-picking.
    slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will
    be covering one option of doing it for convenience : we will be showcasing how
    to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model,
    deployed on IDC. For more information on how to setup IDC instance to proceed,
In the spirit of refactor/re-use, should we have kept the "where" part in a separate document - IDC, or a desktop/server, or a VM elsewhere?
Also, the sentence is grammatically incorrect.
    be covering one option of doing it for convenience : we will be showcasing how
    to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model,
    deployed on IDC. For more information on how to setup IDC instance to proceed,
    Please follow the instructions here (*** getting started section***). If you do
I thought "Please" is stylistically frowned upon.
    ## Prerequisites

    First step is to clone the GenAIExamples and GenAIComps. GenAIComps are
Should we have suggested using the pre-built Docker images on Docker Hub instead of the build instructions here? Stick to the v0.9 tag, or be future-proof using the latest tag?
    git checkout tags/v0.9
    ```

    The examples utilize model weights from HuggingFace and langchain.
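The v0.9-vs-latest question above is easy to experiment with. A minimal sketch of pinning a checkout to a release tag, using a throwaway local repository so it runs offline; the repository and tag stand in for a real GenAIComps clone, where the same `git checkout tags/v0.9` applies:

```python
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command in `cwd` and return its stdout."""
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

# Throwaway repository standing in for a real GenAIComps clone.
repo = tempfile.mkdtemp()
git("init", "-q", cwd=repo)
git("-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "commit", "-q", "--allow-empty", "-m", "initial", cwd=repo)
git("tag", "v0.9", cwd=repo)

# Pin the working tree to the release tag, as the guide's snippet does;
# checking out the default branch (or the latest tag) would instead track
# newer, possibly incompatible, content.
git("checkout", "-q", "tags/v0.9", cwd=repo)
pinned = git("describe", "--tags", cwd=repo)
print(pinned)  # v0.9
```

Pinning trades currency for reproducibility: the doc's commands keep working against the tagged tree even after the default branch moves on.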
LangChain
    ## Prepare (Building / Pulling) Docker images

    This step will involve building/pulling ( maybe in future) relevant docker
future has become present :-)
    {"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}

    ```
    You may notice reranking microservice are with state ('ID' and other meta data),
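The payload in that log is what the pipeline forwards to the LLM after retrieval and reranking: the retrieved context is folded into the prompt, and the generation parameters ride along. A sketch rebuilding that shape; the helper name is hypothetical, while the field names and default values are taken from the JSON shown above:

```python
import json

def build_llm_request(context, question, **overrides):
    """Hypothetical helper mirroring the request body in the log above:
    context is embedded in the prompt, generation parameters accompany it."""
    prompt = ("Human: Answer the question based only on the following "
              f"context:\n {context}\n Question: {question}")
    payload = {
        "query": prompt,
        "max_new_tokens": 1024,
        "top_k": 10,
        "top_p": 0.95,
        "typical_p": 0.95,
        "temperature": 0.01,
        "repetition_penalty": 1.03,
        "streaming": True,
    }
    payload.update(overrides)  # allow per-call parameter tweaks
    return payload

req = build_llm_request("Deep learning is...", "What is Deep Learning?")
print(json.dumps(req)[:60] + "...")
```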
Unclear what the message is here.
    "max_tokens": 32, "temperature": 0}'
    ```

    vLLM service generate text for the input prompt. Here is the expected result
The vLLM service generates ...
    {"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"}
    ```

    **NOTE**: After launch the vLLM, it takes few minutes for vLLM server to load
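The expected response body is a small JSON object whose `generated_text` field carries the completion, so extracting the answer is one decode away. A sketch against a truncated copy of the output shown above:

```python
import json

# Truncated copy of the expected response body shown above.
body = ('{"generated_text":"We have all heard the buzzword, but our '
        'understanding of it is still growing."}')

resp = json.loads(body)
answer = resp.get("generated_text", "")  # "" if the field is absent
print(answer)
```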
After launching the vLLM service it takes a few minutes for the vLLM server to load
    ```

    TGI service generate text for the input prompt. Here is the expected result from TGI:
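One difference worth tabbing, per the merged-example approach: TGI's `/generate` endpoint takes a different request body than the vLLM call shown earlier, with the prompt under `inputs` and generation options under `parameters`. A sketch of that body; the host, port, and parameter values here are illustrative, not taken from the guide:

```python
import json

# Request body shape for TGI's /generate endpoint: prompt under "inputs",
# generation options under "parameters". Values are illustrative.
tgi_request = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 32, "temperature": 0.01},
}

encoded = json.dumps(tgi_request)
print(encoded)
# Sent as, e.g. (placeholder host/port):
#   curl http://<host>:<port>/generate -X POST \
#        -d '<encoded body>' -H 'Content-Type: application/json'
```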
We are eating up articles like "the"; this should read "The TGI service generates ...".
    ```

    and the log shows model warm up, please wait for a while and try it later.
s/try it later/retry.
Sorry @dbkinder, I did not get to this PR prior to merge.