
doc: add ChatQnA deploy on xeon example #104

Merged: 1 commit into opea-project:main on Sep 12, 2024

Conversation

dbkinder (Contributor):

Show how to merge the vllm and TGI example into one with tabbed content for the differences.
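
For reference, tabbed content like this is typically authored with sphinx-design tab directives. A minimal MyST sketch of the pattern (the exact directive set used by the OPEA docs build is an assumption here, not taken from this PR):

```
::::{tab-set}

:::{tab-item} vLLM
Steps that apply only to the vLLM deployment go here.
:::

:::{tab-item} TGI
Steps that apply only to the TGI deployment go here.
:::

::::
```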

tomlenth (Collaborator) left a comment:

LGTM

yinghu5 (Collaborator) left a comment:

let's see the final effect. thanks

Two review threads on examples/ChatQnA/deploy/xeon.md (outdated, resolved).

Commit message:

Show how to merge the vllm and TGI example into one with tabbed content
for the differences.

Signed-off-by: David B. Kinder <[email protected]>
dbkinder (Contributor, Author):

@mkbhanda @hshen14 @preethivenkatesh I still need another "green" checkmark reviewer...

dbkinder merged commit b08d88f into opea-project:main on Sep 12, 2024. 1 check passed.

mkbhanda (Collaborator) left a comment:

More suggestions ... sorry if it feels like nit-picking.

> slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will
> be covering one option of doing it for convenience : we will be showcasing how
> to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model,
> deployed on IDC. For more information on how to setup IDC instance to proceed,

mkbhanda:
In the spirit of refactor/re-use, should we have kept the "where" part in a separate document - IDC, or on a desktop/server, or a VM elsewhere?

Also, the sentence is grammatically incorrect.

> be covering one option of doing it for convenience : we will be showcasing how
> to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model,
> deployed on IDC. For more information on how to setup IDC instance to proceed,
> Please follow the instructions here (*** getting started section***). If you do

mkbhanda:

I thought "Please" is stylistically frowned upon.


> ## Prerequisites
>
> First step is to clone the GenAIExamples and GenAIComps. GenAIComps are

mkbhanda:

Should we have suggested using the pre-built Docker images on Docker Hub instead of the build instructions here? Stick to the v0.9 tag, or be future-proof using the latest tag.
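
A minimal sketch of the suggested alternative (the image name and tags are illustrative assumptions, not taken from the guide under review):

```
# Pull a pre-built image pinned to the release tag that matches the docs
docker pull opea/chatqna:v0.9

# Or track the latest published image (future-proof, but may drift from the
# written instructions)
docker pull opea/chatqna:latest
```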

> ```
> git checkout tags/v0.9
> ```
>
> The examples utilize model weights from HuggingFace and langchain.

mkbhanda:

LangChain


> ## Prepare (Building / Pulling) Docker images
>
> This step will involve building/pulling ( maybe in future) relevant docker

mkbhanda:

future has become present :-)

{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}

```
You may notice reranking microservice are with state ('ID' and other meta data),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear what the message is?

"max_tokens": 32, "temperature": 0}'
```

vLLM service generate text for the input prompt. Here is the expected result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vLLM service generates ...
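
For context, the quoted line above is the tail of such a request. A hedged reconstruction of the full call (host, port, and endpoint are assumptions; vLLM's OpenAI-compatible completions API is used here for illustration):

```
# Query the vLLM serving endpoint directly with a short completion request
curl http://localhost:9009/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Intel/neural-chat-7b-v3-3",
        "prompt": "What is Deep Learning?",
        "max_tokens": 32,
        "temperature": 0
      }'
```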

{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"}
```

**NOTE**: After launch the vLLM, it takes few minutes for vLLM server to load
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After launching the vLLM service it takes a few minutes for the vLLM server to load
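
One way to tell when loading has finished is to watch the container logs and retry once the model-load and warm-up messages stop (the container name here is an assumption):

```
# Follow the serving container's logs; retry the request once model loading
# and warm-up messages have stopped
docker logs -f vllm-service
```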


> ```
>
> TGI service generate text for the input prompt. Here is the expected result from TGI:

mkbhanda:

We are eating up articles like "The" here, and "generate" should be "generates".


> ```
>
> and the log shows model warm up, please wait for a while and try it later.

mkbhanda:

s/try it later/retry.

mkbhanda (Collaborator):

Sorry @dbkinder, I did not get to this PR prior to the merge.

dbkinder deleted the chatqna10 branch on September 25, 2024 at 01:38.