
RAG app example #118

Open · wants to merge 94 commits into main
Conversation

heyjustinai (Member) commented Nov 18, 2024

What does this PR do?

Creating an E2E RAG example that can do retrieval over documents and answer user questions. Components included (a rough sketch of how they fit together follows the list):

Inference (with llama-stack)
Memory (with llama-stack)
Agent (with llama-stack)
Frontend (with Gradio)
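
For orientation, here is a rough sketch of how these pieces might be wired together with the llama-stack client. The model id, memory-bank id, and the exact method and parameter names are assumptions for illustration; they may differ from the PR's actual code and from the llama-stack version in use.

```python
# Hypothetical wiring of the four components above (inference, memory,
# agent, Gradio frontend). Identifiers and method names are assumptions,
# not the PR's actual code.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger

client = LlamaStackClient(base_url="http://localhost:5000")  # llama-stack server

agent = Agent(
    client,
    agent_config={
        "model": "Llama3.2-3B-Instruct",  # assumed model id
        "instructions": "Answer questions using the retrieved document chunks.",
        "tools": [
            {
                "type": "memory",  # RAG over a pre-populated memory bank
                "memory_bank_configs": [{"bank_id": "docqa_bank", "type": "vector"}],
                "max_tokens_in_context": 300,
                "max_chunks": 5,
            }
        ],
        "enable_session_persistence": False,
    },
)
session_id = agent.create_session("docqa-session")

# The Gradio frontend would call something like this for each user question.
response = agent.create_turn(
    messages=[{"role": "user", "content": "What does the document say about pricing?"}],
    session_id=session_id,
)
for log in EventLogger().log(response):
    log.print()
```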

Feature/Issue validation/testing/test plan

1120.mov

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@heyjustinai heyjustinai marked this pull request as ready for review November 20, 2024 21:11
@heyjustinai heyjustinai changed the title [WIP] Rag app RAG app example Nov 20, 2024
examples/agents/rag_with_memory_bank.py (outdated)
@@ -0,0 +1,181 @@
import argparse
Contributor:

I think we should simply call this directory DocQA with the organization:

  • DocQA
    • app.py
    • README.md
    • scripts/
    • data/

Could we also avoid prefixing files with 01_, 02_, etc.?


### How to run the pipeline:

![RAG_workflow](./RAG_workflow.jpg)
Contributor:

Any chance we can simplify this diagram a lot? Actually, I think a simpler inline Mermaid diagram that shows just the basic high-level flow would be more useful. Docker, etc. should be completely avoided.
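
For illustration, a minimal inline Mermaid sketch of the high-level flow might look something like this; the stage names are assumptions based on the components described in this PR, not the author's actual diagram:

```mermaid
flowchart LR
    Docs[Documents] --> Ingest[Ingestion to Markdown]
    Ingest --> Bank[(Memory bank / ChromaDB)]
    User((User)) --> UI[Gradio UI]
    UI --> Agent[llama-stack agent]
    Agent -->|retrieve chunks| Bank
    Agent -->|generate answer| Inference[llama-stack inference]
    Agent --> UI
```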


2. Inside the docker folder, `run_RAG.sh` is the main script: it creates the `.env` file for `compose.yaml` and then starts the `docker compose` process to launch all the pipelines in our containers. `compose.yaml` is the main Docker YAML file that specifies all the mount options and container configs; change the mounts if needed.
Contributor:

These details are unnecessary, I think. These scripts are very simple and self-documenting in a way.


4. The ChromaDB container will also start. It hosts the Chroma database that llama-stack interacts with.

5. Lastly, the llama-stack container will start. `llama_stack_start.sh` controls the container's startup behavior; change it if needed. It first runs the ingestion pipeline to convert all the documents into Markdown files, then starts the llama-stack server based on the `llama_stack_run.yaml` config. Once the server is ready, it runs `gradio_interface.py`, which inserts document chunks into the memory_bank and starts the UI for user interaction (a rough sketch of that insertion step follows).
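
For reference, the chunk-insertion step performed by `gradio_interface.py` might look roughly like the following. This is a sketch assuming the llama-stack-client memory API; the bank id, embedding model, and chunking parameters are placeholders, and the exact method names may differ between llama-stack versions.

```python
# Hypothetical sketch of registering a memory bank and inserting the
# ingested Markdown documents into it via llama-stack. All names here
# are placeholders, not the PR's actual code.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

client.memory_banks.register(
    memory_bank={
        "identifier": "docqa_bank",
        "embedding_model": "all-MiniLM-L6-v2",
        "chunk_size_in_tokens": 512,
        "overlap_size_in_tokens": 64,
    }
)

# Markdown produced by the ingestion pipeline (placeholder content).
markdown_texts = ["# Doc 1\nFirst converted document...", "# Doc 2\nSecond one..."]

documents = [
    {
        "document_id": f"doc-{i}",
        "content": text,
        "mime_type": "text/markdown",
        "metadata": {},
    }
    for i, text in enumerate(markdown_texts)
]
client.memory.insert(bank_id="docqa_bank", documents=documents)
```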
Contributor:

We have scripts within scripts; is it possible to inline some of them, perhaps?

examples/E2E-RAG-App/gradio_interface.py (outdated)
],
"query_generator_config": {"type": "default", "sep": " "},
"max_tokens_in_context": 300,
"max_chunks": 5,
Contributor:

I think we should just elide this and leave it at the default, because we don't want people to be thinking about all these pieces when they first look at the stack (and even later, if we do a good job).

Member Author:

Sounds good, I will leave out `"query_generator_config": {"type": "default", "sep": " "}`.

But since we are running Ollama locally, we will need to keep `max_tokens_in_context` and `max_chunks` for it to run at a reasonable speed.
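
Concretely, the trimmed config being agreed on here might look like the following; only `max_tokens_in_context` and `max_chunks` come from the snippet under review, and the surrounding fields are assumptions.

```python
# Hypothetical trimmed memory-tool config reflecting the discussion:
# query_generator_config is omitted so the server-side default applies,
# while the context limits are kept so a local Ollama model stays responsive.
memory_tool_config = {
    "type": "memory",
    "memory_bank_configs": [{"bank_id": "docqa_bank", "type": "vector"}],  # assumed bank id
    # "query_generator_config": {"type": "default", "sep": " "},  # dropped -> default
    "max_tokens_in_context": 300,
    "max_chunks": 5,
}
```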

examples/E2E-RAG-App/gradio_interface.py (outdated)
examples/E2E-RAG-App/gradio_interface.py (outdated)
requirements.txt (outdated)
Labels: CLA Signed
7 participants