Commit 8d3f87e
Add example about serving it through an API using fastapi
1 parent 331f19f
Showing 3 changed files with 383 additions and 0 deletions.
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6ec042b3",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_position: 5\n",
    "---\n",
    "\n",
    "# Serve with FastAPI for deployment\n",
    "\n",
    "Building an LLM bot and playing with it locally is simple enough; at some point, however, you will want to put the bot in production, generally serving it through an API that the frontend can talk to. In this example, we are going to reuse the [Simple Bot with Weather Tool](./weather-bot) example, but serve it through a FastAPI endpoint.\n",
    "\n",
    "If you are not using FastAPI but something like Flask or Quart, don't worry, the example should work pretty similarly; we are just demonstrating it with FastAPI here because it's the most popular async-native alternative.\n",
    "\n",
    "First we are going to reimplement the same bot code, with a slight difference in the memory class: since we are serving multiple users with FastAPI, we need one memory history per user. It follows:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2d41fd5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from typing import AsyncGenerator, List, Literal\n",
    "from litechain import Chain, debug, as_async_generator\n",
    "\n",
    "from litechain.contrib.llms.open_ai import (\n",
    "    OpenAIChatChain,\n",
    "    OpenAIChatDelta,\n",
    "    OpenAIChatMessage,\n",
    ")\n",
    "\n",
    "\n",
    "class Memory:\n",
    "    history: List[OpenAIChatMessage]\n",
    "\n",
    "    def __init__(self) -> None:\n",
    "        self.history = []\n",
    "\n",
    "    def save_message(self, message: OpenAIChatMessage) -> OpenAIChatMessage:\n",
    "        self.history.append(message)\n",
    "        return message\n",
    "\n",
    "    def update_delta(self, delta: OpenAIChatDelta) -> OpenAIChatDelta:\n",
    "        # A delta with a new role starts a new message in the history;\n",
    "        # otherwise its content is appended to the last message\n",
    "        if self.history[-1].role != delta.role and delta.role is not None:\n",
    "            self.history.append(\n",
    "                OpenAIChatMessage(\n",
    "                    role=delta.role, content=delta.content, name=delta.name\n",
    "                )\n",
    "            )\n",
    "        else:\n",
    "            self.history[-1].content += delta.content\n",
    "        return delta\n",
    "\n",
    "\n",
    "def get_current_weather(\n",
    "    location: str, format: Literal[\"celsius\", \"fahrenheit\"] = \"celsius\"\n",
    ") -> OpenAIChatDelta:\n",
    "    result = {\n",
    "        \"location\": location,\n",
    "        \"forecast\": \"sunny\",\n",
    "        \"temperature\": \"25 C\" if format == \"celsius\" else \"77 F\",\n",
    "    }\n",
    "\n",
    "    return OpenAIChatDelta(\n",
    "        role=\"function\", name=\"get_current_weather\", content=json.dumps(result)\n",
    "    )\n",
    "\n",
    "\n",
    "# Chain Definitions\n",
    "\n",
    "\n",
    "def weather_bot(memory):\n",
    "    weather_chain = (\n",
    "        OpenAIChatChain[str, OpenAIChatDelta](\n",
    "            \"WeatherChain\",\n",
    "            lambda user_input: [\n",
    "                *memory.history,\n",
    "                memory.save_message(\n",
    "                    OpenAIChatMessage(role=\"user\", content=user_input),\n",
    "                ),\n",
    "            ],\n",
    "            model=\"gpt-3.5-turbo-0613\",\n",
    "            functions=[\n",
    "                {\n",
    "                    \"name\": \"get_current_weather\",\n",
    "                    \"description\": \"Gets the current weather in a given location, use this function for any questions related to the weather\",\n",
    "                    \"parameters\": {\n",
    "                        \"type\": \"object\",\n",
    "                        \"properties\": {\n",
    "                            \"location\": {\n",
    "                                \"description\": \"The city to get the weather for, e.g. San Francisco. Get the location from user messages\",\n",
    "                                \"type\": \"string\",\n",
    "                            },\n",
    "                            \"format\": {\n",
    "                                \"description\": \"The temperature unit to use, either celsius or fahrenheit\",\n",
    "                                \"type\": \"string\",\n",
    "                                \"enum\": (\"celsius\", \"fahrenheit\"),\n",
    "                            },\n",
    "                        },\n",
    "                        \"required\": [\"location\"],\n",
    "                    },\n",
    "                }\n",
    "            ],\n",
    "            temperature=0,\n",
    "        )\n",
    "        .map(\n",
    "            # Call the function if the model produced a function call by parsing the json arguments\n",
    "            lambda delta: get_current_weather(**json.loads(delta.content))\n",
    "            if delta.role == \"function\" and delta.name == \"get_current_weather\"\n",
    "            else delta\n",
    "        )\n",
    "        .map(memory.update_delta)\n",
    "    )\n",
    "\n",
    "    function_reply_chain = OpenAIChatChain[None, OpenAIChatDelta](\n",
    "        \"FunctionReplyChain\",\n",
    "        lambda _: memory.history,\n",
    "        model=\"gpt-3.5-turbo-0613\",\n",
    "        temperature=0,\n",
    "    )\n",
    "\n",
    "    async def reply_function_call(stream: AsyncGenerator[OpenAIChatDelta, None]):\n",
    "        # If the model called the function, run the reply chain on the\n",
    "        # updated history so the model can answer from the function result\n",
    "        async for output in stream:\n",
    "            if output.role == \"function\":\n",
    "                async for output in function_reply_chain(None):\n",
    "                    yield output\n",
    "            else:\n",
    "                yield output\n",
    "\n",
    "    weather_bot: Chain[str, str] = (\n",
    "        weather_chain.pipe(reply_function_call)\n",
    "        .map(memory.update_delta)\n",
    "        .map(lambda delta: delta.content)\n",
    "    )\n",
    "\n",
    "    return weather_bot"
   ]
  },
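  {
   "cell_type": "markdown",
   "id": "3f9a1c2b",
   "metadata": {},
   "source": [
    "As a quick aside, here is a sanity check of how the `Memory` class accumulates history. This cell is our own addition, not part of the original bot, and assumes the message and delta constructors accept the same fields used in the cell above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c4d0e9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sanity check, not part of the original example\n",
    "memory = Memory()\n",
    "memory.save_message(OpenAIChatMessage(role=\"user\", content=\"hi there\"))\n",
    "# A delta with a new role starts a new message in the history\n",
    "memory.update_delta(OpenAIChatDelta(role=\"assistant\", content=\"Hello\", name=None))\n",
    "# A delta with the same role gets appended to the last message\n",
    "memory.update_delta(OpenAIChatDelta(role=\"assistant\", content=\" there!\", name=None))\n",
    "print([(m.role, m.content) for m in memory.history])"
   ]
  },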
  {
   "cell_type": "markdown",
   "id": "f7a4b327",
   "metadata": {},
   "source": [
    "Now we are going to create a FastAPI endpoint which takes a user message, keeps the conversation history in the \"database\", and calls the bot, returning its streamed answer:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4ca9d5d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Dict\n",
    "from fastapi import FastAPI, Request\n",
    "from fastapi.responses import StreamingResponse\n",
    "from litechain import filter_final_output\n",
    "\n",
    "app = FastAPI()\n",
    "\n",
    "\n",
    "in_memory_database: Dict[str, Memory] = {}\n",
    "\n",
    "\n",
    "@app.post(\"/chat\")\n",
    "async def chat(request: Request):\n",
    "    params = await request.json()\n",
    "    user_input = params.get(\"input\")\n",
    "    user_id = params.get(\"user_id\")\n",
    "\n",
    "    if user_id not in in_memory_database:\n",
    "        in_memory_database[user_id] = Memory()\n",
    "\n",
    "    bot = weather_bot(in_memory_database[user_id])\n",
    "    output_stream = filter_final_output(bot(user_input))\n",
    "\n",
    "    return StreamingResponse(output_stream, media_type=\"text/plain\")"
   ]
  },
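  {
   "cell_type": "markdown",
   "id": "9e5b7f10",
   "metadata": {},
   "source": [
    "In a production app you would likely validate the payload with a Pydantic model instead of reading the raw `Request`. Here is a minimal sketch of that variation; the `ChatRequest` model and the `/chat_validated` route are our own hypothetical additions, not part of the original example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d8e2a47",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import BaseModel\n",
    "\n",
    "\n",
    "class ChatRequest(BaseModel):\n",
    "    user_id: str\n",
    "    input: str\n",
    "\n",
    "\n",
    "# Hypothetical route, not part of the original example\n",
    "@app.post(\"/chat_validated\")\n",
    "async def chat_validated(body: ChatRequest):\n",
    "    # FastAPI rejects payloads that don't match ChatRequest with a 422\n",
    "    if body.user_id not in in_memory_database:\n",
    "        in_memory_database[body.user_id] = Memory()\n",
    "    bot = weather_bot(in_memory_database[body.user_id])\n",
    "    return StreamingResponse(\n",
    "        filter_final_output(bot(body.input)), media_type=\"text/plain\"\n",
    "    )"
   ]
  },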
  {
   "cell_type": "markdown",
   "id": "b9c830e2",
   "metadata": {},
   "source": [
    "And then start the server to make some requests to it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "731c90c0",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:     Started server process [96682]\n",
      "INFO:     Waiting for application startup.\n",
      "INFO:     Application startup complete.\n",
      "INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\n"
     ]
    }
   ],
   "source": [
    "from uvicorn import Config, Server\n",
    "import threading\n",
    "\n",
    "# Run uvicorn in a background thread so the notebook stays interactive\n",
    "config = Config(app=app, host=\"0.0.0.0\", port=8000)\n",
    "server = Server(config)\n",
    "\n",
    "threading.Thread(target=server.run).start()"
   ]
  },
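  {
   "cell_type": "markdown",
   "id": "2c6f8d31",
   "metadata": {},
   "source": [
    "The curl calls below exercise the endpoint from the shell. If you would rather call it from Python, a streaming client can be sketched with `httpx`; this cell is our own addition and assumes `httpx` is installed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a1e4b92",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical Python client, not part of the original example\n",
    "import httpx\n",
    "\n",
    "\n",
    "async def chat_request(user_id: str, text: str) -> None:\n",
    "    async with httpx.AsyncClient() as client:\n",
    "        async with client.stream(\n",
    "            \"POST\",\n",
    "            \"http://localhost:8000/chat\",\n",
    "            json={\"user_id\": user_id, \"input\": text},\n",
    "        ) as response:\n",
    "            # Print each streamed chunk as soon as it arrives\n",
    "            async for chunk in response.aiter_text():\n",
    "                print(chunk, end=\"\", flush=True)\n",
    "\n",
    "\n",
    "# In a notebook cell you can run it with: await chat_request(\"1\", \"hi there\")"
   ]
  },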
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4a44fb3f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:     127.0.0.1:59288 - \"POST /chat HTTP/1.1\" 200 OK\n",
      "Hello! How can I assist you today?"
     ]
    }
   ],
   "source": [
    "!curl -X POST http://localhost:8000/chat \\\n",
    "  -H \"Content-Type: application/json\" \\\n",
    "  -d '{\"user_id\":\"1\", \"input\":\"hi there\"}'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4c1b32a",
   "metadata": {},
   "source": [
    "It's alive! It replies to our message. Now let's ask it about the weather:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5871746f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:     127.0.0.1:59291 - \"POST /chat HTTP/1.1\" 200 OK\n",
      "According to the current weather information, it is sunny in Amsterdam with a temperature of 25°C. Enjoy your trip to Amsterdam!"
     ]
    }
   ],
   "source": [
    "!curl -X POST http://localhost:8000/chat \\\n",
    "  -H \"Content-Type: application/json\" \\\n",
    "  -d '{\"user_id\":\"1\", \"input\":\"I am traveling to Amsterdam, is it hot today there?\"}'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f73c762",
   "metadata": {},
   "source": [
    "Cool, the weather function works too! Now, does it remember the last thing I said?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "358c03b7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:     127.0.0.1:59296 - \"POST /chat HTTP/1.1\" 200 OK\n",
      "You mentioned that you are traveling to Amsterdam."
     ]
    }
   ],
   "source": [
    "!curl -X POST http://localhost:8000/chat \\\n",
    "  -H \"Content-Type: application/json\" \\\n",
    "  -d '{\"user_id\":\"1\", \"input\":\"where am I traveling to?\"}'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a8c138d",
   "metadata": {},
   "source": [
    "Yes it does! What if a different `user_id` were talking to it, would that user have access to the same chat history as well?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "652e0194",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:     127.0.0.1:59298 - \"POST /chat HTTP/1.1\" 200 OK\n",
      "I'm sorry, but as an AI assistant, I don't have access to personal information about users unless it is shared with me in the course of our conversation. Therefore, I don't know where you are traveling to."
     ]
    }
   ],
   "source": [
    "!curl -X POST http://localhost:8000/chat \\\n",
    "  -H \"Content-Type: application/json\" \\\n",
    "  -d '{\"user_id\":\"2\", \"input\":\"where am I traveling to?\"}'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "862136fe",
   "metadata": {},
   "source": [
    "No it doesn't: users have separate chat histories.\n",
    "\n",
    "That's it, we now have a fully working API that calls our bot and answers questions about the weather, while also keeping the conversation history in the \"database\" separate per user, allowing us to serve many users at the same time.\n",
    "\n",
    "I hope this helps you get your LLM bot into production! If there is something you don't understand in this example, [join our discord channel](https://discord.gg/48ZM5KkKgw) to get help from the community!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
FastAPI
openai_function_call
unstructured
chromadb==0.4.3