Home
Welcome to the llm-prototypes wiki!
Over the past three months I have been working for the Mayor’s Office of New Urban Mechanics (MONUM) in Boston on a web application that uses retrieval-augmented generative AI for text retrieval and Q&A. The project is part of Google’s Summer of Code program, where contributors work with organizations on open-source projects. By the end of the program, I was able to build a prototype capable of answering general public inquiries about government-related questions.
The code for this project is posted in the Code section of this GitHub repo. The current working branches are “main” and “azure-free”. The “client” and “server” folders contain the code for the respective parts of the project. The current prototype uses a fine-tuned LLM with a customized knowledge base of multi-format government-related files.
The client side is a React app that lets the user upload files, tag them with relevant themes and the organization they belong to, and add a short description. All of the original text in each file, along with all of its metadata, is used by Azure Cognitive Search when determining which files are relevant.
The server side is a Flask app that provides multiple APIs for file upload, file retrieval, and LLM response generation. For example, the “query” API takes the question that the user submitted on the frontend as a query parameter. It first fetches the top 3-5 relevant files from Azure Cognitive Search in vector form (more on this later), then queries the language model to generate a response based on the retrieved files and their metadata.
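The retrieve-then-generate flow of the “query” API can be sketched roughly as follows. This is a minimal illustration, not the code in the “server” folder: `answer_query`, `build_prompt`, the `search` and `generate` callables, and the metadata keys (`name`, `organization`, `content`) are all hypothetical stand-ins for the real Azure Cognitive Search and language model calls.

```python
from typing import Callable, Dict, List

# A retrieved file, represented as a plain dict (hypothetical keys).
Document = Dict[str, str]


def build_prompt(question: str, documents: List[Document]) -> str:
    """Combine the retrieved files and their metadata into one prompt."""
    context_parts = []
    for i, doc in enumerate(documents, start=1):
        context_parts.append(
            f"[{i}] {doc.get('name', 'untitled')} "
            f"(organization: {doc.get('organization', 'unknown')})\n"
            f"{doc.get('content', '')}"
        )
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_query(
    question: str,
    search: Callable[[str, int], List[Document]],  # stand-in for the search service
    generate: Callable[[str], str],                # stand-in for the language model
    top_k: int = 3,
) -> Dict[str, object]:
    """Fetch the top_k relevant files, then ask the model to answer from them."""
    if not question.strip():
        raise ValueError("question must not be empty")
    documents = search(question, top_k)[:top_k]
    prompt = build_prompt(question, documents)
    return {
        "answer": generate(prompt),
        "sources": [doc.get("name", "untitled") for doc in documents],
    }
```

In a Flask app, a route handler would presumably read the question from the query string, call a function like `answer_query` with the real search and model clients, and return the result as JSON. Passing the two services in as callables keeps the flow easy to test without cloud credentials.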
Each uploaded file is stored in two places. First, the file is stored in Azure Blob Storage. This enables file download by generating a cloud storage URL specific to each file. Second, each file is converted to vector form and stored in Azure Cognitive Search. This enables Cognitive Search to quickly calculate vector distances and determine which files are semantically closest to the user query.
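To make the idea of vector distance concrete, here is a small sketch that ranks files by cosine similarity to a query vector. Cosine similarity is one common choice and is used here as an assumption; the metric and indexing method that Cognitive Search actually applies depend on how the index is configured. The function names and the toy two-dimensional vectors are hypothetical, and real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_files(
    query_vector: Sequence[float],
    file_vectors: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (file name, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in file_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example with made-up file names and vectors.
file_vectors = {
    "parking.pdf": [1.0, 0.0],
    "permits.docx": [0.0, 1.0],
    "trash.txt": [0.7, 0.7],
}
top_two = rank_files([1.0, 0.1], file_vectors, top_k=2)
```

This brute-force version compares the query against every stored vector. A search service would typically use an index to avoid the full scan, but the ranking idea is the same.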
A separate repo contains a personal side project: “Shepherd”, a Chrome extension for augmented text retrieval. The app scrapes the content of the current web page using Cheerio and converts it to vector form, which is then used as the data for the LLM during text retrieval and Q&A. The user can connect their OpenAI account by entering their OpenAI API key. This key persists through the browser session and is auto-filled when the extension is opened, to reduce repeated logins. In addition, the user can customize their Shepherd with any GPT-3.5 model. They can change the model or reset their API key at any time from the settings page. The user can also hide the chat history; with one click they can choose to see only the last Q&A. This feature was developed to give a cleaner, less cluttered user interface.