diff --git a/Python/Mongo/AzureOpenAI - Mongo vCore - Ad Content Generation.ipynb b/Python/Mongo/AzureOpenAI - Mongo vCore - Ad Content Generation.ipynb new file mode 100644 index 0000000..13bbe68 --- /dev/null +++ b/Python/Mongo/AzureOpenAI - Mongo vCore - Ad Content Generation.ipynb @@ -0,0 +1,493 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "\n", + "Today we will demonstrate how you can generate advertizing content for your inventory to boost sales. We'll do this taking advantage of Azure Cosmos DB for Mongo DB vCore's [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) functionality. We will use OpenAI embeddings to generate vectors for inventory description which expected to vastly enhance its semantics. The vectors are then stored and indexed in the Mongo vCore Database. During the content generation for the advertisement time we will also vectorize the advertisement topic and find matching inventory itmes. We will then use retrival augmented generation (RAG) by sending to top matches to OpenAI to generate a catchy advertisement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Scenario\n", + "\n", + "1. Shoe Retailer who wants to sell more shoes \n", + "2. Wants to run advertisement to capitalize on recent trends\n", + "2. Wants to use generate advertisement content using the inventory items that matches the trend" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Azure OpenAI \n", + "\n", + "Finally, let's setup our Azure OpenAI resource Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. Once you have access, complete the following steps:\n", + "\n", + "- Create an Azure OpenAI resource following this quickstart: https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=web-portal\n", + "- Deploy a `completions` and `embeddings` model \n", + " - For more information on `completions`, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/completions\n", + " - For more information on `embeddings`, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/embeddings\n", + "- Copy the endpoint, key, deployment names for (embeddings model, completions model) into the config.json file." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Azure Cosmos DB for MongoDB vCore resource\n", + "Let's start by creating an Azure Cosmos DB for MongoDB vCore Resource following this quick start guide: https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/quickstart-portal\n", + "\n", + "Then copy the connection details (server, user, pwd) into the environment file." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Preliminaries \n", + "First, let's start by installing the packages that we'll need later. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install numpy\n", + "! pip install openai==1.2.3\n", + "! pip install pymongo\n", + "! pip install python-dotenv\n", + "! pip install azure-core\n", + "! pip install azure-cosmos\n", + "! pip install tenacity\n", + "! pip install gradio" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Please use the example.env as a template to provide the necessary keys and endpoints in your own .env file.\n", + "Make sure to modify the env_name accordingly. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import time\n", + "import openai\n", + "\n", + "from dotenv import dotenv_values\n", + "from openai import AzureOpenAI\n", + "\n", + "# specify the name of the .env file name \n", + "env_name = \"adgen.env\" # following example.env template change to your own .env file name\n", + "config = dotenv_values(env_name)\n", + "\n", + "openai.api_type = config['openai_api_type']\n", + "openai.api_key = config['openai_api_key']\n", + "openai.api_base = config['openai_api_endpoint']\n", + "openai.api_version = config['openai_api_version']\n", + "\n", + "client = AzureOpenAI(\n", + " api_key=openai.api_key,\n", + " api_version=openai.api_version,\n", + " azure_endpoint = openai.api_base\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create embeddings \n", + "Here we'll load a sample dataset containing descriptions of Azure services. Then we'll user Azure OpenAI to create vector embeddings from this data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_embeddings(text):\n", + " '''\n", + " Generate embeddings from string of text.\n", + " This will be used to vectorize data and user input for interactions with Azure OpenAI.\n", + " '''\n", + " response = client.embeddings.create(\n", + " input=text, model=\"text-embedding-ada-002\")\n", + " print(response)\n", + " embeddings = response.data[0].embedding\n", + " time.sleep(0.5) # rest period to avoid rate limiting on AOAI for free tier\n", + " return embeddings\n", + "\n", + "generate_embeddings(\"Shoes for San Francisco summer\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Connect and setup Cosmos DB for MongoDB vCore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up the connection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pymongo\n", + "\n", + "env_name = \"adgen.env\"\n", + "config = dotenv_values(env_name)\n", + "\n", + "mongo_conn = config['mongo_vcore_connection_string']\n", + "mongo_client = pymongo.MongoClient(mongo_conn)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up the DB and collection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATABASE_NAME = \"AdgenDatabase\"\n", + "COLLECTION_NAME = \"AdgenCollection\"\n", + "\n", + "mongo_client.drop_database(DATABASE_NAME)\n", + "db = mongo_client[DATABASE_NAME]\n", + "collection = db[COLLECTION_NAME]\n", + "\n", + "if COLLECTION_NAME not in db.list_collection_names():\n", + " # Creates a unsharded collection that uses the DBs shared throughput\n", + " db.create_collection(COLLECTION_NAME)\n", + " print(\"Created collection '{}'.\\n\".format(COLLECTION_NAME))\n", + "else:\n", + " print(\"Using collection: '{}'.\\n\".format(COLLECTION_NAME))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create the vector index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db.command({\n", + " 'createIndexes': COLLECTION_NAME,\n", + " 'indexes': [\n", + " {\n", + " 'name': 'vectorSearchIndex',\n", + " 'key': {\n", + " \"contentVector\": \"cosmosSearch\"\n", + " },\n", + " 'cosmosSearchOptions': {\n", + " 'kind': 'vector-ivf',\n", + " 'numLists': 1,\n", + " 'similarity': 'COS',\n", + " 'dimensions': 1536\n", + " }\n", + " }\n", + " ]\n", + "});" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upload data to the collection\n", + "A simple `insert_many()` to insert our data in JSON format into the newly created DB and collection." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_file = open(file=\"./data/shoes_with_vectors.json\", mode=\"r\") \n", + "data = json.load(data_file)\n", + "data_file.close()\n", + "\n", + "collection.insert_many(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Vector Search in Cosmos DB for MongoDB vCore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Function to assist with vector search\n", + "def vector_search(query, num_results=3):\n", + " \n", + " query_vector = generate_embeddings(query)\n", + "\n", + " embeddings_list = []\n", + " pipeline = [\n", + " {\n", + " '$search': {\n", + " \"cosmosSearch\": {\n", + " \"vector\": query_vector,\n", + " \"numLists\": 1,\n", + " \"path\": \"contentVector\",\n", + " \"k\": num_results\n", + " },\n", + " \"returnStoredSource\": True }},\n", + " {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }\n", + " ]\n", + " results = collection.aggregate(pipeline)\n", + " return results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Perform vector search query" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"Shoes for Seattle sweater weather\"\n", + "results = vector_search(query, 5)\n", + "\n", + "print(\"\\nResults:\\n\")\n", + "for result in results: \n", + " print(f\"Similarity Score: {result['similarityScore']}\") \n", + " print(f\"Title: {result['document']['name']}\") \n", + " print(f\"Price: {result['document']['price']}\") \n", + " print(f\"Material: {result['document']['material']}\") \n", + " print(f\"Image: {img_url}\\n\") " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Generating Ad content with GPT-4 and DALL.E 3\n", + "\n", + "Finally, we put it all together by creating an ad caption and an ad image via the `Completions` API and `DALL.E 3` API, and combine that with the vector search results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "\n", + "def generate_ad_title(ad_topic):\n", + " system_prompt = '''\n", + " You are an intelligent assistant for generating witty and cativating tagline for online advertisement.\n", + " - The ad campaign taglines that you generate are short and typically under 100 characters.\n", + " '''\n", + "\n", + " user_prompt = f'''Generate a catchy, witty, and short sentence (less than 100 characters) \n", + " for an advertisement for selling shoes for {ad_topic}'''\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": user_prompt},\n", + " ]\n", + "\n", + " response = client.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=messages\n", + " )\n", + " \n", + " return response.choices[0].message.content\n", + "\n", + "def Generate_ad_image(ad_topic):\n", + " daliClient = OpenAI(\n", + " api_key=config['dali_api_key']\n", + " )\n", + "\n", + " image_prompt = f'''\n", + " Generate a photorealistic image of an ad campaign for selling {ad_topic}. \n", + " The image should be clean, with the item being sold in the foreground with an easily identifiable landmark of the city in the background.\n", + " The image should also try to depict the weather of the location for the time of the year mentioned.\n", + " The image should not have any generated text overlay.\n", + " '''\n", + "\n", + " response = daliClient.images.generate(\n", + " model=\"dall-e-3\",\n", + " prompt= image_prompt,\n", + " size=\"1024x1024\",\n", + " quality=\"standard\",\n", + " n=1,\n", + " )\n", + "\n", + " return response.data[0].url\n", + "\n", + "def render_html_page(ad_topic):\n", + "\n", + " # Find the matching shoes from the inventory\n", + " results = vector_search(ad_topic, 4)\n", + "\n", + " ad_header = generate_ad_title(ad_topic)\n", + " ad_image_url = Generate_ad_image(ad_topic)\n", + "\n", + "\n", + " with open('./data/ad-start.html', 'r', encoding='utf-8') as html_file:\n", + " html_content = html_file.read()\n", + "\n", + " html_content += f'''
\n", + "

{ad_header}

\n", + "
''' \n", + "\n", + " html_content += f'''\n", + "
\n", + " \"Base\n", + "
'''\n", + "\n", + " for result in results: \n", + " html_content += f''' \n", + "
\n", + " \"{result['document']['name']}\"\n", + "
\n", + "

{result['document']['name']}

\n", + "

{\"$\"+str(result['document']['price'])}

\n", + "

{result['document']['description']}

\n", + " Buy Now\n", + "
\n", + "
\n", + " '''\n", + "\n", + " html_content += '''\n", + " \n", + " '''\n", + "\n", + " return html_content" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Putting it all together" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "\n", + "css = \"\"\"\n", + " button { background-color: purple; color: read; }\n", + " \n", + "\"\"\"\n", + "\n", + "with gr.Blocks(css=css, theme=gr.themes.Default(spacing_size=gr.themes.sizes.spacing_sm, radius_size=\"none\")) as demo:\n", + " subject = gr.Textbox(placeholder=\"subject\", label=\"Ad keywords\")\n", + " btn = gr.Button(\"Generate Ad\")\n", + " output_html = gr.HTML(label=\"Generated Ad HTML\")\n", + "\n", + " btn.click(render_html_page, [subject], output_html)\n", + "\n", + " btn = gr.Button(\"Copy HTML\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " demo.launch() " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}