gretelai · matthewgrossman · Dec 19, 2024
diff --git a/docs/notebooks/azure/navigator_tabular_azure_maas.ipynb b/docs/notebooks/azure/navigator_tabular_azure_maas.ipynb
@@ -0,0 +1,353 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a target=\"_parent\" href=\"https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/azure/navigator_tabular_azure_maas.ipynb\">\n",
+    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
+    "</a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# TODO\n",
+    "# Gretel Navigator Tabular on Azure MaaS \n",
+    "\n",
+    "This notebook walks you through deploying Gretel Navigator Tabular on Azure MaaS (Models as a Service). Once the endpoint is deployed, you can interact with the model using the Gretel SDK.\n",
+    "\n",
+    "The notebook covers the following steps:\n",
+    "\n",
+    "* Deploy Gretel Navigator Tabular on Azure MaaS\n",
+    "* Install and configure the Gretel SDK\n",
+    "* Generate synthetic data with the Gretel SDK and the Azure endpoint\n",
+    "* Edit and augment existing data with the Gretel SDK and the Azure endpoint"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# TODO\n",
+    "# Deploy Gretel Navigator\n",
+    "\n",
+    "To get started, visit the [Amazon Bedrock homepage](https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/) in the AWS Console. In this example we'll be using `us-west-2`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "1. Under **Foundation Models**, select **Model Catalog**:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/1_model-catalog.png\" alt=\"Model Catalog\" width=\"70%\">\n",
+    "\n",
+    "2. Under **Providers** on the left side, select **Gretel**:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/2_providers.png\" alt=\"Provider\" width=\"70%\">\n",
+    "\n",
+    "3. Click on **View subscription options**:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/3_subscription-options.png\" alt=\"Subscription Options\" width=\"70%\">\n",
+    "\n",
+    "\n",
+    "4. Click on **Subscribe**:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/4_subscribe.png\" alt=\"Subscribe\" width=\"60%\">\n",
+    "\n",
+    "\n",
+    "5. Wait for the subscription to complete:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/5_subscription_complete.png\" alt=\"Subscription Complete\" width=\"70%\">\n",
+    "\n",
+    "\n",
+    "6. Once the subscription is complete, click **Deploy**:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/6_deploy.png\" alt=\"Deploy\" width=\"70%\">\n",
+    "\n",
+    "\n",
+    "7. You should reach a configuration screen like the one below. For this example, we will use the defaults. Update the fields for your use case and modify the **Advanced Settings** as required.\n",
+    "\n",
+    "\n",
+    "When you are done with the configuration, click the **Deploy** button on the bottom right.\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/7_config_deploy.png\" alt=\"Configure and Deploy\" width=\"70%\">\n",
+    "\n",
+    "\n",
+    "8. Remain on the page, and you should eventually see something like this:\n",
+    "\n",
+    "<img src=\"https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/navigator_bedrock/8_in_progress.png\" alt=\"Deployment Progress\" width=\"70%\">\n",
+    "\n",
+    "\n",
+    "Wait for the model to deploy and for the **Endpoint status** to change from **Creating** to **In Service**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Retrieve the endpoint URL and API key for your Azure deployment. You will be prompted for both values below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Install the OpenAI SDK and the Gretel SDK (if you do not already have them)\n",
+    "\n",
+    "!pip install -U -qq openai gretel-client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Import required libraries\n",
+    "from openai import OpenAI\n",
+    "from getpass import getpass"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      "Azure endpoint:  https://gretelserverlessendpointmaas.eastus2.models.ai.azure.com\n",
+      "Azure API key:  ········\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Prompt for the Azure endpoint URL and API key (the key is read without echoing it)\n",
+    "AZURE_ENDPOINT = input(\"Azure endpoint: \")\n",
+    "AZURE_API_KEY = getpass(\"Azure API key: \")\n",
+    "\n",
+    "# the `AzureOpenAI` client mangles the URL, so we stick with the default `OpenAI` client\n",
+    "oai_client = OpenAI(base_url=AZURE_ENDPOINT, api_key=AZURE_API_KEY)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Create an Azure Adapter using the Gretel SDK\n",
+    "\n",
+    "from gretel_client import Gretel\n",
+    "\n",
+    "azure_open_ai = Gretel.create_navigator_azure_oai_adapter(oai_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Generate and Augment Datasets using Gretel Navigator\n",
+    "\n",
+    "Alright, we're now ready to start creating data! We'll first generate some data using a single prompt, and then we'll add a couple of new columns. Try out some of your own prompts to see how it works."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Generating data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10, 0.96 records/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "  first_name   last_name               email  gender              city country\n",
+      "0    Antoine        Roux        [email protected]    Male         Marseille  France\n",
+      "1        Léa      Durand      [email protected]  Female        Strasbourg  France\n",
+      "2    Gaspard    Fournier    [email protected]    Male          Bordeaux  France\n",
+      "3   Juliette      Pierre      [email protected]  Female            Rennes  France\n",
+      "4    Étienne    Marchand    [email protected]    Male          Toulouse  France\n",
+      "5    Aurélie      Benoit      [email protected]  Female             Lille  France\n",
+      "6     Cédric      Renaud      [email protected]    Male              Nice  France\n",
+      "7  Charlotte     Garnier     [email protected]  Female  Clermont-Ferrand  France\n",
+      "8    Olivier       Dumas       [email protected]    Male            Nantes  France\n",
+      "9      Adèle  Carpentier  [email protected]  Female       Montpellier  France\n",
+      "*******\n",
+      "ResponseMetadata(completion_id='b18cae2d-0a46-4b42-b2d8-f347ac1c6225', usage={'completion_tokens': 550, 'prompt_tokens': 99, 'total_tokens': 649, 'completion_tokens_details': None, 'prompt_tokens_details': None, 'input_bytes': 398, 'output_bytes': 2201, 'total_bytes': 2599, 'billed_bytes': 2600, 'billed_credits': 0.026}, model_id='gretelai/auto')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# First, we'll generate some data from a prompt, along with a small sample of existing data to guide the generation process.\n",
+    "\n",
+    "import pandas as pd\n",
+    "\n",
+    "PROMPT = \"\"\"Generate a mock dataset for users from the Foo company based in France.\n",
+    "Each user should have the following columns:\n",
+    "* first_name: traditional French first names.\n",
+    "* last_name: traditional French surnames.\n",
+    "* email: formatted as the first letter of their first name followed by their last name @foo.io (e.g., lmartin@foo.io)\n",
+    "* gender: Male/Female\n",
+    "* city: a city in France\n",
+    "* country: always 'France'.\n",
+    "\"\"\"\n",
+    "\n",
+    "table_headers = [\"first_name\", \"last_name\", \"email\", \"gender\", \"city\", \"country\"]\n",
+    "table_data = [\n",
+    "    {\n",
+    "        \"first_name\": \"Lea\",\n",
+    "        \"last_name\": \"Martin\",\n",
+    "        \"email\": \"lmartin@foo.io\",\n",
+    "        \"gender\": \"Female\",\n",
+    "        \"city\": \"Lyon\",\n",
+    "        \"country\": \"France\",\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "SAMPLE_DATA = pd.DataFrame(table_data, columns=table_headers)\n",
+    "\n",
+    "metadata, synthetic_df = azure_open_ai.generate(\n",
+    "    \"gretelai/auto\",\n",
+    "    PROMPT,\n",
+    "    num_records=10,\n",
+    "    sample_data=SAMPLE_DATA,\n",
+    ")\n",
+    "\n",
+    "print(synthetic_df)\n",
+    "print(\"*******\")\n",
+    "print(metadata)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Editing data: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04, 2.05 records/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "  first_name   last_name               email  gender              city  \\\n",
+      "0    Antoine        Roux        [email protected]    Male         Marseille   \n",
+      "1        Léa      Durand      [email protected]  Female        Strasbourg   \n",
+      "2    Gaspard    Fournier    [email protected]    Male          Bordeaux   \n",
+      "3   Juliette      Pierre      [email protected]  Female            Rennes   \n",
+      "4    Étienne    Marchand    [email protected]    Male          Toulouse   \n",
+      "5    Aurélie      Benoit      [email protected]  Female             Lille   \n",
+      "6     Cédric      Renaud      [email protected]    Male              Nice   \n",
+      "7  Charlotte     Garnier     [email protected]  Female  Clermont-Ferrand   \n",
+      "8    Olivier       Dumas       [email protected]    Male            Nantes   \n",
+      "9      Adèle  Carpentier  [email protected]  Female       Montpellier   \n",
+      "\n",
+      "  country  occupation                 education level  \n",
+      "0  France        Chef                   Culinary Arts  \n",
+      "1  France      Lawyer             Juris Doctor (J.D.)  \n",
+      "2  France    Engineer       Bachelor's in Engineering  \n",
+      "3  France     Teacher           Master's in Education  \n",
+      "4  France      Doctor           Medical Degree (M.D.)  \n",
+      "5  France      Artist         Bachelor's in Fine Arts  \n",
+      "6  France  Programmer  Bachelor's in Computer Science  \n",
+      "7  France       Nurse           Bachelor's in Nursing  \n",
+      "8  France   Scientist                Ph.D. in Science  \n",
+      "9  France  Journalist        Bachelor's in Journalism  \n",
+      "*******\n",
+      "ResponseMetadata(completion_id='989307a2-a741-4b52-9967-26b3b5c7e229', usage={'completion_tokens': 791, 'prompt_tokens': 33, 'total_tokens': 824, 'completion_tokens_details': None, 'prompt_tokens_details': None, 'input_bytes': 134, 'output_bytes': 3165, 'total_bytes': 3299, 'billed_bytes': 3300, 'billed_credits': 0.033}, model_id='gretelai/auto')\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Finally, we'll demonstrate Navigator's edit mode, which can augment existing datasets. In this example, we'll take the\n",
+    "# `synthetic_df` DataFrame generated above and ask Navigator to augment it with new columns.\n",
+    "\n",
+    "EDIT_PROMPT = \"\"\"Edit the table and add the following columns:\n",
+    "* occupation: a random occupation\n",
+    "* education level: make it relevant to the occupation\n",
+    "\"\"\"\n",
+    "\n",
+    "metadata, augmented_df = azure_open_ai.edit(\n",
+    "    \"gretelai/auto\",\n",
+    "    EDIT_PROMPT,\n",
+    "    seed_data=synthetic_df\n",
+    ")\n",
+    "\n",
+    "print(augmented_df)\n",
+    "print(\"*******\")\n",
+    "print(metadata)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "monogretel_venv",
+   "language": "python",
+   "name": "monogretel_venv"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.17"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
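
A possible follow-up cell for this notebook: the generation prompt states concrete rules (six named columns, `gender` limited to Male/Female, `country` always France, and `email` built from the first initial plus last name at foo.io), so the generated rows can be checked against them. The sketch below is one way to do that using only the standard library. The helper names (`ascii_fold`, `expected_email`, `find_violations`) are hypothetical, and the decision to lowercase and strip accents before building the expected email is an assumption; the prompt does not say how accented names should be handled.

```python
import unicodedata

# Columns requested by the generation prompt in the notebook.
EXPECTED_COLUMNS = ["first_name", "last_name", "email", "gender", "city", "country"]


def ascii_fold(text: str) -> str:
    """Lowercase text and drop accents (assumption: emails are ASCII-folded)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def expected_email(first_name: str, last_name: str) -> str:
    """First letter of the first name followed by the last name, at foo.io."""
    return f"{ascii_fold(first_name)[:1]}{ascii_fold(last_name)}@foo.io"


def find_violations(records):
    """Return (row_index, message) pairs for rows that break the prompt's rules."""
    violations = []
    for i, row in enumerate(records):
        missing = [col for col in EXPECTED_COLUMNS if col not in row]
        if missing:
            violations.append((i, f"missing columns: {missing}"))
            continue  # the remaining checks need every column
        if row["gender"] not in ("Male", "Female"):
            violations.append((i, f"unexpected gender: {row['gender']!r}"))
        if row["country"] != "France":
            violations.append((i, f"unexpected country: {row['country']!r}"))
        want = expected_email(row["first_name"], row["last_name"])
        if row["email"] != want:
            violations.append((i, f"email {row['email']!r} does not match {want!r}"))
    return violations
```

In the notebook this could be called as `find_violations(synthetic_df.to_dict("records"))`. An empty list means every row follows the rules as interpreted here; a non-empty list may only mean the model chose a different accent or casing convention than the one assumed above.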