Notebook update (#114)
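
The notebook text in this change describes `only` (tables to include) and `ignore` (tables to exclude) as two ways of limiting which tables `train_transforms` and `train_synthetics` operate on. The following is a minimal sketch of that selection logic, not Gretel's implementation. `resolve_tables` is a hypothetical helper, the six table names come from the notebook's example comments, and rejecting calls that pass both arguments is an assumption.

```python
from typing import Iterable, Optional, Set

# Table names taken from the example comments in the notebook diff.
ALL_TABLES = {
    "users",
    "events",
    "inventory_items",
    "order_items",
    "distribution_center",
    "products",
}


def resolve_tables(
    all_tables: Iterable[str],
    only: Optional[Iterable[str]] = None,
    ignore: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Hypothetical helper: return the tables selected by `only` or `ignore`.

    Assumption: the two arguments are mutually exclusive.
    """
    if only is not None and ignore is not None:
        raise ValueError("pass either `only` or `ignore`, not both")

    tables = set(all_tables)
    requested = set(only or ()) | set(ignore or ())
    unknown = requested - tables
    if unknown:
        raise ValueError(f"unknown tables: {sorted(unknown)}")

    if only is not None:
        return set(only)          # include exactly the named tables
    if ignore is not None:
        return tables - set(ignore)  # include everything except the named tables
    return tables                 # default: every table


# The two calls shown (commented out) in the notebook select the same tables.
via_ignore = resolve_tables(ALL_TABLES, ignore={"distribution_center", "products"})
via_only = resolve_tables(
    ALL_TABLES, only={"users", "events", "inventory_items", "order_items"}
)
print(via_ignore == via_only)
```

With no arguments the helper returns every table, which mirrors the default the notebook describes.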

gretelai · Jun 1, 2023 · 19a08ea
1 parent a28ad4e
commit 19a08ea
Showing 1 changed file with 30 additions and 15 deletions.
diff --git a/notebooks/relational.ipynb b/notebooks/relational.ipynb
@@ -1,6 +1,7 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -9,6 +10,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -44,13 +46,14 @@
     "relational_data = connector.extract()\n",
     "\n",
     "mt = MultiTable(relational_data)\n",
-    "mt.train()\n",
+    "mt.train_synthetics()\n",
     "mt.generate()\n",
     "\n",
     "connector.save(mt.synthetic_output_tables, prefix=\"synthetic_\")"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -302,7 +305,7 @@
    "source": [
     "#### Transforms\n",
     "\n",
-    "Train Gretel Transforms models by providing table-specific model configs. You only need to train models for tables you want to transform—you do not need to supply a config for every table."
+    "Train Gretel Transforms models by providing a transforms model config. By default this config will be applied to all tables. You can limit the tables being transformed via the optional `only` (tables to include) or `ignore` (tables to exclude) arguments."
    ]
   },
   {
@@ -311,14 +314,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Transform some tables\n",
+    "config = \"https://raw.githubusercontent.com/gretelai/gdpr-helpers/main/src/config/transforms_config.yaml\"\n",
     "\n",
-    "multitable.train_transform_models(\n",
-    "    configs={\n",
-    "        \"users\": \"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/users_policy.yaml\",\n",
-    "        \"events\": \"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/events_policy.yaml\",\n",
-    "    }\n",
-    ")"
+    "multitable.train_transforms(config)\n",
+    "\n",
+    "# Optionally limit which tables are trained for transforms via `only` (included) or `ignore` (excluded).\n",
+    "# Given our example data, the two calls below will lead to the same tables getting trained, just specified different ways.\n",
+    "#\n",
+    "# multitable.train_transforms(config, ignore={\"distribution_center\", \"products\"})\n",
+    "# multitable.train_transforms(config, only={\"users\", \"events\", \"inventory_items\", \"order_items\"})"
    ]
   },
   {
@@ -373,6 +377,14 @@
     "#### Synthetics"
    ]
   },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Start by training models for synthetics. By default, a synthetics model will be trained for every table in the `RelationalData`. However, this scope can be reduced to a subset of tables using the optional `only` (tables to include) or `ignore` (tables to exclude) arguments. This can be particularly useful if certain tables contain static reference data that should not be synthesized."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -381,7 +393,13 @@
    "source": [
     "# Train synthetic models for all tables\n",
     "\n",
-    "multitable.train()"
+    "multitable.train_synthetics()\n",
+    "\n",
+    "# Optionally limit which tables are trained for synthetics via `only` (included) or `ignore` (excluded).\n",
+    "# Given our example data, the two calls below will lead to the same tables getting trained, just specified different ways.\n",
+    "#\n",
+    "# multitable.train_synthetics(ignore={\"distribution_center\", \"products\"})\n",
+    "# multitable.train_synthetics(only={\"users\", \"events\", \"inventory_items\", \"order_items\"})"
    ]
   },
   {
@@ -410,7 +428,7 @@
    "source": [
     "Each synthetic data generation run is assigned (or supplied) a unique identifier. Look for a subdirectory with this identifier name in the working directory to find all synthetic outputs, including data and reports. An archive file containing all runs' outputs is also uploaded to the Gretel project as a project artifact, visible in the Data Sources tab in the Console.\n",
     "\n",
-    "When you generate synthetic data, you can optionally change the amount of data to generate via `record_size_ratio`, as well as optionally preserve certain tables' source data via `preserve_tables`."
+    "When you generate synthetic data, you can optionally change the amount of data to generate via `record_size_ratio`."
    ]
   },
   {
@@ -427,10 +445,7 @@
     "# multitable.generate(identifier=\"my-synthetics-run\")\n",
     "\n",
     "# Generate twice as much synthetic data\n",
-    "# multitable.generate(record_size_ratio=2.0)\n",
-    "\n",
-    "# Treat certain tables as static reference data that should not be synthesized\n",
-    "# multitable.generate(preserve_tables=[\"distribution_center\"])"
+    "# multitable.generate(record_size_ratio=2.0)"
    ]
   },
   {