Commit

Notebook update (#114)
mikeknep authored Jun 1, 2023
1 parent a28ad4e commit 19a08ea
Showing 1 changed file with 30 additions and 15 deletions.
45 changes: 30 additions & 15 deletions notebooks/relational.ipynb
@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -9,6 +10,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -44,13 +46,14 @@
"relational_data = connector.extract()\n",
"\n",
"mt = MultiTable(relational_data)\n",
"mt.train()\n",
"mt.train_synthetics()\n",
"mt.generate()\n",
"\n",
"connector.save(mt.synthetic_output_tables, prefix=\"synthetic_\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -302,7 +305,7 @@
"source": [
"#### Transforms\n",
"\n",
"Train Gretel Transforms models by providing table-specific model configs. You only need to train models for tables you want to transform—you do not need to supply a config for every table."
"Train Gretel Transforms models by providing a transforms model config. By default this config will be applied to all tables. You can limit the tables being transformed via the optional `only` (tables to include) or `ignore` (tables to exclude) arguments."
]
},
{
@@ -311,14 +314,15 @@
"metadata": {},
"outputs": [],
"source": [
"# Transform some tables\n",
"config = \"https://raw.githubusercontent.com/gretelai/gdpr-helpers/main/src/config/transforms_config.yaml\"\n",
"\n",
"multitable.train_transform_models(\n",
" configs={\n",
" \"users\": \"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/users_policy.yaml\",\n",
" \"events\": \"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/events_policy.yaml\",\n",
" }\n",
")"
"multitable.train_transforms(config)\n",
"\n",
"# Optionally limit which tables are trained for transforms via `only` (included) or `ignore` (excluded).\n",
"# Given our example data, the two calls below will lead to the same tables getting trained, just specified different ways.\n",
"#\n",
"# multitable.train_transforms(config, ignore={\"distribution_center\", \"products\"})\n",
"# multitable.train_transforms(config, only={\"users\", \"events\", \"inventory_items\", \"order_items\"})"
]
},
{
@@ -373,6 +377,14 @@
"#### Synthetics"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Start by training models for synthetics. By default, a synthetics model will be trained for every table in the `RelationalData`. However, this scope can be reduced to a subset of tables using the optional `only` (tables to include) or `ignore` (tables to exclude) arguments. This can be particularly useful if certain tables contain static reference data that should not be synthesized."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -381,7 +393,13 @@
"source": [
"# Train synthetic models for all tables\n",
"\n",
"multitable.train()"
"multitable.train_synthetics()\n",
"\n",
"# Optionally limit which tables are trained for synthetics via `only` (included) or `ignore` (excluded).\n",
"# Given our example data, the two calls below will lead to the same tables getting trained, just specified different ways.\n",
"#\n",
"# multitable.train_synthetics(ignore={\"distribution_center\", \"products\"})\n",
"# multitable.train_synthetics(only={\"users\", \"events\", \"inventory_items\", \"order_items\"})"
]
},
{
@@ -410,7 +428,7 @@
"source": [
"Each synthetic data generation run is assigned (or supplied) a unique identifier. Look for a subdirectory with this identifier name in the working directory to find all synthetic outputs, including data and reports. An archive file containing all runs' outputs is also uploaded to the Gretel project as a project artifact, visible in the Data Sources tab in the Console.\n",
"\n",
"When you generate synthetic data, you can optionally change the amount of data to generate via `record_size_ratio`, as well as optionally preserve certain tables' source data via `preserve_tables`."
"When you generate synthetic data, you can optionally change the amount of data to generate via `record_size_ratio`."
]
},
{
@@ -427,10 +445,7 @@
"# multitable.generate(identifier=\"my-synthetics-run\")\n",
"\n",
"# Generate twice as much synthetic data\n",
"# multitable.generate(record_size_ratio=2.0)\n",
"\n",
"# Treat certain tables as static reference data that should not be synthesized\n",
"# multitable.generate(preserve_tables=[\"distribution_center\"])"
"# multitable.generate(record_size_ratio=2.0)"
]
},
{

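Note on the `only` / `ignore` arguments introduced in this commit: the notebook comments say that, for the example data, `ignore={"distribution_center", "products"}` and `only={"users", "events", "inventory_items", "order_items"}` select the same tables. The sketch below illustrates that equivalence in plain Python. `resolve_tables` is a hypothetical helper written for this note and is not part of the library. Treating the two arguments as mutually exclusive and rejecting unknown table names are assumptions here, not behavior confirmed by the diff.

```python
from typing import Iterable, Optional, Set


def resolve_tables(
    all_tables: Iterable[str],
    only: Optional[Set[str]] = None,
    ignore: Optional[Set[str]] = None,
) -> Set[str]:
    """Hypothetical helper: return the set of tables a model would be trained for."""
    tables = set(all_tables)

    # Assumption: the two filters cannot be combined in a single call.
    if only is not None and ignore is not None:
        raise ValueError("Pass either `only` or `ignore`, not both.")

    # Assumption: naming a table that does not exist is treated as an error.
    requested = only if only is not None else (ignore or set())
    unknown = set(requested) - tables
    if unknown:
        raise ValueError(f"Unknown tables: {sorted(unknown)}")

    if only is not None:
        return set(only)  # include exactly the listed tables
    if ignore is not None:
        return tables - set(ignore)  # include everything except the listed tables
    return tables  # default: every table


# Table names taken from the notebook's example comments.
EXAMPLE_TABLES = {
    "users",
    "events",
    "inventory_items",
    "order_items",
    "distribution_center",
    "products",
}

via_ignore = resolve_tables(EXAMPLE_TABLES, ignore={"distribution_center", "products"})
via_only = resolve_tables(
    EXAMPLE_TABLES, only={"users", "events", "inventory_items", "order_items"}
)
```

Under these assumptions `via_ignore` and `via_only` hold the same four table names, which matches the claim in the notebook comments that both calls train the same tables.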