Notebook update #114

Merged
merged 2 commits into from
Jun 1, 2023
45 changes: 30 additions & 15 deletions notebooks/relational.ipynb
@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -9,6 +10,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -44,13 +46,14 @@
"relational_data = connector.extract()\n",
"\n",
"mt = MultiTable(relational_data)\n",
"mt.train()\n",
"mt.train_synthetics()\n",
"mt.generate()\n",
"\n",
"connector.save(mt.synthetic_output_tables, prefix=\"synthetic_\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -302,7 +305,7 @@
"source": [
"#### Transforms\n",
"\n",
"Train Gretel Transforms models by providing table-specific model configs. You only need to train models for tables you want to transform—you do not need to supply a config for every table."
"Train Gretel Transforms models by providing a transforms model config. By default this config will be applied to all tables. You can limit the tables being transformed via the optional `only` (tables to include) or `ignore` (tables to exclude) arguments."
]
},
{
@@ -311,14 +314,15 @@
"metadata": {},
"outputs": [],
"source": [
"# Transform some tables\n",
"config = \"https://raw.githubusercontent.com/gretelai/gdpr-helpers/main/src/config/transforms_config.yaml\"\n",
"\n",
"multitable.train_transform_models(\n",
" configs={\n",
" \"users\": \"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/users_policy.yaml\",\n",
" \"events\": \"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/events_policy.yaml\",\n",
" }\n",
")"
"multitable.train_transforms(config)\n",
"\n",
"# Optionally limit which tables are trained for transforms via `only` (included) or `ignore` (excluded).\n",
"# Given our example data, the two calls below will lead to the same tables getting trained, just specified different ways.\n",
"#\n",
"# multitable.train_transforms(config, ignore={\"distribution_center\", \"products\"})\n",
"# multitable.train_transforms(config, only={\"users\", \"events\", \"inventory_items\", \"order_items\"})"
]
},
{
@@ -373,6 +377,14 @@
"#### Synthetics"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Start by training models for synthetics. By default, a synthetics model will be trained for every table in the `RelationalData`. However, this scope can be reduced to a subset of tables using the optional `only` (tables to include) or `ignore` (tables to exclude) arguments. This can be particularly useful if certain tables contain static reference data that should not be synthesized."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -381,7 +393,13 @@
"source": [
"# Train synthetic models for all tables\n",
"\n",
"multitable.train()"
"multitable.train_synthetics()\n",
"\n",
"# Optionally limit which tables are trained for synthetics via `only` (included) or `ignore` (excluded).\n",
"# Given our example data, the two calls below will lead to the same tables getting trained, just specified different ways.\n",
"#\n",
"# multitable.train_synthetics(ignore={\"distribution_center\", \"products\"})\n",
"# multitable.train_synthetics(only={\"users\", \"events\", \"inventory_items\", \"order_items\"})"
]
},
{
@@ -410,7 +428,7 @@
"source": [
"Each synthetic data generation run is assigned (or supplied) a unique identifier. Look for a subdirectory with this identifier name in the working directory to find all synthetic outputs, including data and reports. An archive file containing all runs' outputs is also uploaded to the Gretel project as a project artifact, visible in the Data Sources tab in the Console.\n",
"\n",
"When you generate synthetic data, you can optionally change the amount of data to generate via `record_size_ratio`, as well as optionally preserve certain tables' source data via `preserve_tables`."
"When you generate synthetic data, you can optionally change the amount of data to generate via `record_size_ratio`."
]
},
{
@@ -427,10 +445,7 @@
"# multitable.generate(identifier=\"my-synthetics-run\")\n",
"\n",
"# Generate twice as much synthetic data\n",
"# multitable.generate(record_size_ratio=2.0)\n",
"\n",
"# Treat certain tables as static reference data that should not be synthesized\n",
"# multitable.generate(preserve_tables=[\"distribution_center\"])"
"# multitable.generate(record_size_ratio=2.0)"
]
},
{
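Note on the `only` / `ignore` arguments that appear in this diff: the notebook comments state that, for the example data, `ignore={"distribution_center", "products"}` and `only={"users", "events", "inventory_items", "order_items"}` lead to the same tables being trained. The sketch below illustrates that equivalence in plain Python. The helper `resolve_tables` is hypothetical and is not part of the library. Rejecting calls that pass both arguments, and rejecting unknown table names, are assumptions made for this illustration rather than documented behavior.

```python
from typing import Iterable, Optional, Set


def resolve_tables(
    all_tables: Iterable[str],
    only: Optional[Set[str]] = None,
    ignore: Optional[Set[str]] = None,
) -> Set[str]:
    """Hypothetical helper: return the set of tables selected for training."""
    tables = set(all_tables)

    # Assumption: the two arguments are treated as mutually exclusive.
    if only is not None and ignore is not None:
        raise ValueError("Pass `only` or `ignore`, not both.")

    # Assumption: names that are not in the dataset are rejected.
    requested = only if only is not None else ignore
    if requested is not None:
        unknown = set(requested) - tables
        if unknown:
            raise ValueError(f"Unknown tables: {sorted(unknown)}")

    if only is not None:
        return set(only)  # include exactly the listed tables
    if ignore is not None:
        return tables - set(ignore)  # include everything except the listed tables
    return tables  # default: every table


# Table names taken from the commented-out calls in the notebook diff.
EXAMPLE_TABLES = {
    "users",
    "events",
    "inventory_items",
    "order_items",
    "distribution_center",
    "products",
}

via_ignore = resolve_tables(EXAMPLE_TABLES, ignore={"distribution_center", "products"})
via_only = resolve_tables(
    EXAMPLE_TABLES, only={"users", "events", "inventory_items", "order_items"}
)

print(sorted(via_ignore))
print(via_ignore == via_only)
```

Either form can be the shorter one depending on the schema: `ignore` is convenient when only a few static reference tables should be skipped, while `only` is convenient when just a few tables should be trained.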