Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 41a7c81, commit 3249de9
Showing 1 changed file with 1 addition and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyMuh9CqcuqP+k1q0cAuIMgJ"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_101_Blueprint.ipynb)\n","\n","<br>\n","\n","<center><a href=https://gretel.ai/><img src=\"https://gretel-public-website.s3.us-west-2.amazonaws.com/assets/brand/gretel_brand_wordmark.svg\" alt=\"Gretel\" width=\"350\"/></a></center>\n","\n","<br>\n","\n","## Welcome to the Gretel 101 Blueprint!\n","\n","In this Blueprint, we will use Gretel to train a deep generative model and use it to generate high-quality synthetic (tabular) data. We will accomplish this by submitting training and generation jobs to the [Gretel Cloud](https://gretel.ai/faqs/gretel-cloud) via [Gretel's Python SDK](https://docs.gretel.ai/guides/environment-setup/cli-and-sdk).\n","\n","Behind the scenes, Gretel will spin up workers with the necessary compute resources, set up the model with your desired configuration, and perform the submitted task.\n","\n","## Create your Gretel account\n","\n","To get started, you will need to [sign up for a free Gretel account](https://console.gretel.ai/).\n","\n","<br>\n","\n","#### Ready? Let's go"],"metadata":{"id":"nwpvdB3Jn5hG"}},{"cell_type":"markdown","source":["## Install `gretel-client` and its dependencies"],"metadata":{"id":"MPHEAxLufyEo"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"zFeKqpkunEo1"},"outputs":[],"source":["%%capture\n","!pip install gretel-client"]},{"cell_type":"markdown","source":["## Configure your Gretel session\n","\n","- The `Gretel` object provides a high-level interface for streamlining interactions with Gretel's APIs.\n","\n","- Each `Gretel` instance is bound to a single [Gretel project](https://docs.gretel.ai/guides/gretel-fundamentals/projects).\n","\n","- Running the cell below will prompt you for your Gretel API key, which you can retrieve [here](https://console.gretel.ai/users/me/key).\n","\n","- With `validate=True`, your login credentials will be validated immediately at instantiation."],"metadata":{"id":"DNdDXiI-Xkf1"}},{"cell_type":"code","source":["from gretel_client import Gretel\n","\n","gretel = Gretel(api_key=\"prompt\", validate=True)"],"metadata":{"id":"5qnVwoPZx4j0"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# @title Pick a tabular dataset { display-mode: \"form\" }\n","dataset_path_dict = {\n"," \"adult income in the USA (14000 records, 15 fields)\": \"https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/us-adult-income.csv\",\n"," \"hospital length of stay (9999 records, 18 fields)\": \"https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/sample-synthetic-healthcare.csv\",\n"," \"customer churn (7032 records, 21 fields)\": \"https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/monthly-customer-payments.csv\"\n","}\n","\n","dataset = \"adult income in the USA (14000 records, 15 fields)\" # @param [\"adult income in the USA (14000 records, 15 fields)\", \"hospital length of stay (9999 records, 18 fields)\", \"customer churn (7032 records, 21 fields)\"]\n","dataset = dataset_path_dict[dataset]\n"],"metadata":{"id":"uRbY7vk3tSBg"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["import pandas as pd\n","\n","# explore the data using pandas\n","df = pd.read_csv(dataset)\n","df.head()"],"metadata":{"id":"cW3VKpyPvm6W"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Train a generative model\n","\n","- The [tabular-actgan](https://github.com/gretelai/gretel-blueprints/blob/main/config_templates/gretel/synthetics/tabular-actgan.yml) base config tells Gretel which model to train and how to configure it.\n","\n","- You can replace `tabular-actgan` with the path to a custom config or select any of the tabular configs [listed here](https://github.com/gretelai/gretel-blueprints/tree/main/config_templates/gretel/synthetics).\n","\n","- The training data is passed in using the `data_source` argument. Its type can be a file path or `DataFrame`.\n","\n","- **Tip:** Click the printed Console URL to monitor your job's progress in the Gretel Console."],"metadata":{"id":"SwROZthrvXil"}},{"cell_type":"code","source":["trained = gretel.submit_train(\"tabular-actgan\", data_source=dataset)"],"metadata":{"id":"i89eGZwIxSCW"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Evaluate the synthetic data quality\n","\n","- Gretel automatically creates a [synthetic data quality report](https://docs.gretel.ai/reference/evaluate/synthetic-data-quality-report) for each model you train.\n","\n","- The training results object returned by `submit_train` has a `GretelReport` attribute for viewing the quality report.\n"],"metadata":{"id":"eljkfb8jb_hK"}},{"cell_type":"code","source":["# view the quality scores\n","print(trained.report)"],"metadata":{"id":"bNZqhFPOclrV"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# display the full report within this notebook\n","trained.report.display_in_notebook()"],"metadata":{"id":"3QMiP7lKecE5"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# inspect the synthetic data used to create the report\n","df_synth_report = trained.fetch_report_synthetic_data()\n","df_synth_report.head()"],"metadata":{"id":"2dHuQT_cuIno"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Generate synthetic data\n","\n","- The `model_id` argument can be the ID of any trained model within the current project.\n"],"metadata":{"id":"ZIeY7TczxvDV"}},{"cell_type":"code","source":["generated = gretel.submit_generate(trained.model_id, num_records=1000)"],"metadata":{"id":"J6XZUuR2eguX"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# inspect the generated synthetic data\n","generated.synthetic_data.head()"],"metadata":{"id":"-_do0Kvvunv2"},"execution_count":null,"outputs":[]}]} | ||
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyNosAwAWvwVU9i43TeCxQrP"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_101_Blueprint.ipynb)\n","\n","<br>\n","\n","<center><a href=https://gretel.ai/><img src=\"https://gretel-public-website.s3.us-west-2.amazonaws.com/assets/brand/gretel_brand_wordmark.svg\" alt=\"Gretel\" width=\"350\"/></a></center>\n","\n","<br>\n","\n","## Welcome to the Gretel 101 Blueprint!\n","\n","In this Blueprint, we will use Gretel to train a deep generative model and use it to generate high-quality synthetic (tabular) data. We will accomplish this by submitting training and generation jobs to the [Gretel Cloud](https://gretel.ai/faqs/gretel-cloud) via [Gretel's Python SDK](https://docs.gretel.ai/guides/environment-setup/cli-and-sdk).\n","\n","Behind the scenes, Gretel will spin up workers with the necessary compute resources, set up the model with your desired configuration, and perform the submitted task.\n","\n","## Create your Gretel account\n","\n","To get started, you will need to [sign up for a free Gretel account](https://console.gretel.ai/).\n","\n","<br>\n","\n","#### Ready? Let's go"],"metadata":{"id":"nwpvdB3Jn5hG"}},{"cell_type":"markdown","source":["## Install `gretel-client` and its dependencies"],"metadata":{"id":"MPHEAxLufyEo"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"zFeKqpkunEo1"},"outputs":[],"source":["%%capture\n","!pip install gretel-client"]},{"cell_type":"markdown","source":["## Configure your Gretel session\n","\n","- The `Gretel` object provides a high-level interface for streamlining interactions with Gretel's APIs.\n","\n","- Each `Gretel` instance is bound to a single [Gretel project](https://docs.gretel.ai/guides/gretel-fundamentals/projects).\n","\n","- Running the cell below will prompt you for your Gretel API key, which you can retrieve [here](https://console.gretel.ai/users/me/key).\n","\n","- With `validate=True`, your login credentials will be validated immediately at instantiation."],"metadata":{"id":"DNdDXiI-Xkf1"}},{"cell_type":"code","source":["from gretel_client import Gretel\n","\n","gretel = Gretel(api_key=\"prompt\", validate=True)"],"metadata":{"id":"5qnVwoPZx4j0"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# @title Pick a tabular dataset { display-mode: \"form\" }\n","dataset_path_dict = {\n"," \"adult income in the USA (14000 records, 15 fields)\": \"https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/us-adult-income.csv\",\n"," \"hospital length of stay (9999 records, 18 fields)\": \"https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/sample-synthetic-healthcare.csv\",\n"," \"customer churn (7032 records, 21 fields)\": \"https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/monthly-customer-payments.csv\"\n","}\n","\n","dataset = \"adult income in the USA (14000 records, 15 fields)\" # @param [\"adult income in the USA (14000 records, 15 fields)\", \"hospital length of stay (9999 records, 18 fields)\", \"customer churn (7032 records, 21 fields)\"]\n","dataset = dataset_path_dict[dataset]\n"],"metadata":{"id":"uRbY7vk3tSBg"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["import pandas as pd\n","\n","# explore the data using pandas\n","df = pd.read_csv(dataset)\n","df.head()"],"metadata":{"id":"cW3VKpyPvm6W"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Train a generative model\n","\n","- The [tabular-actgan](https://github.com/gretelai/gretel-blueprints/blob/main/config_templates/gretel/synthetics/tabular-actgan.yml) base config tells Gretel which model to train and how to configure it.\n","\n","- You can replace `tabular-actgan` with the path to a custom config file, or you can select any of the tabular configs [listed here](https://github.com/gretelai/gretel-blueprints/tree/main/config_templates/gretel/synthetics).\n","\n","- The training data is passed in using the `data_source` argument. Its type can be a file path or `DataFrame`.\n","\n","- **Tip:** Click the printed Console URL to monitor your job's progress in the Gretel Console."],"metadata":{"id":"SwROZthrvXil"}},{"cell_type":"code","source":["trained = gretel.submit_train(\"tabular-actgan\", data_source=dataset)"],"metadata":{"id":"i89eGZwIxSCW"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Evaluate the synthetic data quality\n","\n","- Gretel automatically creates a [synthetic data quality report](https://docs.gretel.ai/reference/evaluate/synthetic-data-quality-report) for each model you train.\n","\n","- The training results object returned by `submit_train` has a `GretelReport` attribute for viewing the quality report.\n"],"metadata":{"id":"eljkfb8jb_hK"}},{"cell_type":"code","source":["# view the quality scores\n","print(trained.report)"],"metadata":{"id":"bNZqhFPOclrV"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# display the full report within this notebook\n","trained.report.display_in_notebook()"],"metadata":{"id":"3QMiP7lKecE5"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# inspect the synthetic data used to create the report\n","df_synth_report = trained.fetch_report_synthetic_data()\n","df_synth_report.head()"],"metadata":{"id":"2dHuQT_cuIno"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Generate synthetic data\n","\n","- The `model_id` argument can be the ID of any trained model within the current project.\n"],"metadata":{"id":"ZIeY7TczxvDV"}},{"cell_type":"code","source":["generated = gretel.submit_generate(trained.model_id, num_records=1000)"],"metadata":{"id":"J6XZUuR2eguX"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# inspect the generated synthetic data\n","generated.synthetic_data.head()"],"metadata":{"id":"-_do0Kvvunv2"},"execution_count":null,"outputs":[]}]} |
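
Because the notebook is stored as a single JSON line, the one-line diff above hides where the change actually is. Reading the two versions side by side, the differences appear to be the `authorship_tag` value and the wording of the custom-config bullet in the "Train a generative model" cell. A minimal sketch for locating such changes per cell, using only the standard library, might look like the following. The helper names `cell_text` and `changed_cells` are hypothetical, and the sketch assumes both versions have the same number of cells in the same order; it does not compare notebook-level metadata.

```python
import difflib
import json


def cell_text(notebook_json: str) -> list[str]:
    """Return the source of each notebook cell as one string."""
    notebook = json.loads(notebook_json)
    # "source" is usually a list of lines; joining also works if it is a string.
    return ["".join(cell["source"]) for cell in notebook["cells"]]


def changed_cells(old_json: str, new_json: str) -> list[tuple[int, str]]:
    """Pair each changed cell index with a unified diff of its source lines.

    Assumption: cells are matched by position, so inserted or removed
    cells are not handled.
    """
    changes = []
    pairs = zip(cell_text(old_json), cell_text(new_json))
    for index, (old, new) in enumerate(pairs):
        if old != new:
            diff = difflib.unified_diff(
                old.splitlines(), new.splitlines(), "old", "new", lineterm=""
            )
            changes.append((index, "\n".join(diff)))
    return changes
```

Passing the old and new notebook lines as strings to `changed_cells` and printing each returned diff should show only the cells whose text differs, which is easier to review than the raw one-line JSON.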