Commit

New GPT DP & NavFT blueprints
alexahaushalter committed Aug 9, 2024
1 parent df2e316 commit b242737
Showing 9 changed files with 90 additions and 49 deletions.
7 changes: 7 additions & 0 deletions use_cases/details/gpt-dp.md
@@ -0,0 +1,7 @@
![Create free text data with privacy guarantees](https://blueprints.gretel.cloud/use_cases/images/gpt-dp.png "Create free text data with privacy guarantees")

Unlock the potential of your text data while ensuring privacy by applying [differentially private fine-tuning using GPT](https://gretel.ai/blog/generate-differentially-private-synthetic-text-with-gretel-gpt). This method allows you to create a version of your free text data that maintains the integrity of sensitive information while still providing high-quality outputs.

We recommend having a dataset of at least 10,000 samples to ensure reasonable quality. Note that differential privacy requires more epochs, which leads to longer training times compared to running without differential privacy.

Prefer coding? Check out the [SDK notebook](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/generate_differentially_private_synthetic_text.ipynb) example.
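
As a rough illustration of the 10,000-sample recommendation above, a pre-flight check along these lines could be run on a CSV file before starting a differentially private fine-tuning job. This is a minimal sketch using only the Python standard library; `MIN_RECOMMENDED_RECORDS`, `count_records`, and `meets_dp_recommendation` are hypothetical names chosen for illustration and are not part of the Gretel SDK.

```python
import csv

# Threshold taken from the blueprint text; the constant name is hypothetical.
MIN_RECOMMENDED_RECORDS = 10_000


def count_records(csv_file) -> int:
    """Count non-empty data rows in an open CSV file object, excluding the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if row)


def meets_dp_recommendation(n_records: int, minimum: int = MIN_RECOMMENDED_RECORDS) -> bool:
    """Return True when the dataset is at least as large as the recommended minimum."""
    return n_records >= minimum
```

A caller might open the training file with `open(path, newline="")`, pass the handle to `count_records`, and print a warning rather than abort when the check fails, since the figure is a recommendation and not a hard limit.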
8 changes: 8 additions & 0 deletions use_cases/details/navigator-ft-simple.md
@@ -0,0 +1,8 @@
![Generate multi-modal synthetic data with Navigator Fine Tuning](https://blueprints.gretel.cloud/use_cases/images/navigator-ft-hero.png "Generate multi-modal synthetic data with Navigator Fine Tuning")

If you’re new to Gretel, our Navigator Fine-Tuning blueprint is a great place to start. This blueprint automatically selects our comprehensive multi-modal model, a great one-stop shop for most synthetic data generation needs. Just answer a few questions, review the model configuration and hit **Run**.

Navigator Fine-Tuning supports multiple tabular modalities of data within a single model, such as numeric, categorical, and free text data.

Prefer coding? Check out the [SDK notebook](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator-fine-tuning-intro-tutorial.ipynb) example.

2 changes: 1 addition & 1 deletion use_cases/details/navigator-ft.md
@@ -1,6 +1,6 @@
![Generate synthetic tabular, text and time series data](https://blueprints.gretel.cloud/use_cases/images/navigator-ft-hero.png "Generate synthetic tabular, text and time series data")

-We are excited to announce the public preview of **Navigator Fine Tuning**, the latest advancement in our suite of synthetic data solutions. This new feature builds upon the recent general availability of [Gretel Navigator](https://console.gretel.ai/navigator), enabling you to generate data not only from a prompt, but also from fine-tuning the underlying model on your domain-specific real-world datasets to generate the highest quality synthetic data.
+**Navigator Fine Tuning** is the latest advancement in our suite of synthetic data solutions. It builds upon the recent general availability of [Gretel Navigator](https://console.gretel.ai/navigator), enabling you to generate data not only from a prompt, but also from fine-tuning the underlying model on your domain-specific real-world datasets to generate the highest quality synthetic data.

One of the standout features of Navigator Fine Tuning is its support for multiple tabular data modalities within a single model. This means you can now generate datasets that maintain correlations across:
- Numeric Data: Continuous or discrete numbers
2 changes: 1 addition & 1 deletion use_cases/details/synthetic.md
@@ -1,6 +1,6 @@
![Generate synthetic tabular data](https://blueprints.gretel.cloud/use_cases/images/synthetic-tabular-generation.png "Generate synthetic tabular data")

-If you’re new to Gretel, our synthetic data blueprint is a great place to start. This gentle introduction to synthetic data generation automatically selects our popular [ACTGAN model](https://gretel.ai/blog/scale-synthetic-data-to-millions-of-rows-with-actgan) and provides a sample healthcare dataset. Just answer a few questions, review the model configuration and hit **Run**.
+The synthetic data blueprint is a great introduction to synthetic data generation using our [ACTGAN model](https://gretel.ai/blog/scale-synthetic-data-to-millions-of-rows-with-actgan) for numeric and categorical data using a sample healthcare dataset. Just answer a few questions, review the model configuration and hit **Run**.

Prefer coding? Check out the [Gretel 101 notebook](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_101_Blueprint.ipynb) example. Synthesize data in just 4 lines of code!

120 changes: 73 additions & 47 deletions use_cases/gretel.json
@@ -17,62 +17,22 @@
}
},
{
-"gtmId": "use-case-synthetic",
-"title": "Generate synthetic data from complex tabular datasets",
-"description": "Handle high-dimensional data with thousands of columns and millions of rows.",
-"cardType": "Console",
-"icon": "synthetics.png",
-"detailsFileName": "synthetic.md",
-"modelType": "synthetics",
-"modelCategory": "synthetics",
-"defaultConfig": "config_templates/gretel/synthetics/tabular-actgan.yml",
-"sampleDataset": {
-"fileName": "sample-synthetic-healthcare.csv",
-"description": "Use this sample electronic health records (EHR) dataset to synthesize an entirely new set of statistically equivalent records.",
-"records": 9999,
-"fields": 18,
-"trainingTime": "6 mins",
-"bytes": 830021
-},
-"button1": {
-"label": "Gretel 101 Notebook",
-"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_101_Blueprint.ipynb"
-},
-"button2": {
-"label": "Advanced Examples Notebook",
-"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_Advanced_Tabular_Blueprint.ipynb"
-}
-},
-{
-"gtmId": "use-case-navigator-ft",
-"title": "[Public Preview] Generate synthetic tabular, text and time series data with Navigator Fine Tuning ",
-"description": "Try out our latest synthetic model supporting tabular, text, JSON and time series data in a single dataset.",
+"gtmId": "use-case-navigator-ft-simple",
+"title": "Generate multi-modal synthetic data with Navigator Fine Tuning",
+"description": "Try out our latest synthetic model to combine numeric data, categorical data, free text data, and more in a single dataset.",
"cardType": "Console",
-"tag": "Preview",
-"icon": "navigator-ft.png",
-"detailsFileName": "navigator-ft.md",
+"tag": "New",
+"icon": "navigator-ft-simple.png",
+"detailsFileName": "navigator-ft-simple.md",
"modelType": "navigator_ft",
"modelCategory": "synthetics",
-"defaultConfig": "config_templates/gretel/synthetics/navigator-ft.yml",
-"sampleDataset": {
-"fileName": "sample-patient-events.csv",
-"description": "This medical dataset contains sequences of annotated events (such as hospital admission, diagnosis, treatment, etc.) for 1,712 synthetic patients.",
-"records": 7348,
-"fields": 17,
-"trainingTime": "25 mins",
-"bytes": 2386363
-},
-"button1": {
-"label": "SDK Notebook",
-"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator-fine-tuning-intro-tutorial.ipynb"
-}
+"defaultConfig": "config_templates/gretel/synthetics/navigator-ft.yml"
},
{
"gtmId": "use-case-redact-pii",
"title": "Transform unstructured data into AI-ready formats",
"description": "De-identify, transform, or label text and tabular data for AI.",
"cardType": "Console",
-"tag": "New",
"icon": "transform.png",
"modelType": "transform_v2",
"modelCategory": "transform",
@@ -86,6 +46,22 @@
"bytes": 5647
}
},
+{
+"gtmId": "use-case-gpt-dp",
+"title": "Create free text data with privacy guarantees",
+"description": "Leverage differentially private fine-tuning with GPT to generate a provably-private version of your free text data.",
+"cardType": "Console",
+"tag": "New",
+"icon": "GPTwithDP.png",
+"detailsFileName": "gpt-dp.md",
+"modelType": "gpt_x",
+"modelCategory": "synthetics",
+"defaultConfig": "config_templates/gretel/synthetics/natural-language-differential-privacy.yml",
+"button1": {
+"label": "SDK Notebook",
+"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/generate_differentially_private_synthetic_text.ipynb"
+}
+},
{
"gtmId": "use-case-gretel_tuner",
"title": "Optimize your synthetic data",
@@ -102,6 +78,33 @@
"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/gretel-tuner-advanced-tutorial.ipynb"
}
},
+{
+"gtmId": "use-case-synthetic",
+"title": "Generate synthetic data from complex tabular datasets",
+"description": "Handle high-dimensional data with thousands of columns and millions of rows.",
+"cardType": "Console",
+"icon": "synthetics.png",
+"detailsFileName": "synthetic.md",
+"modelType": "synthetics",
+"modelCategory": "synthetics",
+"defaultConfig": "config_templates/gretel/synthetics/tabular-actgan.yml",
+"sampleDataset": {
+"fileName": "sample-synthetic-healthcare.csv",
+"description": "Use this sample electronic health records (EHR) dataset to synthesize an entirely new set of statistically equivalent records.",
+"records": 9999,
+"fields": 18,
+"trainingTime": "6 mins",
+"bytes": 830021
+},
+"button1": {
+"label": "Gretel 101 Notebook",
+"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_101_Blueprint.ipynb"
+},
+"button2": {
+"label": "Advanced Examples Notebook",
+"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_Advanced_Tabular_Blueprint.ipynb"
+}
+},
{
"gtmId": "use-case-tabular-dp",
"title": "Create provably private versions of sensitive data",
@@ -165,6 +168,29 @@
"bytes": 63000
}
},
+{
+"gtmId": "use-case-navigator-ft",
+"title": "Generate synthetic tabular, text and time series data with Navigator Fine Tuning ",
+"description": "Try out our latest synthetic model supporting tabular, text, JSON and time series data in a single dataset.",
+"cardType": "Console",
+"icon": "navigator-ft.png",
+"detailsFileName": "navigator-ft.md",
+"modelType": "navigator_ft",
+"modelCategory": "synthetics",
+"defaultConfig": "config_templates/gretel/synthetics/navigator-ft.yml",
+"sampleDataset": {
+"fileName": "sample-patient-events.csv",
+"description": "This medical dataset contains sequences of annotated events (such as hospital admission, diagnosis, treatment, etc.) for 1,712 synthetic patients.",
+"records": 7348,
+"fields": 17,
+"trainingTime": "25 mins",
+"bytes": 2386363
+},
+"button1": {
+"label": "SDK Notebook",
+"link": "https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator-fine-tuning-intro-tutorial.ipynb"
+}
+},
{
"gtmId": "use-case-transform-database",
"title": "Redact PII in a database",
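
Because this change moves several card entries within `use_cases/gretel.json` and adds new ones, a small consistency check can help when reviewing it. The sketch below is one possible approach, not a tool that exists in the repository: `REQUIRED_KEYS` and `check_cards` are hypothetical names, the set of required keys is an assumption based only on the fields that every card in this diff appears to share, and the function takes a plain list of card dictionaries because the top-level layout of the JSON file is not visible here.

```python
from collections import Counter

# Assumption: keys that every card visible in this diff carries.
REQUIRED_KEYS = {
    "gtmId", "title", "description", "cardType",
    "icon", "modelType", "modelCategory",
}


def check_cards(cards):
    """Return a list of human-readable problems found in a list of card dicts."""
    problems = []
    id_counts = Counter(card.get("gtmId") for card in cards)
    for card in cards:
        card_id = card.get("gtmId", "<missing gtmId>")
        missing = REQUIRED_KEYS - card.keys()
        if missing:
            problems.append(f"{card_id}: missing keys {sorted(missing)}")
        title = card.get("title", "")
        if title != title.strip():
            problems.append(f"{card_id}: title has leading or trailing whitespace")
    # Report each duplicated gtmId once, after the per-card checks.
    for gtm_id, count in id_counts.items():
        if gtm_id is not None and count > 1:
            problems.append(f"{gtm_id}: gtmId appears {count} times")
    return problems
```

Run against the cards in this commit, a check like this would flag the `use-case-navigator-ft` title, which ends with a trailing space.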
Binary file added use_cases/icons/navigator-ft-simple.png
Binary file added use_cases/icons/[email protected]
Binary file added use_cases/icons/[email protected]
Binary file added use_cases/images/gpt-dp.png