initial commit of docs, including structure and autodoc of specific m… #23

Merged
merged 4 commits into from
Oct 12, 2022
14 changes: 7 additions & 7 deletions README.md
@@ -22,13 +22,13 @@ If you want to quickly get started synthesizing data with **Gretel.ai**, simply

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/trainer/blob/main/notebooks/trainer-examples.ipynb)

## Join our Slack Workspace
## Join the Synthetic Data Community Discord

If you want to be part of the Gretel synthetic data community to receive announcements of the latest releases,
If you want to be part of the Synthetic Data Community to receive announcements of the latest releases,
ask questions, suggest new features or participate in the development meetings, please join
our Slack Workspace!
the Synthetic Data Community Server!

[![Slack](https://img.shields.io/badge/Slack%20Workspace-Join%20now!-36C5F0?logo=slack)](https://gretel.ai/slackinvite)
[![Discord](https://img.shields.io/discord/1007817822614847500?label=Discord&logo=Discord)](https://gretel.ai/discord)

# Install

@@ -40,13 +40,13 @@ pip install -U gretel-trainer

# Quickstart

### 1. Add your [Gretel API](https://console.gretel.cloud) key via the Gretel CLI.
## 1. Add your [Gretel API](https://console.gretel.cloud) key via the Gretel CLI.
Use the Gretel client to store your API key to disk. This step is optional; the trainer will prompt for an API key in the next step.
```bash
gretel configure
```

### 2. Train or fine-tune a model using the Gretel API
## 2. Train or fine-tune a model using the Gretel API

```python3
from gretel_trainer import trainer
@@ -57,7 +57,7 @@ model = trainer.Trainer()
model.train(dataset)
```

### 3. Generate synthetic data!
## 3. Generate synthetic data!
```python3
df = model.generate()
```
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
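
The catch-all target above forwards any target name to Sphinx "make mode" (`sphinx-build -M <target> <sourcedir> <builddir>`). As a rough sketch of what `make html` expands to with the defaults in this Makefile, the helper below builds the equivalent argument list. The function name and its parameters are hypothetical and only mirror the Makefile variables; for simplicity it folds `$(O)` into the options string.

```python
import shlex


def sphinx_make_mode_command(target, sourcedir=".", builddir="_build",
                             sphinxbuild="sphinx-build", sphinxopts=""):
    """Build the argv that the Makefile's catch-all rule would run for `target`.

    Hypothetical helper for illustration: the defaults mirror SOURCEDIR,
    BUILDDIR, SPHINXBUILD and SPHINXOPTS from the Makefile above.
    """
    # Extra options are split the way a shell would split them.
    return [sphinxbuild, "-M", target, sourcedir, builddir, *shlex.split(sphinxopts)]


command = sphinx_make_mode_command("html")
print(" ".join(command))  # sphinx-build -M html . _build
```

Running `make help` follows the same path with `help` as the target, which is why the Makefile can stay this small.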
86 changes: 86 additions & 0 deletions docs/conf.py
@@ -0,0 +1,86 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import os
import sys

import sphinx

sys.path.insert(0, os.path.abspath("../src"))


project = "Gretel Trainer"
copyright = "2022, Gretel Team"
author = "Gretel.ai"
release = "0.4.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.coverage",
    "sphinx.ext.napoleon",
    "m2r",
    "sphinx_rtd_theme",
]

source_suffix = [".rst", ".md"]

templates_path = ["_templates"]
exclude_patterns = [
    "_build",
    "Thumbs.db",
    ".DS_Store",
]


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.

html_theme = "sphinx_rtd_theme"
html_logo = "img/gretel_logo_white.png"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
html_css_files = ["styles.css"]

html_theme_options = {
    "logo_only": True,
    "display_version": True,
    "style_nav_header_background": "#0c0c0d",
}


def monkeypatch(cls):
    """decorator to monkey-patch methods"""

    def decorator(f):
        method = f.__name__
        old_method = getattr(cls, method)
        setattr(
            cls,
            method,
            lambda self, *args, **kwargs: f(old_method, self, *args, **kwargs),
        )

    return decorator


# workaround until https://github.com/miyakogi/m2r/pull/55 is merged
@monkeypatch(sphinx.registry.SphinxComponentRegistry)
def add_source_parser(_old_add_source_parser, self, *args, **kwargs):
    # signature is (parser: Type[Parser], **kwargs), but m2r expects
    # the removed (str, parser: Type[Parser], **kwargs).
    if isinstance(args[0], str):
        args = args[1:]
    return _old_add_source_parser(self, *args, **kwargs)
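
To see what the `monkeypatch` decorator in `conf.py` does without importing Sphinx, here is a minimal sketch that applies the same decorator to a toy class. `Registry`, `MarkdownParser`, and `RstParser` are hypothetical stand-ins invented for this illustration; they are not Sphinx or m2r classes.

```python
def monkeypatch(cls):
    """Same decorator as in conf.py: swap cls.<name> for a wrapper around f."""

    def decorator(f):
        method = f.__name__
        old_method = getattr(cls, method)
        setattr(
            cls,
            method,
            lambda self, *args, **kwargs: f(old_method, self, *args, **kwargs),
        )

    return decorator


class MarkdownParser:  # hypothetical parser class
    pass


class RstParser:  # hypothetical parser class
    pass


class Registry:
    """Toy stand-in for the registry class that conf.py patches."""

    def __init__(self):
        self.parsers = []

    def add_source_parser(self, parser, **kwargs):
        self.parsers.append(parser)


@monkeypatch(Registry)
def add_source_parser(old, self, *args, **kwargs):
    # Drop a legacy leading string argument, then delegate to the original method.
    if isinstance(args[0], str):
        args = args[1:]
    return old(self, *args, **kwargs)


registry = Registry()
registry.add_source_parser(".md", MarkdownParser)  # legacy (str, parser) call
registry.add_source_parser(RstParser)              # current (parser) call
```

Both calls end up appending only the parser class. Note that the inner `decorator` returns nothing, so the module-level name `add_source_parser` is bound to `None` afterwards; only the patched method on the class matters.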
Binary file added docs/img/gretel-logo.png
Binary file added docs/img/gretel_logo_white.png
105 changes: 105 additions & 0 deletions docs/index.rst
@@ -0,0 +1,105 @@
.. Gretel Trainer documentation master file, created by
   sphinx-quickstart on Tue Oct 11 09:08:14 2022.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Gretel Trainer
==============

Gretel Trainer provides a simple interface for training synthetic data models on complex datasets with high row and column counts. It offers features such as cloud SaaS-based training and multi-GPU parallelization. Get started for free with an API key from `Gretel.ai <https://console.gretel.cloud>`_.

Current functionality and features:
-----------------------------------

* Synthetic data generators for text, tabular, and time-series data with the following features:

  * Balance datasets or boost a minority class using Conditional Data Generation.
  * Automated data validation.
  * Synthetic data quality reports.
  * Privacy filters and optional differential privacy support.

* Multiple `model types supported <https://docs.gretel.ai/synthetics/models>`_\:

  * `Gretel-LSTM` model type supports text, tabular, time-series, and conditional data generation.
  * `Gretel-CTGAN` model type supports tabular and conditional data generation.
  * `Gretel-GPT` natural language synthesis based on an open-source implementation of GPT-3 (coming soon).
  * `Gretel-DGAN` multi-variate time series based on DoppelGANger (coming soon).

Train Synthetic Data in as Little as Three Lines of Code!
---------------------------------------------------------

#. Install the Gretel CLI and Gretel Trainer either on your system or in your Notebook

   .. code-block:: bash

      # Command line installation
      pip install -U gretel-client gretel-trainer

      # Notebook installation
      !pip install -Uqq gretel-client gretel-trainer

#. Add your `Gretel API <https://console.gretel.cloud>`_ key via the Gretel CLI.

   Use the Gretel client to store your API key to disk. This step is optional; the trainer will prompt for an API key in the next step.

   .. code-block:: bash

      gretel configure

#. Train or fine-tune a model using the Gretel API

   .. code-block:: python3

      from gretel_trainer import trainer

      dataset = "https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv"

      model = trainer.Trainer()
      model.train(dataset)

#. Generate synthetic data!

   .. code-block:: python3

      df = model.generate()

Try it out now!
---------------

If you want to quickly get started synthesizing data with **Gretel.ai**, simply click the button below and follow the examples. See additional Python3 and Jupyter Notebook examples in the `./notebooks` folder.

.. image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/gretelai/trainer/blob/main/notebooks/trainer-examples.ipynb
   :alt: Open in Colab

Join the Synthetic Data Community Discord
-----------------------------------------

If you want to be part of the Synthetic Data Community to receive announcements of the latest releases,
ask questions, suggest new features or participate in the development meetings, please join
the Synthetic Data Community Server!

.. image:: https://img.shields.io/discord/1007817822614847500?label=Discord&logo=Discord
   :target: https://gretel.ai/discord
   :alt: Discord

.. toctree::
   :maxdepth: 2
   :caption: Contents:


Modules
=======

.. toctree::
   :maxdepth: 2

   quickstart.rst
   trainer.rst
   models.rst



Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
5 changes: 5 additions & 0 deletions docs/models.rst
@@ -0,0 +1,5 @@
Models
======

.. automodule:: gretel_trainer.models
   :members:
93 changes: 93 additions & 0 deletions docs/quickstart.rst
@@ -0,0 +1,93 @@
Quickstart
==========

Initial Setup
-------------

#. Install the Gretel CLI and Gretel Trainer either on your system or in your Notebook

   .. code-block:: bash

      # Command line installation
      pip install -U gretel-client gretel-trainer

      # Notebook installation
      !pip install -Uqq gretel-client gretel-trainer

#. Add your `Gretel API <https://console.gretel.cloud>`_ key via the Gretel CLI.

   Use the Gretel client to store your API key to disk. This step is optional; the trainer will prompt for an API key in the next step.

   .. code-block:: bash

      gretel configure

Train Synthetic Data
--------------------
.. image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/gretelai/trainer/blob/main/notebooks/trainer-examples.ipynb
   :alt: Open in Colab

#. Train or fine-tune a model using the Gretel API

   .. code-block:: python3

      from gretel_trainer import trainer

      dataset = "https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv"

      model = trainer.Trainer()
      model.train(dataset)

#. Generate synthetic data!

   .. code-block:: python3

      df = model.generate()

Conditional Data Generation
---------------------------
.. image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/gretelai/trainer/blob/main/notebooks/simple-conditional-generation.ipynb
   :alt: Open in Colab

#. Load and preview the dataset, set seed fields

   .. code-block:: python3

      # Load and preview the patient dataset
      import pandas as pd
      from gretel_trainer import trainer

      DATASET_PATH = 'https://gretel-public-website.s3.amazonaws.com/datasets/mitre-synthea-health.csv'
      SEED_FIELDS = ["RACE", "ETHNICITY", "GENDER"]

      print("\nPreviewing real world dataset\n")
      pd.read_csv(DATASET_PATH)

#. Train the model

   .. code-block:: python3

      # Train model
      model = trainer.Trainer()
      model.train(DATASET_PATH, seed_fields=SEED_FIELDS)

#. Conditionally generate data

   .. code-block:: python3

      # Conditionally generate data
      seed_df = pd.DataFrame(data=[
          ["black", "african", "F"],
          ["black", "african", "F"],
          ["black", "african", "F"],
          ["black", "african", "F"],
          ["asian", "chinese", "F"],
          ["asian", "chinese", "F"],
          ["asian", "chinese", "F"],
          ["asian", "chinese", "F"],
          ["asian", "chinese", "F"]
      ], columns=SEED_FIELDS)

      model.generate(seed_df=seed_df)
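
The seed rows in the last step repeat each combination once per requested record. As a possible alternative to typing them out, the sketch below expands a mapping of combination to count into the same nine rows. `seed_counts` and `build_seed_rows` are hypothetical names used only for this illustration and are not part of Gretel Trainer; the block uses only the standard library.

```python
from collections import Counter

SEED_FIELDS = ["RACE", "ETHNICITY", "GENDER"]

# Hypothetical input: how many rows to request per seed combination.
seed_counts = {
    ("black", "african", "F"): 4,
    ("asian", "chinese", "F"): 5,
}


def build_seed_rows(counts, fields=SEED_FIELDS):
    """Expand {combination: n} into a flat list with one row per requested record."""
    rows = []
    for combo, n in counts.items():
        if len(combo) != len(fields):
            raise ValueError(f"{combo!r} does not match seed fields {fields!r}")
        # One independent list per row so later edits do not alias each other.
        rows.extend(list(combo) for _ in range(n))
    return rows


seed_rows = build_seed_rows(seed_counts)

# Sanity check: the expanded rows reproduce the requested counts.
row_counts = Counter(tuple(row) for row in seed_rows)
```

The resulting list can then be passed to `pd.DataFrame(data=seed_rows, columns=SEED_FIELDS)` and on to `model.generate(seed_df=seed_df)` exactly as in the step above.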