Releases · gretelai/trainer

30 Apr 19:57

mikeknep

v0.11.4

c4181f9

v0.11.4 Latest

Benchmark

⚙️ n_jobs param added to Benchmark Config
🐛 Avoid race condition around job submission and snapshots
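As a rough illustration of what an n_jobs limit and a race-free snapshot involve, here is a minimal standard-library sketch. It is not Benchmark's implementation: `run_job` and `ToySession` are hypothetical stand-ins, and we assume jobs are independent units of work whose results are appended to a shared list that snapshots read.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor


def run_job(name: str) -> dict:
    """Hypothetical stand-in for one benchmark job (e.g. one model/dataset pair)."""
    return {"job": name, "status": "completed"}


class ToySession:
    """Hypothetical session: at most `n_jobs` jobs run concurrently."""

    def __init__(self, n_jobs: int = 2) -> None:
        self.n_jobs = n_jobs
        self.results: list[dict] = []
        self._lock = threading.Lock()  # guards self.results

    def _run_and_record(self, name: str) -> None:
        result = run_job(name)
        # Worker threads append concurrently, so take the lock.
        with self._lock:
            self.results.append(result)

    def snapshot(self) -> str:
        # Copy under the same lock so a snapshot never observes a
        # partially updated results list.
        with self._lock:
            rows = list(self.results)
        return json.dumps(rows)

    def run(self, jobs: list[str]) -> None:
        # max_workers caps how many jobs are in flight at once.
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            futures = [pool.submit(self._run_and_record, name) for name in jobs]
            for future in futures:
                future.result()  # re-raise any exception from a worker
```

With `ToySession(n_jobs=3)`, at most three jobs execute at any moment, and `snapshot()` can safely be called from another thread while jobs are still running.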

Full Changelog: v0.11.3...v0.11.4

19 Mar 18:59

mikeknep

v0.11.3

e110d64

v0.11.3

⚠️ Relational Trainer is deprecated

The gretel_trainer.relational module is deprecated and will be removed in a future release. Relational workloads can be executed using Gretel Workflows. Please see our docs on Workflows in general, and on the gretel_tabular action and our first-class Connectors in particular.

Other final Relational updates

Bypass JOIN tables in independent training
Relational Transform v2 jobs are more efficient
Pass through schema for extraction
Improved relational job queuing/buffering strategy

Benchmark

Benchmark now automatically writes snapshots of in-progress session results to the working directory to help recover from interrupted notebook sessions.
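The snapshot idea can be sketched with the standard library as below. This is an assumption-laden illustration, not Benchmark's code: the CSV format, the `results_snapshot.csv` file name, and both helper names are hypothetical.

```python
import csv
import os
import tempfile
from pathlib import Path


def write_snapshot(rows: list[dict], workdir: Path, name: str = "results_snapshot.csv") -> Path:
    """Write in-progress results to workdir/name, replacing any earlier snapshot."""
    workdir.mkdir(parents=True, exist_ok=True)
    fieldnames = sorted({key for row in rows for key in row})
    # Write to a temporary file in the same directory, then atomically swap it
    # in, so an interrupted write never leaves a truncated snapshot behind.
    fd, tmp_path = tempfile.mkstemp(dir=workdir, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    target = workdir / name
    os.replace(tmp_path, target)
    return target


def load_snapshot(path: Path) -> list[dict]:
    """Read a snapshot back; note that csv returns every value as a string."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Writing to a temp file and renaming with `os.replace` means a reader only ever sees the previous complete snapshot or the new complete one.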

Client version

Minimum required version of the gretel-client has been bumped to 0.17.7

Full Changelog: v0.11.2...v0.11.3

09 Jan 23:48

tylersbray

v0.11.2

3cddc7c

v0.11.2

Full Changelog: v0.11.1...v0.11.2

20 Oct 18:49

mikeknep

v0.11.1

bc02f80

0.11.1

Transform v2

Allow Transform v2 configs in Relational Transform

Bug fixes 🐛

Improve foreign key synthesis logic to handle chained foreign keys
Improve report text legibility in dark mode

Full Changelog: v0.11.0...v0.11.1

25 Sep 18:27

mikeknep

v0.11.0

60a7929

0.11.0

Fully removes deprecated methods and arguments ⚠️ 👋

MultiTable gretel_model argument (replaced by config argument on train_synthetics)
MultiTable train method (replaced by train_synthetics)
MultiTable train_transforms_models method (replaced by train_transforms)
RelationalData [add|remove]_foreign_key methods (replaced by [add|remove]_foreign_key_constraint)
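To help find call sites that still use the removed names, a small regex-based scanner like the following can flag them. It is a hypothetical helper, not part of gretel_trainer: it only knows the four methods and one argument listed above, and as a text heuristic it can produce false positives (for example, an unrelated object's `.train(` call).

```python
import re

# Removed in v0.11.0 -> replacement, per the list above.
REMOVED_METHODS = {
    "train": "train_synthetics",
    "train_transforms_models": "train_transforms",
    "add_foreign_key": "add_foreign_key_constraint",
    "remove_foreign_key": "remove_foreign_key_constraint",
}
REMOVED_ARGUMENTS = {"gretel_model": "config argument on train_synthetics"}


def find_removed_usages(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, removed name, replacement) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in REMOVED_METHODS.items():
            # `\.train\s*\(` does not match `.train_synthetics(`, because only
            # optional whitespace and "(" may follow the method name.
            if re.search(rf"\.{old}\s*\(", line):
                hits.append((lineno, old, new))
        for old, new in REMOVED_ARGUMENTS.items():
            if re.search(rf"\b{old}\s*=", line):
                hits.append((lineno, old, new))
    return hits
```

Pass it the text of a script or notebook export and review each reported line by hand.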

Adds support for table-specific transforms configs 🚀

You can now pass an optional table_specific_configs: dict[str, GretelModelConfig] argument to train_transforms. This works identically to the same existing argument on train_synthetics.

Misc improvements ⚙️

Changes default synthetics model from Amplify to ACTGAN
Unifies how individual and cross-table SQS evaluation is performed regardless of synthetic strategy
Adds dependency ordering to Connector save to avoid violating foreign key constraints
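Dependency ordering of this kind is typically a topological sort of the foreign-key graph. The sketch below uses the standard library's `graphlib` to show the idea; `save_order` and the table names are made up, and we don't claim this is how Connector orders its writes.

```python
from graphlib import TopologicalSorter


def save_order(parents: dict[str, set[str]]) -> list[str]:
    """Order tables so each one is written after every table it references.

    `parents` maps a table name to the set of tables its foreign keys point
    at. TopologicalSorter treats that set as the node's predecessors, so
    parents are always emitted before children. A foreign-key cycle raises
    graphlib.CycleError.
    """
    return list(TopologicalSorter(parents).static_order())
```

For example, with `orders` referencing `users` and `order_items` referencing `orders` and `products`, both `users` and `products` come before `orders`/`order_items`, so no insert references a row that does not exist yet.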

Full Changelog: v0.10.1...v0.11.0

11 Aug 19:21

mikeknep

v0.10.1

25aeaa3

0.10.1

Improvement ⚙️

Bump the minimum required version of the Gretel Python client to take advantage of improved artifact-handling methods in hybrid environments. (Applies to Trainer, Benchmark, and Relational.)

Bug fix 🐛

Fix a bug in Relational's JSON handling of nested lists of objects.

All PRs

Small language tweaks to benchmark notebook by @mikeknep in #152
Use get_artifact_handle instead of smart_open directly by @mckornfield in #153
Handle nested lists of nested objects by @mikeknep in #154

Full Changelog: v0.10.0...v0.10.1

Contributors

mikeknep and mckornfield

02 Aug 18:57

mikeknep

v0.10.0

11ab6c2

0.10.0

Benchmark v2 🎉

Benchmark has received several internal improvements. While general usage mostly stays the same, there are a few user-facing changes from previous versions:

Breaking changes

Datatype.TABULAR_NUMERIC and Datatype.TABULAR_MIXED have been replaced by a single enum variant, Datatype.TABULAR.
If you pass a list of multiple sources to make_dataset, an exception will be raised.

Deprecations

make_dataset is being replaced by create_dataset
The freestanding functions for Gretel datasets (get_gretel_dataset, list_gretel_datasets, list_gretel_dataset_tags) are being replaced by methods on a new object:
```
repo = GretelDatasetRepo()
repo.get_dataset(...)
repo.list_datasets(...)
repo.list_gretel_dataset_tags(...)
```

Trainer column partitioning removed 👋

Trainer no longer partitions datasets by column. The max_header_clusters argument to the Gretel model classes in gretel_trainer.models is deprecated, and will be removed in a future release.

Smaller notes 🧹

A bug affecting record handler data downloads in Relational Trainer in hybrid deployments has been fixed.
Relational Trainer uses Pandas features that were added in Pandas 1.5, so the dependency version has been corrected.
Trainer no longer depends on gretel-synthetics

All PRs

Clean up some imports by @mikeknep in #141
Drop relational drawing spike by @mikeknep in #143
Dependency tweaks by @mikeknep in #145
Benchmark v2 by @mikeknep in #146
Misc by @mikeknep in #148
Remove column partitioning by @mikeknep in #144
Move benchmark log setup out of init by @mikeknep in #149
Export helper as fixture instead of importing from test package by @mikeknep in #147
Cleanup by @mikeknep in #150
Wrap record handler data download in smart_open by @mikeknep in #151

Full Changelog: v0.9.1...v0.10.0

Contributors

mikeknep

24 Jul 18:01

mikeknep

v0.9.1

795ee98

0.9.1

Internal improvements 🧹

Source data is now stored on disk in CSV format rather than in memory as Pandas DataFrames, resulting in a reduced overall memory footprint.
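Keeping source data on disk helps because it can be streamed back in bounded chunks instead of living in memory as full DataFrames. A standard-library sketch of chunked reading follows (with pandas, `pd.read_csv(path, chunksize=...)` plays the same role); the function name and chunking policy are ours, not Relational's.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Iterator


def iter_csv_chunks(path: Path, chunk_size: int) -> Iterator[list[dict]]:
    """Yield rows from a CSV on disk, at most `chunk_size` rows at a time.

    Only the current chunk is held in memory, rather than the whole table.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk
```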

Synthetic composite keys now more accurately reflect characteristics of the source data.

JSON handling has been refactored and an edge case where too-long and/or deeply-nested JSON table names caused failures in Gretel Cloud has been fixed.

All PRs

Core refactors by @mikeknep in #126
Source on disk by @mikeknep in #130
fix: unit test errors on new mac m2 by @benmccown in #134
Chunked independent synthetics preprocessing by @mikeknep in #133
fix: nested json table string length by @benmccown in #135
Expose ExtractorConfig from root relational module by @mikeknep in #137
Fix RTD by @matthewgrossman in #138
New impl for synthesizing composite keys by @mikeknep in #136
Re-add source_ prefix to archived source files by @mikeknep in #140

New Contributors

@benmccown made their first contribution in #134
@matthewgrossman made their first contribution in #138

Full Changelog: v0.9.0...v0.9.1

Contributors

mikeknep, matthewgrossman, and benmccown

20 Jun 17:16

mikeknep

v0.9.0

156d929

0.9.0

Deprecation warning ⚠️

The gretel_model parameter on the Relational Trainer MultiTable class is deprecated in favor of passing a specific model config to the train_synthetics method. See the section on custom synthetics configs below for more detail.

New features 🚀

Custom synthetics configs in Relational Trainer

Users can now provide custom synthetic model configs instead of being limited to a handful of synthetic blueprint configs. Additionally, users can provide different model configs for different tables. In both cases Relational Synthetics accepts blueprint strings, dictionary configs, or paths to model config yaml files as config inputs.

Example:

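```
multi_table.train_synthetics(
    # Default config for all tables: here, a path to a model config YAML file
    config="./my_actgan_config.yaml",
    table_specific_configs={
        "users": "synthetics/tabular-lstm",  # blueprint string
        "events": amplify_config_dict,  # dict config defined elsewhere
    },
)
```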

The config argument defines the default config to use for all tables. This argument is currently optional for backwards compatibility, but should be provided going forward and will be required when the gretel_model parameter is fully removed (see the deprecation warning above).

The optional table_specific_configs argument can be used to set different configs for individual tables.
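The fallback rule above amounts to a per-table lookup with a default. Here is a minimal sketch of that rule, assuming nothing about MultiTable internals: `resolve_configs`, `config_kind`, the suffix-based classification, and the unknown-table error are all hypothetical choices for illustration.

```python
from pathlib import Path
from typing import Optional, Union

ConfigInput = Union[str, Path, dict]


def config_kind(config: ConfigInput) -> str:
    """Classify a config input as one of the three accepted forms (heuristic)."""
    if isinstance(config, dict):
        return "dict"
    # Assumption: a .yaml/.yml suffix marks a file path; anything else is
    # treated as a blueprint string such as "synthetics/tabular-lstm".
    if isinstance(config, Path) or config.endswith((".yaml", ".yml")):
        return "yaml_path"
    return "blueprint"


def resolve_configs(
    tables: list[str],
    config: ConfigInput,
    table_specific_configs: Optional[dict[str, ConfigInput]] = None,
) -> dict[str, ConfigInput]:
    """Per-table config: the table-specific entry if present, else the default."""
    overrides = table_specific_configs or {}
    unknown = set(overrides) - set(tables)
    if unknown:
        # Our choice for the sketch; the real method's behavior may differ.
        raise ValueError(f"configs given for unknown tables: {sorted(unknown)}")
    return {table: overrides.get(table, config) for table in tables}
```

Applied to the example above with an extra `orders` table, `users` gets the LSTM blueprint, `events` gets the dict config, and `orders` falls back to the YAML default.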

Encoding of keys for transforms is now optional

Users can opt into or out of key encoding in Relational Transforms. Encoding increases security at the cost of referential integrity with the source data and with any tables not in scope. The default is to not encode keys.

```
multi_table.run_transforms(encode_keys=True)
```
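The trade-off can be illustrated with a keyed hash. These notes don't describe Relational's encoding scheme, so HMAC-SHA256 here is our stand-in, and `encode_key`, `encode_fk_pair`, and the users/orders tables are hypothetical.

```python
import hashlib
import hmac


def encode_key(value: object, secret: bytes) -> str:
    """Deterministically encode a key value with HMAC-SHA256.

    Values are stringified first, so 42 and "42" encode identically.
    """
    return hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def encode_fk_pair(users: list[dict], orders: list[dict], secret: bytes):
    """Encode users.id and orders.user_id with the same secret.

    Joins between the two encoded tables still line up, but the encoded
    values no longer match the source data or any table left unencoded.
    """
    enc_users = [{**u, "id": encode_key(u["id"], secret)} for u in users]
    enc_orders = [{**o, "user_id": encode_key(o["user_id"], secret)} for o in orders]
    return enc_users, enc_orders
```

Because the same secret is applied to both sides of the key, joins within the transformed set survive; an out-of-scope table that still holds raw IDs can no longer be joined, which is the referential-integrity cost described above.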

Source column order preserved

The output tables from Relational Transforms and Relational Synthetics present columns in the same order as source tables.

Extraction subsetting

Relational connectors can now perform smart subsetting of source data instead of extracting entire tables.
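One simple way to subset while keeping foreign keys valid is to sample parent rows and then filter child tables down to rows that reference the sample. This sketch handles a single parent/child pair with random sampling; the function and its parameters are hypothetical, and the connectors' actual strategy (the PR list mentions random and contiguous fallbacks) is not reproduced here.

```python
import random


def subset_parent_child(
    parents: list[dict],
    children: list[dict],
    fk: str,
    pk: str = "id",
    fraction: float = 0.5,
    seed: int = 0,
) -> tuple[list[dict], list[dict]]:
    """Sample a fraction of parent rows, then keep only children that reference them.

    Filtering children by the sampled parent keys keeps the subset
    referentially consistent (no orphaned foreign keys).
    """
    if not parents:
        return [], []
    rng = random.Random(seed)
    k = min(len(parents), max(1, round(len(parents) * fraction)))
    kept_parents = rng.sample(parents, k)
    kept_ids = {p[pk] for p in kept_parents}
    kept_children = [c for c in children if c[fk] in kept_ids]
    return kept_parents, kept_children
```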

Performance improvements

Improved performance creating archive files and pre-processing training data.

All PRs

Write and drop by @mikeknep in #117
Table extraction module + subsetting by @johntmyers in #118
Bettertar by @mikeknep in #119
Jm/docs updates by @johntmyers in #121
Custom synthetics configs by @mikeknep in #120
Column order by @mikeknep in #123
table extractor dask fixes by @johntmyers in #124
Deprecation warnings + bugfix by @mikeknep in #122
bq support plus better random and contiguous fallback by @johntmyers in #125
Do not encode keys for transforms by default by @mckornfield in #127

New Contributors

@mckornfield made their first contribution in #127

Full Changelog: v0.8.2...v0.8.3

Contributors

mikeknep, johntmyers, and mckornfield

01 Jun 19:26

mikeknep

v0.8.2

19a08ea

0.8.2

Deprecation warning ⚠️

Two methods have been deprecated and will be removed in a future release.

train_transforms_models is deprecated in favor of train_transforms
train is deprecated in favor of train_synthetics

New features 🚀

JSON support

Gretel Relational can now handle columns with nested JSON. No changes to existing code are required. Depending on shape, nested JSON may lead to additional models being trained under the hood.
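To see why nested JSON can mean extra models, here is one common normalization: nested objects flatten into columns, while lists of objects are split into child tables linked back by a key, and each child table would need its own model. This is an illustrative scheme under our own naming conventions (the `__` table separator and the `_id`/`_parent` columns are hypothetical), not Relational's actual JSON handling.

```python
from typing import Optional


def normalize(rows: list[dict], table: str) -> dict[str, list[dict]]:
    """Split nested JSON records into flat tables (illustrative scheme).

    - nested objects become dotted columns, e.g. "address.city"
    - a list of objects becomes a child table named "<table>__<column>"
      whose rows carry a "_parent" column pointing at the parent's "_id"
    - every emitted row gets a sequential "_id" within its own table
    """
    tables: dict[str, list[dict]] = {}
    for row in rows:
        _add_row(row, table, None, tables)
    return tables


def _add_row(obj: dict, table: str, parent_id: Optional[int], tables: dict[str, list[dict]]) -> None:
    out = tables.setdefault(table, [])
    flat: dict = {"_id": len(out)}
    if parent_id is not None:
        flat["_parent"] = parent_id
    out.append(flat)  # reserve this row's _id before recursing into children
    _flatten(obj, "", flat, table, tables)


def _flatten(obj: dict, prefix: str, flat: dict, table: str, tables: dict[str, list[dict]]) -> None:
    for key, value in obj.items():
        column = prefix + key
        if isinstance(value, dict):
            _flatten(value, column + ".", flat, table, tables)
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            child_table = f"{table}__{column}"
            for item in value:
                _add_row(item, child_table, flat["_id"], tables)
        else:
            flat[column] = value  # scalars and scalar lists stay in place
```

A `users` row with an `orders` list whose items contain a `tags` list produces three tables (`users`, `users__orders`, `users__orders__tags`), i.e. three models instead of one.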

Single transforms config for all tables

The new train_transforms method (replacing the now-deprecated train_transforms_models) accepts a single Transforms model config and applies it to all tables. The set of tables being transformed can be controlled via the optional only (inclusive) and ignore (exclusive) parameters.
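The only/ignore scoping can be sketched as a simple filter that preserves table order. The helper is hypothetical, and two behaviors are our assumptions rather than documented ones: passing both `only` and `ignore` is rejected, and naming an unknown table is an error.

```python
from typing import Iterable, Optional


def tables_in_scope(
    all_tables: list[str],
    only: Optional[Iterable[str]] = None,
    ignore: Optional[Iterable[str]] = None,
) -> list[str]:
    """Return the tables to process, preserving the original table order.

    Assumptions for this sketch: passing both `only` and `ignore` is an
    error, and naming an unknown table is an error.
    """
    if only is not None and ignore is not None:
        raise ValueError("pass at most one of `only` and `ignore`")
    only_set = set(only) if only is not None else None
    ignore_set = set(ignore) if ignore is not None else set()
    unknown = ((only_set or set()) | ignore_set) - set(all_tables)
    if unknown:
        raise ValueError(f"unknown tables: {sorted(unknown)}")
    if only_set is not None:
        return [t for t in all_tables if t in only_set]  # inclusive
    return [t for t in all_tables if t not in ignore_set]  # exclusive
```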

Tables can be excluded from synthetics

The new train_synthetics method (replacing the now-deprecated train) supports omitting tables from synthetics training. This is useful if you have a table with static reference data that should never be synthesized. As above, the scope of tables is controlled by the optional only and ignore parameters.

Optional schema parameter for Connector

The Connector#extract method takes an optional schema parameter that gets forwarded to SQLAlchemy and Pandas. Note: this parameter is dialect-specific and not supported by all databases.

All PRs

Support for JSON by @mikeknep in #107
Single transforms config by @mikeknep in #109
Kill off all the old style types in Relational code by @mikeknep in #110
Reduce unnecessary JSON parsing by @mikeknep in #112
Preserve tables before training by @mikeknep in #111
Fix bug related to empty normalized tables by @mikeknep in #113
Add optional schema param to connectors by @mikeknep in #116
Adjust JSON log locations by @mikeknep in #115
Notebook update by @mikeknep in #114

Full Changelog: v0.8.1...v0.8.2

Contributors

mikeknep