Releases: gretelai/trainer

v0.11.4

30 Apr 19:57

Benchmark

  • ⚙️ n_jobs param added to Benchmark Config
  • 🐛 Avoid race condition around job submission and snapshots
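
These notes do not say how n_jobs is wired up internally. As a rough illustration only, the sketch below assumes a hypothetical BenchmarkRunner class (not part of gretel_trainer) that runs jobs on a thread pool capped at n_jobs workers. It guards both result updates and snapshots with a single lock, which is one common way to keep a snapshot from capturing a half-applied update.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BenchmarkRunner:
    """Hypothetical runner for illustration; not the gretel_trainer Benchmark API."""

    def __init__(self, n_jobs: int = 1) -> None:
        self.n_jobs = n_jobs
        self._lock = threading.Lock()
        self.results: dict[str, str] = {}
        self.snapshots: list[dict[str, str]] = []

    def _record(self, name: str, status: str) -> None:
        # Update the shared results and copy them into a snapshot under the
        # same lock, so no snapshot can observe a half-applied update.
        with self._lock:
            self.results[name] = status
            self.snapshots.append(dict(self.results))

    def _run_job(self, name: str) -> None:
        self._record(name, "running")
        # Model training/evaluation would happen here.
        self._record(name, "complete")

    def run(self, job_names: list[str]) -> dict[str, str]:
        # At most n_jobs jobs execute concurrently.
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            futures = [pool.submit(self._run_job, name) for name in job_names]
            for future in futures:
                future.result()  # re-raise any exception from a worker
        with self._lock:
            return dict(self.results)
```

With n_jobs=1 the same code degrades to sequential execution.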

Full Changelog: v0.11.3...v0.11.4

v0.11.3

19 Mar 18:59

⚠️ Relational Trainer is deprecated

The gretel_trainer.relational module is deprecated and will be removed in a future release. Relational workloads can be executed using Gretel Workflows. Please see our docs on Workflows generally, and in particular the gretel_tabular action and our various first-class Connectors.

Other final Relational updates

  • Bypass JOIN tables from independent training
  • Relational Transform v2 jobs are more efficient
  • Pass through schema for extraction
  • Improved relational job queuing/buffering strategy

Benchmark

  • Benchmark now automatically writes snapshots of in-progress session results to the working directory to help with interrupted notebook sessions
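
Snapshots help most when an interrupted session can never leave one half-written. A minimal sketch, assuming a JSON snapshot format and hypothetical write_snapshot/load_snapshot helpers (the file name and format Benchmark actually uses are not specified here), writes to a temporary file in the same directory and then swaps it in with os.replace, which is atomic on POSIX systems:

```python
import json
import os
import tempfile
from pathlib import Path

SNAPSHOT_NAME = "benchmark_snapshot.json"  # hypothetical file name


def write_snapshot(results: dict, working_dir: str) -> Path:
    """Atomically persist in-progress results (hypothetical helper)."""
    target = Path(working_dir) / SNAPSHOT_NAME
    # Write to a temp file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=working_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(results, fh)
        # ...then swap it in, so readers see either the old or the new file.
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target


def load_snapshot(working_dir: str) -> dict:
    """Return the last snapshot, or an empty dict if none was written."""
    target = Path(working_dir) / SNAPSHOT_NAME
    if not target.exists():
        return {}
    return json.loads(target.read_text())
```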

Client version

The minimum required version of gretel-client has been bumped to 0.17.7.

Full Changelog: v0.11.2...v0.11.3

v0.11.2

09 Jan 23:48

Full Changelog: v0.11.1...v0.11.2

0.11.1

20 Oct 18:49

Transform v2

Allow Transform v2 configs in Relational Transform

Bug fixes 🐛

  • Improve foreign key synthesis logic to handle chained foreign keys
  • Improve report text legibility in dark mode

Full Changelog: v0.11.0...v0.11.1

0.11.0

25 Sep 18:27

Fully removes deprecated methods and arguments ⚠️ 👋

  • MultiTable gretel_model argument (replaced by config argument on train_synthetics)
  • MultiTable train method (replaced by train_synthetics)
  • MultiTable train_transforms_models method (replaced by train_transforms)
  • RelationalData [add|remove]_foreign_key methods (replaced by [add|remove]_foreign_key_constraint)

Adds support for table-specific transforms configs 🚀

You can now pass an optional table_specific_configs: dict[str, GretelModelConfig] argument to train_transforms. This works identically to the same existing argument on train_synthetics.

Misc improvements ⚙️

  • Changes default synthetics model from Amplify to ACTGAN
  • Unifies how individual and cross-table SQS evaluation is performed regardless of synthetic strategy
  • Adds dependency ordering to Connector save to avoid violating foreign key constraints
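
Writing every referenced (parent) table before the tables that point at it is a topological sort over the foreign key graph. A minimal sketch using the standard library's graphlib (Python 3.9+), assuming a hypothetical dict that maps each table to the set of tables it references; this is not the Connector's actual code:

```python
from graphlib import TopologicalSorter  # Python 3.9+


def save_order(references: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so each table comes after every table it references.

    `references` maps a table name to the names of the tables its foreign keys
    point at (hypothetical input shape). A foreign key cycle raises
    graphlib.CycleError.
    """
    return list(TopologicalSorter(references).static_order())
```

For example, save_order({"events": {"users"}, "users": set()}) returns ["users", "events"]. Chained foreign keys (c references b, b references a) come out as a, b, c.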

Full Changelog: v0.10.1...v0.11.0

0.10.1

11 Aug 19:21

Improvement ⚙️

Bumps the minimum required version of the Gretel Python client to take advantage of improved artifact handling methods in hybrid environments. (Applies to Trainer, Benchmark, and Relational.)

Bug fix 🐛

Fix a bug in Relational's JSON handling of nested lists of objects.

Full Changelog: v0.10.0...v0.10.1

0.10.0

02 Aug 18:57
11ab6c2

Benchmark v2 🎉

Benchmark has received several internal improvements. While general usage mostly stays the same, there are a few user-facing changes from previous versions:

Breaking changes

  • Datatype.TABULAR_NUMERIC and Datatype.TABULAR_MIXED have been replaced by a single enum variant, Datatype.TABULAR.
  • Passing a list of multiple sources to make_dataset now raises an exception.

Deprecations

  • make_dataset is being replaced by create_dataset
  • The freestanding functions for Gretel datasets (get_gretel_dataset, list_gretel_datasets, list_gretel_dataset_tags) are being replaced by methods on a new object:
    repo = GretelDatasetRepo()
    repo.get_dataset(...)
    repo.list_datasets(...)
    repo.list_gretel_dataset_tags(...)

Trainer column partitioning removed 👋

Trainer no longer partitions datasets by column. The max_header_clusters argument to the Gretel model classes in gretel_trainer.models is deprecated, and will be removed in a future release.

Smaller notes 🧹

  • A bug when downloading record handler data in Relational Trainer in hybrid deployments has been fixed
  • Relational Trainer uses Pandas features added in version 1.5, so the minimum Pandas dependency version has been corrected.
  • Trainer no longer depends on gretel-synthetics

Full Changelog: v0.9.1...v0.10.0

0.9.1

24 Jul 18:01
795ee98

Internal improvements 🧹

Source data is now stored on disk in CSV format rather than in memory as Pandas DataFrames, resulting in a reduced overall memory footprint.

Synthetic composite keys now more accurately reflect characteristics of the source data.

JSON handling has been refactored, and an edge case in which overly long and/or deeply nested JSON table names caused failures in Gretel Cloud has been fixed.

Full Changelog: v0.9.0...v0.9.1

0.9.0

20 Jun 17:16
156d929

Deprecation warning ⚠️

The gretel_model parameter on the Relational Trainer MultiTable class is deprecated in favor of passing a specific model config to the train_synthetics method. See the section on custom synthetics configs below for more detail.

New features 🚀

Custom synthetics configs in Relational Trainer

Users can now provide custom synthetic model configs instead of being limited to a handful of synthetic blueprint configs. Additionally, users can provide different model configs for different tables. In both cases Relational Synthetics accepts blueprint strings, dictionary configs, or paths to model config yaml files as config inputs.

Example:

multi_table.train_synthetics(
    config="./my_actgan_config.yaml",
    table_specific_configs={
        "users": "synthetics/tabular-lstm",
        "events": amplify_config_dict,
    }
)

The config argument defines the default config to use for all tables. This argument is currently optional for backwards compatibility, but should be provided going forward and will be required when the gretel_model parameter is fully removed (see deprecation warnings above).

The optional table_specific_configs argument can be used to set different configs for individual tables.
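
The interplay between config and table_specific_configs amounts to a per-table lookup with a fallback. The sketch below uses hypothetical helper names (resolve_configs, describe_config) and a simple suffix check to tell YAML paths from blueprint strings; Relational's own input handling may differ, and rejecting overrides for unknown tables is a choice made for this sketch.

```python
from typing import Optional, Union

ConfigInput = Union[str, dict]


def describe_config(config: ConfigInput) -> str:
    """Classify a config input (heuristic: a .yml/.yaml suffix means a file path)."""
    if isinstance(config, dict):
        return "dict"
    if config.endswith((".yaml", ".yml")):
        return "yaml_path"
    return "blueprint"


def resolve_configs(
    tables: list[str],
    config: ConfigInput,
    table_specific_configs: Optional[dict[str, ConfigInput]] = None,
) -> dict[str, ConfigInput]:
    """Give every table its override if one exists, else the default config."""
    overrides = table_specific_configs or {}
    unknown = set(overrides) - set(tables)
    if unknown:
        raise ValueError(f"Configs given for unknown tables: {sorted(unknown)}")
    return {table: overrides.get(table, config) for table in tables}
```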

Encoding of keys for transforms is now optional

Users can opt into or out of key encoding in Relational Transforms. Encoding increases security, at the cost of referential integrity with the source data and with other tables not in scope. The default is to not encode keys.

multi_table.run_transforms(encode_keys=True)
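
One generic way to encode keys so that joins still work among the transformed tables, while the encoded values can no longer be linked back to the source, is a keyed hash (HMAC) applied to every primary and foreign key value with one per-run secret. This is a sketch with a hypothetical make_key_encoder helper, not Relational Transforms' actual encoding scheme:

```python
import hashlib
import hmac
import secrets
from typing import Callable, Optional


def make_key_encoder(secret: Optional[bytes] = None) -> Callable[[object], str]:
    """Return a deterministic key encoder (hypothetical helper).

    The same secret must be used for every table in scope so that a foreign
    key and the primary key it references encode to the same value.
    """
    key = secret if secret is not None else secrets.token_bytes(32)

    def encode(value: object) -> str:
        # Keyed hash: stable within one run, not reversible without the secret.
        return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()

    return encode
```

Values are stringified before hashing, so the integer 1 and the string "1" encode identically; that is only safe if related key columns share a type.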

Source column order preserved

The output tables from Relational Transforms and Relational Synthetics present columns in the same order as source tables.
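
Preserving source column order comes down to reindexing each output table against the source header. A minimal sketch with a hypothetical helper; placing columns that do not exist in the source at the end is an assumption of this sketch:

```python
def restore_column_order(source_columns: list[str], output_columns: list[str]) -> list[str]:
    """Order output columns like the source; any extra columns go last."""
    source_set = set(source_columns)
    output_set = set(output_columns)
    ordered = [col for col in source_columns if col in output_set]
    extras = [col for col in output_columns if col not in source_set]
    return ordered + extras
```

With Pandas, the result can be applied as df[restore_column_order(list(source_df.columns), list(df.columns))].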

Extraction subsetting

Relational connectors can now perform smart subsetting of source data instead of extracting entire tables.
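
Subsetting that keeps foreign keys valid typically samples parent rows first and then extracts only the child rows that reference them. The sketch below uses SQLite and a hypothetical users/events schema; it shows the general parent-first technique, not the connectors' actual subsetting logic.

```python
import sqlite3


def subset_users_and_events(conn: sqlite3.Connection, limit: int):
    """Parent-first subset of a hypothetical users/events schema.

    Takes the first `limit` users, then only the events whose user_id
    references one of them, so every extracted foreign key stays valid.
    """
    users = conn.execute(
        "SELECT id, name FROM users ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    user_ids = [row[0] for row in users]
    if not user_ids:
        return users, []
    # Only "?" placeholders are formatted into the SQL; values are bound.
    placeholders = ", ".join("?" for _ in user_ids)
    events = conn.execute(
        f"SELECT id, user_id FROM events WHERE user_id IN ({placeholders}) ORDER BY id",
        user_ids,
    ).fetchall()
    return users, events
```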

Performance improvements

Improved performance creating archive files and pre-processing training data.

Full Changelog: v0.8.2...v0.9.0

0.8.2

01 Jun 19:26
19a08ea

Deprecation warning ⚠️

Two methods have been deprecated and will be removed in a future release.

  • train_transforms_models is deprecated in favor of train_transforms
  • train is deprecated in favor of train_synthetics

More on these below.

New features 🚀

JSON support

Gretel Relational can now handle columns with nested JSON. No changes to existing code are required. Depending on shape, nested JSON may lead to additional models being trained under the hood.
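
Nested JSON is commonly handled by flattening nested objects into dotted column names and moving lists of objects into a separate child table keyed back to the parent row. The sketch below is a simplified, hypothetical version of that idea (the "." separator, the parent_id column, and the function names are assumptions); lists nested inside child rows are left as-is rather than split further.

```python
from typing import Any


def flatten(obj: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def normalize(row_id: int, doc: dict) -> tuple[dict[str, Any], dict[str, list]]:
    """Split one JSON document into a flat parent row plus child-table rows."""
    parent: dict[str, Any] = {"id": row_id}
    children: dict[str, list] = {}
    for name, value in flatten(doc).items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # A list of objects becomes its own table, keyed back to the parent.
            children[name] = [{"parent_id": row_id, **flatten(item)} for item in value]
        else:
            # Scalars and lists of scalars stay on the parent row.
            parent[name] = value
    return parent, children
```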

Single transforms config for all tables

The new train_transforms method (replacing the now-deprecated train_transforms_models) accepts a single Transforms model config and applies it to all tables. The set of tables being transformed can be controlled via the optional only (inclusive) and ignore (exclusive) parameters.

Tables can be excluded from synthetics

The new train_synthetics method (replacing the now-deprecated train) supports omitting tables from synthetics training. This is useful if you have a table with static reference data that should never be synthesized. As above, the scope of tables is controlled by the optional only and ignore parameters.
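
Resolving only and ignore into a concrete list of tables can be sketched as follows. The helper name is hypothetical, and rejecting unknown table names, or a call that passes both parameters, is a choice made for this sketch rather than documented Trainer behavior:

```python
from typing import Iterable, Optional


def tables_in_scope(
    all_tables: Iterable[str],
    only: Optional[Iterable[str]] = None,
    ignore: Optional[Iterable[str]] = None,
) -> list[str]:
    """Apply inclusive `only` or exclusive `ignore`, keeping the original order."""
    if only is not None and ignore is not None:
        raise ValueError("Specify only or ignore, not both")
    names = list(all_tables)
    only_set = set(only) if only is not None else None
    ignore_set = set(ignore) if ignore is not None else None
    requested = (only_set or set()) | (ignore_set or set())
    unknown = requested - set(names)
    if unknown:
        raise ValueError(f"Unknown tables: {sorted(unknown)}")
    if only_set is not None:
        return [t for t in names if t in only_set]
    if ignore_set is not None:
        return [t for t in names if t not in ignore_set]
    return names
```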

Optional schema parameter for Connector

The Connector#extract method takes an optional schema parameter that gets forwarded to SQLAlchemy and Pandas. Note: this parameter is dialect-specific and not supported by all databases.

Full Changelog: v0.8.1...v0.8.2