Releases: gretelai/trainer
v0.11.4
v0.11.3
⚠️ Relational Trainer is deprecated
The gretel_trainer.relational module is deprecated and will be removed in a future release. Relational workloads can be executed using Gretel Workflows. Please see our docs on Workflows generally, and in particular the gretel_tabular action and our various first-class Connectors.
Other final Relational updates
- Bypass JOIN tables from independent training
- Relational Transform v2 jobs are more efficient
- Pass through schema for extraction
- Improved relational job queuing/buffering strategy
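Assuming a "JOIN table" here means a pure junction table whose every column belongs to a foreign key (so there are no non-key values left for a model to learn), the bypass check might look like the sketch below. Table, is_join_table, and tables_to_train are hypothetical names for illustration, not part of gretel_trainer, and the rule itself is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table description; not the gretel_trainer API."""
    name: str
    columns: list[str]
    # Each foreign key is given as the list of local columns it constrains.
    foreign_keys: list[list[str]] = field(default_factory=list)


def is_join_table(table: Table) -> bool:
    """Assumed rule: every column is covered by at least one foreign key."""
    fk_columns = {col for fk in table.foreign_keys for col in fk}
    return bool(table.columns) and set(table.columns) <= fk_columns


def tables_to_train(tables: list[Table]) -> list[str]:
    """Names of tables that would still get an independently trained model."""
    return [t.name for t in tables if not is_join_table(t)]


tables = [
    Table("students", ["id", "name"]),
    Table("courses", ["id", "title"]),
    Table("enrollments", ["student_id", "course_id"],
          foreign_keys=[["student_id"], ["course_id"]]),
]
print(tables_to_train(tables))  # ['students', 'courses']
```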
Benchmark
- Benchmark now automatically writes snapshots of in-progress session results to the working directory to help with interrupted notebook sessions
Client version
The minimum required version of gretel-client has been bumped to 0.17.7.
Full Changelog: v0.11.2...v0.11.3
v0.11.2
Full Changelog: v0.11.1...v0.11.2
0.11.1
Transform v2
Allow Transform v2 configs in Relational Transform
Bug fixes 🐛
- Improve foreign key synthesis logic to handle chained foreign keys
- Improve report text legibility in dark mode
Full Changelog: v0.11.0...v0.11.1
0.11.0
Fully removes deprecated methods and arguments ⚠️ 👋
- MultiTable gretel_model argument (replaced by config argument on train_synthetics)
- MultiTable train method (replaced by train_synthetics)
- MultiTable train_transforms_models method (replaced by train_transforms)
- RelationalData [add|remove]_foreign_key methods (replaced by [add|remove]_foreign_key_constraint)
Adds support for table-specific transforms configs 🚀
You can now pass an optional table_specific_configs: dict[str, GretelModelConfig] argument to train_transforms. This works identically to the existing argument of the same name on train_synthetics.
Misc improvements ⚙️
- Changes default synthetics model from Amplify to ACTGAN
- Unifies how individual and cross-table SQS evaluation is performed regardless of synthetic strategy
- Adds dependency ordering to Connector save to avoid violating foreign key constraints
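Dependency ordering of this kind is a topological sort: a parent table has to be written before any table whose foreign keys reference it. A minimal sketch using Python's standard-library graphlib, with a hypothetical four-table schema (this illustrates the technique and is not the Connector implementation):

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables its foreign keys reference.
references = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

# TopologicalSorter treats the mapped values as predecessors, so every
# referenced (parent) table is yielded before the tables that point at it.
save_order = list(TopologicalSorter(references).static_order())
print(save_order)
```

Ties (such as users and products) may come out in either order. A self-referencing foreign key, for example an employees.manager_id column pointing back at employees, forms a cycle and makes TopologicalSorter raise graphlib.CycleError, so such edges would need separate handling.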
Full Changelog: v0.10.1...v0.11.0
0.10.1
Improvement ⚙️
Bumps the minimum required version of the Gretel Python client to take advantage of improved artifact-handling methods in hybrid environments. (Applies to Trainer, Benchmark, and Relational.)
Bug fix 🐛
Fix a bug in Relational's JSON handling of nested lists of objects.
All PRs
- Small language tweaks to benchmark notebook by @mikeknep in #152
- Use get_artifact_handle instead of smart_open directly by @mckornfield in #153
- Handle nested lists of nested objects by @mikeknep in #154
Full Changelog: v0.10.0...v0.10.1
0.10.0
Benchmark v2 🎉
Benchmark has received several internal improvements. While general usage mostly stays the same, there are a few user-facing changes from previous versions:
Breaking changes
- Datatype.TABULAR_NUMERIC and Datatype.TABULAR_MIXED have been replaced by a single enum variant, Datatype.TABULAR.
- If you pass a list of multiple sources to make_dataset, an exception will be raised.
Deprecations
- make_dataset is being replaced by create_dataset
- The freestanding functions for Gretel datasets (get_gretel_dataset, list_gretel_datasets, list_gretel_dataset_tags) are being replaced by methods on a new object:
    repo = GretelDatasetRepo()
    repo.get_dataset(...)
    repo.list_datasets(...)
    repo.list_gretel_dataset_tags(...)
Trainer column partitioning removed 👋
Trainer no longer partitions datasets by column. The max_header_clusters argument to the Gretel model classes in gretel_trainer.models is deprecated and will be removed in a future release.
Smaller notes 🧹
- A bug when downloading record handler data in Relational Trainer in hybrid deployments has been fixed
- Relational Trainer uses Pandas features that were added in 1.5, so the dependency version has been corrected.
- Trainer no longer depends on gretel-synthetics
All PRs
- Clean up some imports by @mikeknep in #141
- Drop relational drawing spike by @mikeknep in #143
- Dependency tweaks by @mikeknep in #145
- Benchmark v2 by @mikeknep in #146
- Misc by @mikeknep in #148
- Remove column partitioning by @mikeknep in #144
- Move benchmark log setup out of init by @mikeknep in #149
- Export helper as fixture instead of importing from test package by @mikeknep in #147
- Cleanup by @mikeknep in #150
- Wrap record handler data download in smart_open by @mikeknep in #151
Full Changelog: v0.9.1...v0.10.0
0.9.1
Internal improvements 🧹
Source data is now stored on disk in CSV format rather than in memory as Pandas DataFrames, resulting in a reduced overall memory footprint.
Synthetic composite keys now more accurately reflect characteristics of the source data.
JSON handling has been refactored, and an edge case in which overly long and/or deeply nested JSON table names caused failures in Gretel Cloud has been fixed.
All PRs
- Core refactors by @mikeknep in #126
- Source on disk by @mikeknep in #130
- fix: unit test errors on new mac m2 by @benmccown in #134
- Chunked independent synthetics preprocessing by @mikeknep in #133
- fix: nested json table string length by @benmccown in #135
- Expose ExtractorConfig from root relational module by @mikeknep in #137
- Fix RTD by @matthewgrossman in #138
- New impl for synthesizing composite keys by @mikeknep in #136
- Re-add source_ prefix to archived source files by @mikeknep in #140
New Contributors
- @benmccown made their first contribution in #134
- @matthewgrossman made their first contribution in #138
Full Changelog: v0.9.0...v0.9.1
0.9.0
Deprecation warning ⚠️
The gretel_model parameter on the Relational Trainer MultiTable class is deprecated in favor of passing a specific model config to the train_synthetics method. See the section on custom synthetics configs below for more detail.
New features 🚀
Custom synthetics configs in Relational Trainer
Users can now provide custom synthetic model configs instead of being limited to a handful of synthetic blueprint configs. Additionally, users can provide different model configs for different tables. In both cases Relational Synthetics accepts blueprint strings, dictionary configs, or paths to model config yaml files as config inputs.
Example:
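multi_table.train_synthetics(
    # Default config for every table (here, a path to a YAML model config)
    config="./my_actgan_config.yaml",
    table_specific_configs={
        "users": "synthetics/tabular-lstm",  # blueprint string
        "events": amplify_config_dict,       # config supplied as a Python dict
    }
)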
The config argument defines the default config to use for all tables. This argument is currently optional for backwards compatibility, but should be provided going forward and will be required when the gretel_model parameter is fully removed (see the deprecation warning above).
The optional table_specific_configs argument can be used to set different configs for individual tables.
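The default-plus-override behavior described above can be pictured with a small resolver. This is a sketch under stated assumptions, not Relational Trainer's code: resolve_configs is a hypothetical helper, the placeholder dict stands in for a real model config, and raising on unknown table names or on a table with no config at all is an assumed policy.

```python
from typing import Any, Optional, Union

# A config may be a blueprint string, a YAML path, or a dict; this sketch
# only routes configs to tables and does not load or validate them.
Config = Union[str, dict[str, Any]]


def resolve_configs(
    tables: list[str],
    config: Optional[Config] = None,
    table_specific_configs: Optional[dict[str, Config]] = None,
) -> dict[str, Config]:
    """Return the config for each table, falling back to the default `config`."""
    overrides = table_specific_configs or {}
    unknown = set(overrides) - set(tables)
    if unknown:  # assumed policy: reject overrides for tables that do not exist
        raise ValueError(f"Unknown tables: {sorted(unknown)}")
    resolved: dict[str, Config] = {}
    for table in tables:
        chosen = overrides.get(table, config)
        if chosen is None:  # assumed policy: every table needs some config
            raise ValueError(f"No config for table {table!r}")
        resolved[table] = chosen
    return resolved


# Placeholder standing in for a real Amplify model config dict.
amplify_config_dict = {"placeholder": "stand-in for a real Amplify config"}
resolved = resolve_configs(
    ["users", "events", "sessions"],
    config="./my_actgan_config.yaml",
    table_specific_configs={
        "users": "synthetics/tabular-lstm",
        "events": amplify_config_dict,
    },
)
print(resolved["sessions"])  # ./my_actgan_config.yaml
```

Tables without an entry in table_specific_configs (sessions here) fall back to the default, which is why the default config matters once per-table overrides exist.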
Encoding of keys for transforms is now optional
Users can opt into or out of key encoding in Relational Transforms. Encoding provides increased security at the cost of referential integrity with the source data and with other tables not in scope. The default is to not encode keys.
multi_table.run_transforms(encode_keys=True)
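To make the trade-off concrete, the toy example below assumes that "encoding keys" means consistently replacing key values within the tables in scope. The actual scheme used by Relational Transforms is not described in these notes; the tables and the build_key_map helper are hypothetical, and sequential surrogate keys are used only for readability.

```python
import itertools

# Hypothetical in-memory tables as lists of row dicts (not the gretel_trainer API).
users = [{"id": 1001, "name": "ada"}, {"id": 1002, "name": "grace"}]
orders = [{"order_id": 7, "user_id": 1002}, {"order_id": 8, "user_id": 1001}]


def build_key_map(values):
    """Map each distinct original key, in first-seen order, to a new surrogate key."""
    counter = itertools.count(start=1)
    return {value: next(counter) for value in dict.fromkeys(values)}


user_key_map = build_key_map(row["id"] for row in users)

# Apply the same mapping to the primary key and to the foreign key that
# references it, so joins between the encoded tables still line up.
# (Only the users key is remapped here, for brevity.)
encoded_users = [{**row, "id": user_key_map[row["id"]]} for row in users]
encoded_orders = [{**row, "user_id": user_key_map[row["user_id"]]} for row in orders]

print(encoded_orders)  # [{'order_id': 7, 'user_id': 2}, {'order_id': 8, 'user_id': 1}]
```

Joins inside the encoded output still resolve, but the new values no longer match the source database or any table that was not remapped with the same mapping.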
Source column order preserved
The output tables from Relational Transforms and Relational Synthetics present columns in the same order as source tables.
Extraction subsetting
Relational connectors can now perform smart subsetting of source data instead of extracting entire tables.
Performance improvements
Improved performance creating archive files and pre-processing training data.
All PRs
- Write and drop by @mikeknep in #117
- Table extraction module + subsetting by @johntmyers in #118
- Bettertar by @mikeknep in #119
- Jm/docs updates by @johntmyers in #121
- Custom synthetics configs by @mikeknep in #120
- Column order by @mikeknep in #123
- table extractor dask fixes by @johntmyers in #124
- Deprecation warnings + bugfix by @mikeknep in #122
- bq support plus better random and contiguous fallback by @johntmyers in #125
- Do not encode keys for transforms by default by @mckornfield in #127
New Contributors
- @mckornfield made their first contribution in #127
Full Changelog: v0.8.2...v0.8.3
0.8.2
Deprecation warning ⚠️
Two methods have been deprecated and will be removed in a future release.
- train_transforms_models is deprecated in favor of train_transforms
- train is deprecated in favor of train_synthetics
More on these below.
New features 🚀
JSON support
Gretel Relational can now handle columns with nested JSON. No changes to existing code are required. Depending on shape, nested JSON may lead to additional models being trained under the hood.
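These notes do not describe how nested JSON is handled internally. One common approach, sketched here purely as an assumption, is to flatten nested objects into dotted column names and move lists of objects into a child table with a foreign key back to the parent; each such child table is one reason extra models could be trained. split_json_record and the orders_items naming are hypothetical.

```python
from typing import Any


def split_json_record(record: dict[str, Any], table: str, pk: str):
    """Flatten nested objects; move lists of objects into child-table rows.

    Returns (flat_row, child_rows), where child_rows maps a hypothetical child
    table name to rows that carry a foreign key back to the parent row.
    """
    flat: dict[str, Any] = {}
    children: dict[str, list[dict[str, Any]]] = {}

    def walk(obj: dict[str, Any], prefix: str) -> None:
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                walk(value, f"{name}.")  # nested object -> dotted columns
            elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                child = f"{table}_{name.replace('.', '_')}"
                for item in value:  # list of objects -> rows in a child table
                    children.setdefault(child, []).append({f"{table}_{pk}": record[pk], **item})
            else:
                flat[name] = value  # scalars and lists of scalars stay inline

    walk(record, "")
    return flat, children


record = {
    "id": 1,
    "customer": {"name": "ada", "city": "London"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
flat, children = split_json_record(record, table="orders", pk="id")
print(flat)  # {'id': 1, 'customer.name': 'ada', 'customer.city': 'London'}
print(list(children))  # ['orders_items']
```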
Single transforms config for all tables
The new train_transforms method (replacing the now-deprecated train_transforms_models) accepts a single Transforms model config and applies it to all tables. The set of tables being transformed can be controlled via the optional only (inclusive) and ignore (exclusive) parameters.
Tables can be excluded from synthetics
The new train_synthetics method (replacing the now-deprecated train) supports omitting tables from synthetics training. This is useful if you have a table with static reference data that should never be synthesized. As above, the scope of tables is controlled by the optional only and ignore parameters.
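The only/ignore scoping shared by train_transforms and train_synthetics can be sketched as a small filter. scope_tables is a hypothetical helper, and treating only and ignore as mutually exclusive is an assumption of this sketch rather than documented behavior.

```python
from typing import Iterable, Optional


def scope_tables(
    all_tables: Iterable[str],
    only: Optional[Iterable[str]] = None,
    ignore: Optional[Iterable[str]] = None,
) -> list[str]:
    """Return the tables in scope, preserving their original order."""
    tables = list(all_tables)
    if only is not None and ignore is not None:
        # Assumption: passing both is rejected rather than combined.
        raise ValueError("Pass at most one of `only` and `ignore`")
    if only is not None:
        keep = set(only)  # inclusive: nothing outside this set is used
        return [t for t in tables if t in keep]
    if ignore is not None:
        drop = set(ignore)  # exclusive: everything except this set is used
        return [t for t in tables if t not in drop]
    return tables


# Keep a static reference table (e.g. country codes) out of synthetics training.
print(scope_tables(["users", "events", "countries"], ignore=["countries"]))
# ['users', 'events']
```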
Optional schema parameter for Connector
The Connector#extract method takes an optional schema parameter that gets forwarded to SQLAlchemy and Pandas. Note: this parameter is dialect-specific and not supported by all databases.
All PRs
- Support for JSON by @mikeknep in #107
- Single transforms config by @mikeknep in #109
- Kill off all the old style types in Relational code by @mikeknep in #110
- Reduce unnecessary JSON parsing by @mikeknep in #112
- Preserve tables before training by @mikeknep in #111
- Fix bug related to empty normalized tables by @mikeknep in #113
- Add optional schema param to connectors by @mikeknep in #116
- Adjust JSON log locations by @mikeknep in #115
- Notebook update by @mikeknep in #114
Full Changelog: v0.8.1...v0.8.2