Make synthesizers work with column_relationships #1700

Closed
amontanez24 opened this issue Nov 29, 2023 · 0 comments · Fixed by #1727
Labels
feature request Request for a new feature

Problem Description

As a user, I expect my synthetic data to maintain column relationships when I specify them in my metadata.

Expected behavior

Synthesizer Initialization

  • Raise a warning if column relationships are provided in the metadata but the user doesn't have permissions to use the advanced transformers
>>> synthesizer = HMASynthesizer(metadata)

Warning: The metadata contains a column relationship of type 'address'. This relationship will be ignored. For higher quality data in this relationship, please inquire about the SDV Enterprise tier.
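The initialization check described above could look roughly like the following. This is only a sketch: `warn_on_ignored_relationships` and the `enterprise_available` flag are hypothetical names, not part of the SDV API, and the metadata is assumed to be a plain single-table dict with a `column_relationships` list whose entries carry `type` and `column_names` keys.

```python
import warnings


def warn_on_ignored_relationships(metadata_dict, enterprise_available=False):
    """Warn about relationship types that will be ignored.

    Hypothetical helper for illustration only. Returns the list of
    relationship types that triggered a warning.
    """
    ignored_types = []
    if enterprise_available:
        # Advanced transformers can be used, so nothing is ignored.
        return ignored_types

    # Assumed layout: a list of {'type': ..., 'column_names': [...]} dicts.
    for relationship in metadata_dict.get('column_relationships', []):
        relationship_type = relationship['type']
        if relationship_type in ignored_types:
            continue  # Warn only once per relationship type.
        ignored_types.append(relationship_type)
        warnings.warn(
            f"The metadata contains a column relationship of type "
            f"'{relationship_type}'. This relationship will be ignored. "
            "For higher quality data in this relationship, please inquire "
            "about the SDV Enterprise tier."
        )
    return ignored_types
```

Calling this once from the synthesizer's constructor would keep the warning out of the fit and sample paths, so it is shown a single time per synthesizer.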

Transformer assignment

  • For the address type, we should automatically assign the RandomLocationGenerator to the set of related columns. The user can still change this with update_transformers, just as they would with any other transformer.
>>> synthesizer.auto_assign_transformers(data)
>>> synthesizer.get_transformers()
{
  'age': FloatFormatter(...),
  'credit_card': UniformEncoder(),
  ('vendor_city', 'vendor_state', ...): address.RandomLocationGenerator()
}

>>> from rdt.transformers.address import RegionalAnonymizer
>>> synthesizer.update_transformers({
  ('vendor_city', 'vendor_state', ...): RegionalAnonymizer()
})
  • Deprecate the set_address_columns method
    • The method does not need to perform any logic. It should only raise the warning.
>>> metadata = MultiTableMetadata.load_from_json(...)
>>> synthesizer = HSASynthesizer(metadata)
>>> synthesizer.set_address_columns(...)
Warning: set_address_columns is deprecated. Please add these columns directly to your metadata using 'add_column_relationship'
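The two behaviors above (keying a transformer by the tuple of related columns, and reducing `set_address_columns` to a warning) can be sketched with stand-in classes. Everything here is illustrative: `SketchSynthesizer` is a hypothetical simplification, `RandomLocationGenerator` and `RegionalAnonymizer` are empty placeholders for the transformers named in this issue, and `FutureWarning` is an assumed warning category, since the issue does not specify one.

```python
import warnings


class RandomLocationGenerator:
    """Empty placeholder for the transformer named in the issue."""


class RegionalAnonymizer:
    """Empty placeholder for the transformer named in the issue."""


class SketchSynthesizer:
    """Hypothetical, simplified synthesizer used only for illustration."""

    def __init__(self, column_relationships):
        # Assumed layout: a list of {'type': ..., 'column_names': [...]} dicts.
        self._relationships = column_relationships
        self._transformers = {}

    def auto_assign_transformers(self):
        # Address relationships get one transformer for the whole column
        # group, keyed by the tuple of column names.
        for relationship in self._relationships:
            if relationship['type'] == 'address':
                columns = tuple(relationship['column_names'])
                self._transformers[columns] = RandomLocationGenerator()

    def update_transformers(self, column_name_to_transformer):
        # Tuple keys are overridden exactly like single-column keys.
        self._transformers.update(column_name_to_transformer)

    def get_transformers(self):
        return dict(self._transformers)

    def set_address_columns(self, *args, **kwargs):
        # Deprecated: performs no logic and only raises the warning.
        warnings.warn(
            "set_address_columns is deprecated. Please add these columns "
            "directly to your metadata using 'add_column_relationship'",
            FutureWarning,
        )


synthesizer = SketchSynthesizer([
    {'type': 'address', 'column_names': ['vendor_city', 'vendor_state']},
])
synthesizer.auto_assign_transformers()
synthesizer.update_transformers({
    ('vendor_city', 'vendor_state'): RegionalAnonymizer(),
})
```

`FutureWarning` is used in this sketch because Python shows it to end users by default, whereas `DeprecationWarning` is hidden by default outside of `__main__`.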

Additional context
