Problem Description
As a user, I'd like a way to specify that multiple columns in my dataset make up the same address. This way, the synthetic data maintains the relationship between those columns and forms a valid address.

Expected behavior
In the `BaseSynthesizer` and `BaseMultiTableSynthesizer` classes, add a method called `add_address_columns`.

Parameters
- `table_name`: String that is the name of the table. This must be one of the tables specified in the metadata.
- `column_names`: A list of one or more column names. These must be specified in the metadata for the table.
- `anonymization_level`: String that is the type of anonymization the user wants to see. Must be one of:

Behavior
Follow the logic below for transformer assignment

Validation
- (multi table only) The `table_name` must be found in the metadata.
  Error: Unknown table name 'userss'. Please choose a table name from the metadata.
- The `column_names` must be found in the metadata for the table.
  Error: Unknown column names ('A', 'B', 'C'). Please choose column names listed in the metadata for your table.
- The sdtypes for the columns must be compatible with the address transformer. That is, they must be one of: 'country', 'country_code', 'administrative_unit', etc.
  Error: Column 'city_name' has invalid sdtype 'categorical'. Please provide a column that is compatible with address data.
- You cannot have two or more columns with the same sdtype within an address.
  Error: Columns 'state_name' and 'state' have the same sdtype 'administrative_unit'. Your address data cannot have duplicate fields.
- If the user has already fit the data, show a warning. The user will need to refit for the change to take effect.
  Warning: Please refit your synthesizer for the address changes to appear in your synthetic data.

Additional context
- This method is meant to replace the `update_transformers` method for address columns. It will happen after the `HyperTransformer` config is created.
- We will want to keep track of columns that are treated as an address, so we may need to store that information in a list/dict somewhere. This will help with error handling when users call `update_transformers` on one of these columns or try to add them to constraints.
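To make the validation rules above concrete, here is a rough sketch of how the checks could be ordered. It is not the SDV implementation: the helper name `validate_address_columns`, the `fitted` flag, and the plain-dict stand-in for the metadata (table name mapped to `{column name: sdtype}`) are all assumptions made for illustration, and `ADDRESS_SDTYPES` contains only the three sdtypes this issue names explicitly.

```python
import warnings

# Assumption: only the sdtypes named in this issue; the real list is longer ("etc.").
ADDRESS_SDTYPES = {'country', 'country_code', 'administrative_unit'}


def validate_address_columns(tables, column_names, table_name=None, fitted=False):
    """Hypothetical helper. `tables` maps table name -> {column name: sdtype}."""
    if table_name is None:
        # Single-table case: assume the mapping holds exactly one table.
        columns = next(iter(tables.values()))
    elif table_name not in tables:
        raise ValueError(
            f"Unknown table name '{table_name}'. "
            'Please choose a table name from the metadata.'
        )
    else:
        columns = tables[table_name]

    # Every requested column must exist in the table's metadata.
    unknown = [name for name in column_names if name not in columns]
    if unknown:
        listed = ', '.join(repr(name) for name in unknown)
        raise ValueError(
            f'Unknown column names ({listed}). Please choose column names '
            'listed in the metadata for your table.'
        )

    # Each column needs an address-compatible sdtype, used at most once.
    seen = {}  # sdtype -> first column that used it
    for name in column_names:
        sdtype = columns[name]
        if sdtype not in ADDRESS_SDTYPES:
            raise ValueError(
                f"Column '{name}' has invalid sdtype '{sdtype}'. Please "
                'provide a column that is compatible with address data.'
            )
        if sdtype in seen:
            raise ValueError(
                f"Columns '{seen[sdtype]}' and '{name}' have the same sdtype "
                f"'{sdtype}'. Your address data cannot have duplicate fields."
            )
        seen[sdtype] = name

    # Already-fitted synthesizers only get a warning, not an error.
    if fitted:
        warnings.warn(
            'Please refit your synthesizer for the address changes to '
            'appear in your synthetic data.'
        )
```

In this sketch the checks run from the coarsest (table) to the finest (per-column sdtype), so the first error a user sees is the one that blocks all the others. The refit warning comes last so that it is only shown for an otherwise valid call.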