User validation for set_config #478

Closed
npatki opened this issue Apr 6, 2022 · 0 comments

Problem Description

When a user calls set_config, we should validate their input and raise the appropriate warnings or errors.

Expected behavior

Perform the following 6 validations when using this function.

1. User attempts to use set_config after using fit

See #466

2. There must be an sdtypes dict and a transformers dict

Both dicts must be present. No other dicts are allowed.

ht = HyperTransformer()
ht.set_config({})
Error: Invalid config. Please provide 2 dictionaries named 'sdtypes' and 'transformers'.
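
One possible shape for this check, as a sketch only. The helper name `validate_config_keys` and the use of `ValueError` are assumptions made for illustration, not the agreed implementation.

```python
REQUIRED_KEYS = {'sdtypes', 'transformers'}


def validate_config_keys(config):
    """Hypothetical helper: require exactly the 'sdtypes' and 'transformers' dicts."""
    # Any missing or extra key fails the equality check on the key sets.
    has_exact_keys = isinstance(config, dict) and set(config) == REQUIRED_KEYS
    if not has_exact_keys or not all(isinstance(v, dict) for v in config.values()):
        raise ValueError(
            "Invalid config. Please provide 2 dictionaries named "
            "'sdtypes' and 'transformers'."
        )
```

Comparing key sets rejects both missing and extra entries in a single step.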

3. Column names in sdtypes and transformers dict do not match

Both dicts should have the exact same set of column names as keys. (They do not have to be in the same order.)

ht = HyperTransformer()
ht.set_config({
  'sdtypes': {
    'column_A': 'numerical',
    'column_B': 'categorical'
  },
  'transformers': {
    'column_A': FloatFormatter(),
    'column_C': BinaryEncoder()
  }
})
Error: The column names in the 'sdtypes' dictionary must match the column names in the 'transformers' dictionary.
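
A minimal sketch of the comparison, assuming a hypothetical helper named `validate_matching_columns`. Sets are used so that key order does not matter.

```python
def validate_matching_columns(sdtypes, transformers):
    """Hypothetical helper: both dicts must have the same column names as keys."""
    # Set equality ignores ordering, so the columns may be listed in any order.
    if set(sdtypes) != set(transformers):
        raise ValueError(
            "The column names in the 'sdtypes' dictionary must match the "
            "column names in the 'transformers' dictionary."
        )
```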

4. Invalid sdtypes

Throw an error if a user provides an sdtype that cannot be recognized.
If they are using open source RDT, premium sdtypes will not be recognized either.

ht.set_config({
  'sdtypes': {
    'column_A': 'unknown',
    'column_B': 'phone_number'
  },
  ...
})
Invalid sdtypes: ['unknown', 'phone_number']. If you are trying to use a premium sdtype, contact info@sdv.dev about RDT Add-Ons.
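
A rough sketch of how the unrecognized sdtypes could be collected. The `SUPPORTED_SDTYPES` tuple is a placeholder assumed for this example (the real list depends on the installed RDT version and add-ons), and `validate_sdtypes` is a hypothetical name.

```python
# Placeholder list for illustration only; not the library's actual registry.
SUPPORTED_SDTYPES = ('numerical', 'categorical', 'boolean', 'datetime')


def validate_sdtypes(sdtypes, supported=SUPPORTED_SDTYPES):
    """Hypothetical helper: report every sdtype that is not in the supported list."""
    invalid = [sdtype for sdtype in sdtypes.values() if sdtype not in supported]
    if invalid:
        raise ValueError(
            f'Invalid sdtypes: {invalid}. If you are trying to use a premium '
            'sdtype, contact info@sdv.dev about RDT Add-Ons.'
        )
```

Collecting all invalid values before raising lets the user fix every column in one pass.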

5. Invalid transformers

ht.set_config({
  'sdtypes': { ... },
  'transformers': {
    'column_A': "FrequencyEncoder",
    'column_B': 4.0
  }
})
Invalid transformers for columns: ['column_A', 'column_B']. Please assign an rdt transformer object to each column name.
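
This check could be an `isinstance` test against the transformer base class. The sketch below uses stand-in classes so that it runs on its own. The class names mirror the ones used in this issue, but their definitions here and the helper `validate_transformers` are assumptions.

```python
class BaseTransformer:
    """Stand-in for the library's transformer base class (assumption for this sketch)."""


class FloatFormatter(BaseTransformer):
    """Stand-in transformer used only to exercise the check."""


def validate_transformers(transformers):
    """Hypothetical helper: every value must be a transformer instance."""
    # Strings, numbers, and un-instantiated classes all fail the isinstance test.
    invalid = [
        column
        for column, transformer in transformers.items()
        if not isinstance(transformer, BaseTransformer)
    ]
    if invalid:
        raise ValueError(
            f'Invalid transformers for columns: {invalid}. '
            'Please assign an rdt transformer object to each column name.'
        )
```

Note that passing a class instead of an instance (for example `FloatFormatter` without parentheses) is also flagged under this approach.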

6. Transformers are not compatible with the sdtypes

ht.set_config({
  'sdtypes': {
    'column_A': 'categorical',
    'column_B': 'datetime'
  },
  'transformers': {
    'column_A': FloatFormatter(),
    'column_B': BinaryEncoder()
  }
})
Error: Some transformers you've assigned are not compatible with the sdtypes. Please change the following columns: ['column_A', 'column_B']
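
A sketch of the compatibility check, assuming each transformer exposes the sdtypes it supports. The `SUPPORTED_SDTYPES` attribute, the stand-in classes, and the helper `validate_compatibility` are all hypothetical and only illustrate the idea.

```python
class FloatFormatter:
    """Stand-in transformer; SUPPORTED_SDTYPES is a hypothetical attribute."""
    SUPPORTED_SDTYPES = ('numerical',)


class BinaryEncoder:
    """Stand-in transformer; SUPPORTED_SDTYPES is a hypothetical attribute."""
    SUPPORTED_SDTYPES = ('boolean',)


def validate_compatibility(sdtypes, transformers):
    """Hypothetical helper: each column's sdtype must be supported by its transformer."""
    # Assumes the column names already match (validation 3 above).
    incompatible = [
        column
        for column, sdtype in sdtypes.items()
        if sdtype not in transformers[column].SUPPORTED_SDTYPES
    ]
    if incompatible:
        raise ValueError(
            "Some transformers you've assigned are not compatible with the sdtypes. "
            f'Please change the following columns: {incompatible}'
        )
```

This check would run last, since it relies on the earlier validations having confirmed that the keys match and the values are transformer objects.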
@npatki npatki added the feature request Request for a new feature label Apr 6, 2022
@amontanez24 amontanez24 added this to the 1.0.0 milestone Apr 25, 2022
3 participants