Add preprocess method to synthesizers #1018

Closed
amontanez24 opened this issue Sep 21, 2022 · 0 comments
Labels: feature request (Request for a new feature)

amontanez24 commented Sep 21, 2022

Problem Description

As a user, I would find it useful to transform my data without having to fit the model.

In order to allow users to separately transform and model data, the functionality should be split into separate methods. The first is the preprocess method.

Expected behavior

  • Add preprocess method to BaseSynthesizer class.
  • Parameters:
    • data (pandas.DataFrame): The data to transform.
  • The method should do the following steps:
    1. Validate the data against the metadata (call synthesizer.validate).
    2. Transform the data using the constraints.
    3. If not already done, create the HyperTransformer(s) and assign transformers.
    4. If not already fit, fit the HyperTransformer(s).
    5. Transform the data using the HyperTransformer(s).
    6. Return the transformed data.
  • Note that for CTGAN, TVAE and CopulaGAN, boolean and categorical columns should not be transformed.
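
The step order above could look roughly like the sketch below. It is only an illustration of the control flow, not the actual implementation: `SketchSynthesizer`, `StubHyperTransformer` and `_transform_constraints` are hypothetical names, the stub does not use the real RDT API, and a plain list of row dicts stands in for the pandas.DataFrame.

```python
class StubHyperTransformer:
    """Hypothetical stand-in for an RDT HyperTransformer (not the real API)."""

    def __init__(self):
        self.fitted = False

    def fit(self, data):
        self.fitted = True

    def transform(self, data):
        # Placeholder: a real transformer would return transformed columns.
        return data


class SketchSynthesizer:
    """Illustrative only: records the order in which the steps run."""

    def __init__(self):
        self._hyper_transformer = None
        self.steps = []

    def validate(self, data):
        # Step 1: validate the data against the metadata (stubbed out).
        self.steps.append('validate')

    def _transform_constraints(self, data):
        # Step 2: apply constraint transformations (stubbed out).
        self.steps.append('constraints')
        return data

    def preprocess(self, data):
        self.validate(data)
        data = self._transform_constraints(data)

        # Step 3: create the transformer only if it does not exist yet.
        if self._hyper_transformer is None:
            self._hyper_transformer = StubHyperTransformer()
            self.steps.append('create')

        # Step 4: fit the transformer only if it has not been fit yet.
        if not self._hyper_transformer.fitted:
            self._hyper_transformer.fit(data)
            self.steps.append('fit_transformer')

        # Steps 5 and 6: transform the data and return the result.
        self.steps.append('transform')
        return self._hyper_transformer.transform(data)
```

Calling `preprocess` a second time on the same instance skips the create and fit steps, which is the behavior the "if not already done" and "if not already fit" conditions describe.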

Warnings

  • If the model has already been fit, raise the following warning:
    Warning: This model has already been fit. To use the new preprocessed data, please refit the model using 'fit' or 'fit_processed_data'
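
One possible way to raise this warning is with Python's standard `warnings` module. The helper name `warn_if_already_fit` and the `is_fitted` flag are hypothetical, and the sketch assumes the leading "Warning:" is a display label rather than part of the message.

```python
import warnings

# Message text taken from this issue; the "Warning:" label is assumed to be
# display formatting and is left out of the message itself.
ALREADY_FIT_MESSAGE = (
    "This model has already been fit. To use the new preprocessed data, "
    "please refit the model using 'fit' or 'fit_processed_data'"
)


def warn_if_already_fit(is_fitted):
    """Emit the warning when preprocess is called on a fitted model."""
    if is_fitted:
        warnings.warn(ALREADY_FIT_MESSAGE, UserWarning)
```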

Additional context

  • We may want to add a parameter to the DataProcessor that lets us override the defaults for sdtypes to account for the CTGAN, TVAE and CopulaGAN cases. This is similar to what we do in the current SDV version. We could also directly update the transformers to use in the HyperTransformer.
  • If the HyperTransformer has not already been created, this method can call the assign_transformers method added in Add assign_transformers and get_transformers methods to synthesizers #1020
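
As a rough illustration of the sdtype override idea, the sketch below merges per-sdtype defaults with model-specific overrides. Everything here is an assumption made for the example: `resolve_transformers` is a hypothetical helper, the transformer names are placeholder strings, and `None` is used to mean "leave the column untransformed".

```python
def resolve_transformers(column_sdtypes, defaults, overrides=None):
    """Pick a transformer per column; overrides win over the defaults."""
    merged = {**defaults, **(overrides or {})}
    return {column: merged[sdtype] for column, sdtype in column_sdtypes.items()}


# Placeholder names, not real transformer classes.
defaults = {
    'numerical': 'numerical_transformer',
    'boolean': 'boolean_transformer',
    'categorical': 'categorical_transformer',
}

# Hypothetical overrides for the CTGAN, TVAE and CopulaGAN cases.
gan_overrides = {'boolean': None, 'categorical': None}

column_sdtypes = {'age': 'numerical', 'active': 'boolean', 'city': 'categorical'}
assigned = resolve_transformers(column_sdtypes, defaults, gan_overrides)
```

With these inputs, only `age` keeps a transformer, while `active` and `city` are mapped to `None`.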