Problem Description
As a user, it would be useful to transform my data without having to fit the model.
In order to allow users to separately transform and model data, the functionality should be split into separate methods. The first is the preprocess method.
Expected behavior
Add a preprocess method to the BaseSynthesizer class.
Parameters:
data (pandas.DataFrame): The data to transform.
The method should perform the following steps:
Validate the data against the metadata (call synthesizer.validate).
Transform the data using the constraints.
If not already done, create the HyperTransformer(s) and assign transformers.
If not already fit, fit the HyperTransformer(s).
Transform the data using the HyperTransformer(s).
Return the transformed data.
Note that for CTGAN, TVAE and CopulaGAN, boolean and categorical columns should not be transformed.
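The steps above can be sketched roughly as follows. This is only an illustration of the control flow, not the SDV implementation: ToyTransformer and SketchSynthesizer are hypothetical stand-ins for HyperTransformer and BaseSynthesizer, constraints are modelled as plain functions applied to each row, and a list of dicts stands in for the pandas.DataFrame so that the snippet has no dependencies.

```python
class ToyTransformer:
    """Hypothetical stand-in for a HyperTransformer (not the real RDT API)."""

    def __init__(self):
        self.fitted = False
        self.fit_calls = 0
        self.codes = {}

    def fit(self, rows):
        # Learn an integer code for every distinct value in each column.
        self.codes = {}
        for row in rows:
            for column, value in row.items():
                column_codes = self.codes.setdefault(column, {})
                column_codes.setdefault(value, len(column_codes))
        self.fitted = True
        self.fit_calls += 1

    def transform(self, rows):
        # Replace each value with its learned code (values must be seen in fit).
        return [{c: self.codes[c][v] for c, v in row.items()} for row in rows]


class SketchSynthesizer:
    """Hypothetical stand-in for BaseSynthesizer, showing only preprocess."""

    def __init__(self, expected_columns, constraints=()):
        self.expected_columns = set(expected_columns)
        self.constraints = list(constraints)
        self._transformer = None

    def validate(self, rows):
        # Assumed validation rule: every row has exactly the expected columns.
        for row in rows:
            if set(row) != self.expected_columns:
                raise ValueError(f"Unexpected columns: {sorted(row)}")

    def preprocess(self, rows):
        self.validate(rows)                        # validate against metadata
        for constraint in self.constraints:        # transform using constraints
            rows = [constraint(row) for row in rows]
        if self._transformer is None:              # create transformer if needed
            self._transformer = ToyTransformer()
        if not self._transformer.fitted:           # fit only once
            self._transformer.fit(rows)
        return self._transformer.transform(rows)   # transform and return
```

Calling preprocess a second time on the same object reuses the already fitted transformer, which is the behaviour the "if not already done" and "if not already fit" steps describe.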
Warnings
If the model has already been fit, raise the following warning: Warning: This model has already been fit. To use the new preprocessed data, please refit the model using 'fit' or 'fit_processed_data'
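One way the warning could be raised is sketched below with Python's standard warnings module. WarnOnRefitSynthesizer and its _fitted flag are hypothetical names used only for illustration, the transformation steps are omitted, and the message is emitted without the leading "Warning:" label on the assumption that the label is supplied by the warning machinery.

```python
import warnings

FIT_WARNING = (
    "This model has already been fit. To use the new preprocessed data, "
    "please refit the model using 'fit' or 'fit_processed_data'"
)


class WarnOnRefitSynthesizer:
    """Hypothetical synthesizer showing only the already-fit warning."""

    def __init__(self):
        self._fitted = False

    def fit_processed_data(self, processed_data):
        # Model training is omitted; only the fitted state is recorded.
        self._fitted = True

    def preprocess(self, data):
        if self._fitted:
            # The model still reflects the previously preprocessed data.
            warnings.warn(FIT_WARNING)
        return data  # transformation steps omitted in this sketch
```

The call still returns the preprocessed data after warning, so the user can pass it straight to a refit.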
Additional context
We may want to add a parameter to the DataProcessor that lets us override the defaults for sdtypes to account for the CTGAN, TVAE and CopulaGAN cases. This is similar to what we do in the current SDV version. We could also directly update the transformers used in the HyperTransformer.
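A minimal sketch of the override idea, assuming a per-sdtype default table that a caller can partially replace. The function pick_transformers, the overrides parameter, and the table contents are hypothetical: the transformer names are placeholder strings rather than the DataProcessor's actual defaults, and None is used here to mean "leave the column untransformed".

```python
# Placeholder defaults, for illustration only (not the real DataProcessor table).
DEFAULT_TRANSFORMERS = {
    "numerical": "FloatFormatter",
    "categorical": "LabelEncoder",
    "boolean": "LabelEncoder",
    "datetime": "UnixTimestampEncoder",
}


def pick_transformers(column_sdtypes, overrides=None):
    """Map each column to a transformer name, letting overrides win per sdtype."""
    by_sdtype = {**DEFAULT_TRANSFORMERS, **(overrides or {})}
    return {column: by_sdtype[sdtype] for column, sdtype in column_sdtypes.items()}


# Overrides a CTGAN-like synthesizer might pass so that boolean and
# categorical columns are left untransformed.
GAN_OVERRIDES = {"categorical": None, "boolean": None}

columns = {"age": "numerical", "city": "categorical", "active": "boolean"}
default_choice = pick_transformers(columns)
gan_choice = pick_transformers(columns, overrides=GAN_OVERRIDES)
```

With the overrides applied, only the numerical column keeps a transformer; the other approach mentioned above would instead edit the assigned transformers after the HyperTransformer has been created.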
If the HyperTransformer has not already been created, this method can call the assign_transformers method added in "Add assign_transformers and get_transformers methods to synthesizers" (#1020).