Environment Details
Please indicate the following details about the environment in which you found the bug:
SDV version: 1.8.0
Python version: Any
Operating System: Any
Error Description
When CTGANSynthesizer estimates the number of columns it will generate, it indexes the dictionary of transformers directly by column name. If a constraint produces a categorical or boolean output, the original column name is not a key in that dictionary, so the lookup fails with the following error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[15], line 19
      8 my_constraint = {
      9     'constraint_class': 'FixedCombinations',
     10     'constraint_parameters': {
     11         'column_names': ['high_spec', 'degree_type']
     12     }
     13 }
     15 my_synthesizer.add_constraints(constraints=[
     16     my_constraint
     17 ])
---> 19 my_synthesizer.fit(data)

File ~/Projects/sdv-dev/SDV/sdv/single_table/base.py:436, in BaseSynthesizer.fit(self, data)
    434 self._data_processor.reset_sampling()
    435 self._random_state_set = False
--> 436 processed_data = self._preprocess(data)
    437 self.fit_processed_data(processed_data)

File ~/Projects/sdv-dev/SDV/sdv/single_table/ctgan.py:192, in CTGANSynthesizer._preprocess(self, data)
    190 self.validate(data)
    191 self._data_processor.fit(data)
--> 192 self._print_warning(data)
    194 return self._data_processor.transform(data)

File ~/Projects/sdv-dev/SDV/sdv/single_table/ctgan.py:167, in CTGANSynthesizer._print_warning(self, data)
    165 def _print_warning(self, data):
    166     """Print a warning if the number of columns generated is over 1000."""
--> 167     dict_generated_columns = self._estimate_num_columns(data)
    168     if sum(dict_generated_columns.values()) > 1000:
    169         header = {'Original Column Name ': 'Est # of Columns (CTGAN)'}

File ~/Projects/sdv-dev/SDV/sdv/single_table/ctgan.py:157, in CTGANSynthesizer._estimate_num_columns(self, data)
    154     num_generated_columns[column] = 11
    156 elif sdtypes[column] in {'categorical', 'boolean'}:
--> 157     if transformers[column] is None:
    158         num_categories = data[column].fillna(np.nan).nunique(dropna=False)
    159         num_generated_columns[column] = num_categories

KeyError: 'high_spec'
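To make the failure mechanism concrete, here is a minimal, self-contained sketch that mirrors the lookup from the traceback using plain Python dictionaries instead of SDV objects. It assumes (hypothetically) that after the data processor is fit, FixedCombinations has replaced high_spec and degree_type with one combined column, here called high_spec#degree_type, while sdtypes still lists the original columns. The salary column, the sample values, the 'FloatFormatter' placeholder and the function name estimate_num_columns_buggy are illustrative only, and treating line 154 as the numerical/datetime branch is an assumption.

```python
# Plain-Python stand-ins (hypothetical) for the synthesizer's state after the
# data processor is fit with a FixedCombinations constraint on
# ['high_spec', 'degree_type'].

# sdtypes still describe the ORIGINAL columns from the metadata.
sdtypes = {
    'high_spec': 'boolean',
    'degree_type': 'categorical',
    'salary': 'numerical',
}

# Assumption: the constraint replaced both columns with one combined column,
# so there is no transformer entry for 'high_spec' or 'degree_type'.
transformers = {
    'salary': 'FloatFormatter',         # placeholder, not a real object
    'high_spec#degree_type': None,      # hypothetical combined column name
}

# The original data, as a dict of column lists instead of a DataFrame.
data = {
    'high_spec': [True, False, True],
    'degree_type': ['Sci', 'Arts', 'Sci'],
    'salary': [100.0, 80.0, 120.0],
}


def estimate_num_columns_buggy(data, sdtypes, transformers):
    """Mirror the lookup from the traceback: index transformers directly."""
    num_generated_columns = {}
    for column, values in data.items():
        # Assumption: the branch at line 154 covers numerical/datetime columns.
        if sdtypes[column] in {'numerical', 'datetime'}:
            num_generated_columns[column] = 11
        elif sdtypes[column] in {'categorical', 'boolean'}:
            # Raises KeyError because 'high_spec' has no transformer entry.
            if transformers[column] is None:
                num_generated_columns[column] = len(set(values))
    return num_generated_columns


try:
    estimate_num_columns_buggy(data, sdtypes, transformers)
except KeyError as error:
    print('KeyError:', error)  # prints: KeyError: 'high_spec'
```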
pvk-developer added the bug label and removed the new label on Dec 12, 2023.
npatki changed the title from "KeyError in CTGANSynthesizer during _estimate_num_columns when applying constraints that return a categorical or boolean" to "KeyError in CTGANSynthesizer when applying constraints that return a categorical or boolean" on Dec 12, 2023.
npatki changed the title from "KeyError in CTGANSynthesizer when applying constraints that return a categorical or boolean" to "KeyError in CTGANSynthesizer when applying FixedCombinations constraint" on Dec 12, 2023.
Updated the title to something more user-facing. Right now, FixedCombinations is the only constraint that produces categorical or boolean columns, but we should also validate that the new column estimation logic won't break any custom constraints.
Note that the estimation logic is meant to be just that: an estimate. To fix this issue, it's not necessary to compute the constraint and get the exact number of columns.
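Because an estimate is enough, one possible fix (a sketch under stated assumptions, not necessarily what SDV adopted) is to look transformers up with dict.get, so a column that a constraint consumed is treated as having no transformer and falls back to counting its distinct raw values. The function name estimate_num_columns, the inputs (plain dicts standing in for the DataFrame, sdtypes and transformers), the combined column name, and the numerical/datetime branch are all hypothetical; the real method's other branches are omitted.

```python
def estimate_num_columns(data, sdtypes, transformers):
    """Estimate generated columns per original column (simplified sketch).

    Columns that a constraint replaced have no entry in ``transformers``;
    ``dict.get`` returns None for them, so they fall back to counting the
    distinct raw values instead of raising KeyError.
    """
    num_generated_columns = {}
    for column, values in data.items():
        sdtype = sdtypes.get(column)
        # Assumption: numerical/datetime columns are estimated at 11 columns.
        if sdtype in {'numerical', 'datetime'}:
            num_generated_columns[column] = 11
        elif sdtype in {'categorical', 'boolean'}:
            if transformers.get(column) is None:
                # None counts as its own category, similar to
                # pandas' nunique(dropna=False).
                num_generated_columns[column] = len(set(values))
    return num_generated_columns


# Hypothetical inputs: FixedCombinations has replaced 'high_spec' and
# 'degree_type' with one combined column in the transformers dict.
sdtypes = {'high_spec': 'boolean', 'degree_type': 'categorical', 'salary': 'numerical'}
transformers = {'salary': 'FloatFormatter', 'high_spec#degree_type': None}
data = {
    'high_spec': [True, False, True, None],
    'degree_type': ['Sci', 'Arts', 'Sci', 'Law'],
    'salary': [100.0, 80.0, 120.0, 90.0],
}

print(estimate_num_columns(data, sdtypes, transformers))
# prints: {'high_spec': 3, 'degree_type': 3, 'salary': 11}
```

Skipping columns that are missing from transformers would also avoid the KeyError, but it would drop them from the total compared against the 1000-column warning threshold; falling back to the raw category count keeps them in the estimate.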