
HMASynthesizer errors out when fitting a dataset that has a table which holds primary key and foreign keys only #1257

Closed
pvk-developer opened this issue Feb 12, 2023 · 0 comments
Labels
bug Something isn't working

Environment Details

  • SDV version: 1.0.0.dev
  • Python version: 3.8
  • Operating System: MacOS

Error Description

The HMASynthesizer fails to fit a dataset that has a table containing only a primary key and foreign keys. The error occurs because the key columns are dropped before the table is modeled, which leaves that table with no columns, so the model cannot be fit.

File ~/Projects/sdv-dev/SDV/sdv/multi_table/hma.py:188, in HMASynthesizer._model_table(self, table_name, tables)
    184 self._clear_nans(table)
    185 LOGGER.info('Fitting %s for table %s; shape: %s', self._synthesizer.__name__,
    186             table_name, table.shape)
--> 188 self._table_synthesizers[table_name].fit_processed_data(table)
    190 for name, values in keys.items():
    191     table[name] = values

File ~/Projects/sdv-dev/SDV/sdv/single_table/base.py:421, in BaseSynthesizer.fit_processed_data(self, processed_data)
    414 def fit_processed_data(self, processed_data):
    415     """Fit this model to the transformed data.
    416 
    417     Args:
    418         processed_data (pandas.DataFrame):
    419             The transformed data used to fit the model to.
    420     """
--> 421     self._fit(processed_data)
    422     self._fitted = True
    423     self._fitted_date = datetime.datetime.today().strftime('%Y-%m-%d')

File ~/Projects/sdv-dev/SDV/sdv/single_table/copulas.py:132, in GaussianCopulaSynthesizer._fit(self, processed_data)
    130 with warnings.catch_warnings():
    131     warnings.filterwarnings('ignore', module='scipy')
--> 132     self._model.fit(processed_data)

....

ValueError: need at least one array to concatenate
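The last line is the message NumPy raises when `numpy.concatenate` receives an empty sequence. The elided frames (`....`) do not show where it is actually raised, so the sketch below is only an illustration under that assumption: a hypothetical `stack_columns` helper (not SDV or copulas code) that concatenates per-column arrays, works for a non-empty list, and raises this exact `ValueError` when there are zero columns, as with a key-only table after its keys are removed.

```python
import numpy as np


def stack_columns(columns):
    """Hypothetical helper: concatenate per-column arrays into one array.

    numpy.concatenate raises ValueError when given an empty sequence.
    """
    return np.concatenate(columns)


# With at least one column, concatenation succeeds.
print(stack_columns([np.array([1.0, 2.0]), np.array([3.0])]))  # [1. 2. 3.]

# With zero columns (e.g. every column was a key and got dropped), it fails.
try:
    stack_columns([])
except ValueError as error:
    print(error)  # need at least one array to concatenate
```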

This is what the Metadata structure looks like:
[Image: metadata structure]

As seen in the image, the error occurs on the gmember table.
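The failing condition can be detected before fitting. The sketch below is a minimal, hypothetical helper, not part of SDV: it assumes a plain dict whose layout only loosely mirrors SDV-style metadata (`tables` with `columns` and `primary_key`, plus `relationships` with `child_table_name` and `child_foreign_key`) and reports every table whose columns are all primary or foreign keys, i.e. the tables left with nothing to model once the keys are dropped. The table and column names are made up for illustration.

```python
# Illustrative only: the dict layout loosely mirrors SDV-style metadata;
# the table and column names below are hypothetical.

def key_columns(metadata, table_name):
    """Return the primary- and foreign-key column names of one table."""
    keys = set()
    primary_key = metadata['tables'][table_name].get('primary_key')
    if primary_key:
        keys.add(primary_key)
    for relationship in metadata.get('relationships', []):
        if relationship['child_table_name'] == table_name:
            keys.add(relationship['child_foreign_key'])
    return keys


def key_only_tables(metadata):
    """Return the tables that have no columns left once keys are removed."""
    return [
        name
        for name, table in metadata['tables'].items()
        if not set(table['columns']) - key_columns(metadata, name)
    ]


metadata = {
    'tables': {
        'parent': {
            'primary_key': 'parent_id',
            'columns': {'parent_id': {'sdtype': 'id'}, 'score': {'sdtype': 'numerical'}},
        },
        'other': {
            'primary_key': 'other_id',
            'columns': {'other_id': {'sdtype': 'id'}, 'label': {'sdtype': 'categorical'}},
        },
        # A link table holding only a primary key and two foreign keys.
        'link': {
            'primary_key': 'link_id',
            'columns': {
                'link_id': {'sdtype': 'id'},
                'parent_id': {'sdtype': 'id'},
                'other_id': {'sdtype': 'id'},
            },
        },
    },
    'relationships': [
        {'parent_table_name': 'parent', 'parent_primary_key': 'parent_id',
         'child_table_name': 'link', 'child_foreign_key': 'parent_id'},
        {'parent_table_name': 'other', 'parent_primary_key': 'other_id',
         'child_table_name': 'link', 'child_foreign_key': 'other_id'},
    ],
}

print(key_only_tables(metadata))  # ['link']
```

Only `link` is reported: `parent` and `other` each keep one non-key column, so they still have data to fit.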

Steps to reproduce

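from sdv.multi_table import HMASynthesizer
from sdv.datasets.demo import download_demo

# Biodegradability_v1 includes the gmember table, which holds only key columns
data, metadata = download_demo('multi_table', 'Biodegradability_v1')

hmas = HMASynthesizer(metadata)
hmas.fit(data)  # raises ValueError: need at least one array to concatenate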