BooleanTransformer.reverse_transform sometimes crashes with TypeError #210

Closed
katxiao opened this issue Aug 13, 2021 · 0 comments · Fixed by #212

katxiao commented Aug 13, 2021

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • RDT version: 0.5.2.dev0
  • Python version: 3.8
  • Operating System: macOS 10.15.7

Error Description

When sampling data from a boolean column, the BooleanTransformer sometimes raises TypeError: Need to pass bool-like values.

This happens when the BooleanTransformer.reverse_transform method receives inputs that round to values other than 0 or 1 (the only valid float representations of a boolean). Casting those rounded values to the nullable 'boolean' dtype raises this TypeError. We had previously been casting the rounded values to 'bool', which silently converts any non-zero float, including out-of-bounds ones, to True. Now that we cast to 'boolean', we should clip the rounded values to [0, 1] before casting.
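The behavior described above can be illustrated outside of RDT with a minimal sketch. The sample values and the function name `reverse_transform_clipped` are hypothetical and chosen only for illustration; this is one possible shape of the clipping fix, not necessarily the change that was merged.

```python
import numpy as np
import pandas as pd

# Hypothetical sampled floats: -0.7 rounds to -1 and 1.6 rounds to 2,
# both of which fall outside the valid boolean representations 0 and 1.
sampled = pd.Series([-0.7, 0.2, 0.9, 1.6, np.nan])

# Casting the rounded values straight to the nullable 'boolean' dtype
# is expected to fail because -1 and 2 are not bool-like.
try:
    np.round(sampled).astype('boolean')
except TypeError as error:
    print(f'unclipped cast failed: {error}')


def reverse_transform_clipped(data):
    """Round, clip to [0, 1], then cast to nullable boolean objects.

    Assumed sketch of the proposed fix; NaN values pass through
    rounding and clipping and become missing values after the cast.
    """
    rounded = np.round(data)
    clipped = rounded.clip(0, 1)
    return clipped.astype('boolean').astype('object')


fixed = reverse_transform_clipped(sampled)
print(fixed.iloc[:4].tolist())  # [False, False, True, True]
```

Clipping after rounding keeps the nullable dtype, so missing values survive as missing instead of being coerced to True as a plain 'bool' cast would do.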

Steps to reproduce

We can reproduce the error with the following code snippet:

from sdv.demo import sample_relational_demo
from sdv.relational import HMA1

metadata, tables = sample_relational_demo(size=30)

model = HMA1(metadata)
model.fit(tables)

new_data = model.sample()

The above code sometimes crashes with:

  File "/SDV/sdv/relational/base.py", line 184, in sample
    return self._sample(table_name, num_rows, sample_children)
  File "/SDV/sdv/relational/hma.py", line 471, in _sample
    self._sample_table(table, num_rows, sampled_data=sampled_data)
  File "/SDV/sdv/relational/hma.py", line 430, in _sample_table
    self._sample_children(table_name, sampled_data, table_rows)
  File "/SDV/sdv/relational/hma.py", line 355, in _sample_children
    self._sample_children(child_name, sampled_data, child_rows)
  File "/SDV/sdv/relational/hma.py", line 352, in _sample_children
    self._sample_child_rows(child_name, table_name, row, sampled_data)
  File "/SDV/sdv/relational/hma.py", line 335, in _sample_child_rows
    table_rows = self._sample_rows(model, table_name)
  File "/SDV/sdv/relational/hma.py", line 318, in _sample_rows
    sampled = model.sample(num_rows)
  File "/SDV/sdv/tabular/base.py", line 442, in sample
    return self._sample_batch(num_rows, max_retries, max_rows_multiplier)
  File "/SDV/sdv/tabular/base.py", line 299, in _sample_batch
    sampled, num_valid = self._sample_rows(
  File "/SDV/sdv/tabular/base.py", line 235, in _sample_rows
    sampled = self._metadata.reverse_transform(sampled)
  File "/SDV/sdv/metadata/table.py", line 640, in reverse_transform
    reversed_data = self._hyper_transformer.reverse_transform(data)
  File "/lib/python3.8/site-packages/rdt/hyper_transformer.py", line 248, in reverse_transform
    reversed_data = transformer.reverse_transform(columns_data)
  File "/lib/python3.8/site-packages/rdt/transformers/boolean.py", line 88, in reverse_transform
    return np.round(data).astype('boolean').astype('object')
  File "/lib/python3.8/site-packages/pandas/core/generic.py", line 5546, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 595, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 406, in apply
    applied = getattr(b, f)(**kwargs)
  File "/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 595, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 919, in astype_nansafe
    return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
  File "/lib/python3.8/site-packages/pandas/core/arrays/boolean.py", line 279, in _from_sequence
    values, mask = coerce_to_array(scalars, copy=copy)
  File "/lib/python3.8/site-packages/pandas/core/arrays/boolean.py", line 158, in coerce_to_array
    raise TypeError("Need to pass bool-like values")
TypeError: Need to pass bool-like values
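Because the crash is intermittent, a single run of the snippet may pass. A small retry helper along these lines could make the failure easier to catch; `first_failure` and the stand-in `flaky` function are hypothetical names used only for this sketch.

```python
def first_failure(fn, attempts=20, exc_type=TypeError):
    """Call fn up to `attempts` times.

    Returns (attempt_number, exception) for the first call that raises
    exc_type, or None if every call succeeds.
    """
    for attempt in range(1, attempts + 1):
        try:
            fn()
        except exc_type as error:
            return attempt, error
    return None


# Stand-in for an intermittently failing call: raises on the third call.
calls = {'count': 0}


def flaky():
    calls['count'] += 1
    if calls['count'] == 3:
        raise TypeError('Need to pass bool-like values')


result = first_failure(flaky, attempts=5)
print(result[0])  # 3
```

With SDV installed, `fn` could wrap the `model.sample()` call from the reproduction snippet, for example `first_failure(lambda: model.sample())`.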
@katxiao added the bug and pending review labels and removed the pending review label on Aug 13, 2021
@katxiao self-assigned this on Aug 13, 2021
@csala added this to the 0.5.2 milestone on Aug 13, 2021