[dtypes] FixedIncrements Fails with New Numerical Data Types #2157

Closed
pvk-developer opened this issue Jul 31, 2024 · 1 comment · Fixed by #2165 or #2195
Labels
bug Something isn't working internal The issue doesn't change the API or functionality
Comments

@pvk-developer
Member

Error Description

The FixedIncrements constraint fails on the new numerical data types (Int, UInt) with TypeError: unsupported operand type(s) for &: 'bool' and 'float'. The error is raised in RDT's FloatFormatter, but it is triggered by the following line in FixedIncrements:

    def _transform(self, table_data):
->        table_data[self.column_name] = table_data[self.column_name] / self.increment_value
        return table_data

This division changes the column's dtype: the result is no longer UInt or Int but a float dtype.
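A minimal sketch of the dtype change, using only pandas. The helper divide_preserving_dtype is hypothetical and is not the fix merged in #2165 or #2195; it assumes every non-null value is an exact multiple of the increment (which is what FixedIncrements enforces), so the float quotient is integral and can be cast back to the original nullable integer dtype.

```python
import pandas as pd

# A nullable unsigned integer column whose values are multiples of 10
column = pd.Series([10, 20, 30], dtype='UInt8')

# True division on a nullable integer Series yields a float Series
divided = column / 10
print(column.dtype, '->', divided.dtype)


def divide_preserving_dtype(series, increment_value):
    """Hypothetical helper: divide by the increment, then restore the original dtype.

    Assumes every non-null value is an exact multiple of increment_value,
    so the quotient is integral and casting back loses nothing.
    """
    return (series / increment_value).astype(series.dtype)


restored = divide_preserving_dtype(column, 10)
print(restored.dtype, restored.tolist())
```

Casting back after the division keeps the column's computer representation intact for downstream transformers; the assumption of exact multiples is what makes the cast safe.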

Steps to reproduce

import pandas as pd

from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

# Each column holds multiples of 10 stored as a nullable unsigned integer dtype
data = {
    'UInt8': pd.Series([1, 2, 3], dtype='UInt8') * 10,
    'UInt16': pd.Series([1, 2, 3], dtype='UInt16') * 10,
    'UInt32': pd.Series([1, 2, 3], dtype='UInt32') * 10,
    'UInt64': pd.Series([1, 2, 3], dtype='UInt64') * 10,
}
df = pd.DataFrame(data)
print(df.dtypes)

# Add the fields to the metadata with matching computer representations
metadata = SingleTableMetadata()
metadata.add_column('UInt8', sdtype='numerical', computer_representation='UInt8')
metadata.add_column('UInt16', sdtype='numerical', computer_representation='UInt16')
metadata.add_column('UInt32', sdtype='numerical', computer_representation='UInt32')
metadata.add_column('UInt64', sdtype='numerical', computer_representation='UInt64')

gcs = GaussianCopulaSynthesizer(metadata)

# One FixedIncrements constraint per column, each with an increment of 10
my_constraints = [
    {
        'constraint_class': 'FixedIncrements',
        'constraint_parameters': {
            'column_name': column,
            'increment_value': 10,
        },
    }
    for column in df.columns
]

gcs.add_constraints(my_constraints)
gcs.fit(df)

Running this fails in gcs.fit(df) with the following traceback (excerpt):

File ~/.virtualenvs/SDV/lib/python3.10/site-packages/rdt/transformers/numerical.py:106, in FloatFormatter._validate_values_within_bounds(self, data)
    104 def _validate_values_within_bounds(self, data):
    105     if self.computer_representation != 'Float':
--> 106         fractions = data[~data.isna() & data % 1 != 0]
    107         if not fractions.empty:
    108             raise ValueError(
    109                 f"The column '{data.name}' contains float values {fractions.tolist()}. "
    110                 f"All values represented by '{self.computer_representation}' must be integers."
    111             )
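The failing check can also be read as an operator-precedence issue: in Python, & binds more tightly than !=, so ~data.isna() & data % 1 != 0 is evaluated as (~data.isna() & (data % 1)) != 0. In NumPy, a bitwise AND between a boolean mask and integer remainders is accepted, but once the values are floats (as they are after the division above) it raises a TypeError. The sketch below reproduces this with plain NumPy arrays rather than the pandas Series RDT actually uses, so the exact error message differs; the parenthesized mask is a suggested form, not RDT's code.

```python
import numpy as np

# Float values, as produced by dividing by the increment; one is fractional
values = np.array([10.0, 20.0, np.nan, 2.5])
not_null = ~np.isnan(values)

# Unparenthesized: parsed as (not_null & (values % 1)) != 0,
# and NumPy's bitwise AND is not defined for float arrays.
try:
    values[not_null & values % 1 != 0]
    unparenthesized_error = None
except TypeError as error:
    unparenthesized_error = error
print('unparenthesized:', type(unparenthesized_error).__name__)

# Parenthesized: compare first, then combine two boolean masks.
fractions = values[not_null & (values % 1 != 0)]
print('fractions:', fractions.tolist())
```

With the comparison parenthesized, both operands of & are boolean masks, so the check selects only the non-null fractional values instead of failing on the dtype.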

Additional Context

If this can't be fixed directly in SDV, file a new issue in RDT that points to this one and apply the required fixes there.

@pvk-developer pvk-developer added bug Something isn't working internal The issue doesn't change the API or functionality labels Jul 31, 2024
@amontanez24 amontanez24 added this to the 1.15.1 milestone Aug 9, 2024
@amontanez24
Contributor

The code in this issue still seems to crash.
