[dtypes] FloatFormatter reverse transform does not support new pandas dtypes #857

Merged
R-Palazzo merged 7 commits into main from issue-855-int-floatformatter on Aug 6, 2024

R-Palazzo
Contributor

@R-Palazzo R-Palazzo commented Aug 1, 2024

CU-86b1gcgpc
Resolve #855

@R-Palazzo R-Palazzo self-assigned this Aug 1, 2024
@R-Palazzo R-Palazzo requested a review from a team as a code owner August 1, 2024 13:02
@sdv-team
Contributor

sdv-team commented Aug 1, 2024

@R-Palazzo R-Palazzo removed the request for review from a team August 1, 2024 13:02
  if self.learn_rounding_scheme and self._rounding_digits is not None:
      data = data.round(self._rounding_digits)
  elif is_integer:
      data = data.round(0)

- if pd.isna(data).any() and is_integer:
+ if pd.isna(data).any() and is_integer and not is_pandas_instance:
Contributor Author

The new pandas dtypes support NaN integers, so we could remove this if statement. However, I had to add the check for a pandas instance to make this test work:

def test__reverse_transform_rounding_none_with_nulls_dtype_int(self):

Let me know if this is fine. All our transformers accept both pandas and np.ndarray objects, right?
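As an illustration of why the NaN guard matters only for NumPy-backed integers, here is a minimal sketch. The `reverse_cast` helper and its `is_nullable` check are hypothetical and are not the code in this PR (which checks `is_pandas_instance` instead); they only show the general behaviour of the two dtype families.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, np.nan])

# NumPy-backed 'int64' has no missing-value representation, so the cast raises.
try:
    values.astype('int64')
    numpy_cast_ok = True
except (ValueError, TypeError):
    numpy_cast_ok = False

# The nullable 'Int64' extension dtype stores the missing entry as pd.NA.
nullable = values.astype('Int64')


def reverse_cast(data, dtype):
    """Hypothetical helper: cast back to ``dtype`` unless NaNs make that impossible."""
    resolved = pd.api.types.pandas_dtype(dtype)
    is_integer = pd.api.types.is_integer_dtype(resolved)
    # Assumption: extension dtypes such as 'Int64' are treated as NaN-safe.
    is_nullable = isinstance(resolved, pd.api.extensions.ExtensionDtype)
    if pd.isna(data).any() and is_integer and not is_nullable:
        return data  # leave the data as floats, since int64 cannot hold NaN
    return data.astype(resolved)


kept_float = reverse_cast(values, 'int64')
cast_nullable = reverse_cast(values, 'Int64')
```

With these inputs, `kept_float` stays a float series while `cast_nullable` becomes `Int64` with one missing entry.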

Contributor

There is also pandas.array, but I doubt these would come up in RDT.

Member

I think we will have to revisit RDT in a performance audit against large inputs. Ideally, the public method takes a pd.DataFrame as input, but here we work with either pd.Series or np.array. It would be better to work with only one input type and avoid the overhead of supporting two.
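One way to read this suggestion is to normalize the input once at the boundary so that internal code handles a single type. The sketch below assumes a hypothetical `to_series` helper; it is not part of RDT.

```python
import numpy as np
import pandas as pd


def to_series(data):
    """Hypothetical helper: return ``data`` as a pd.Series, without copying if it already is one."""
    if isinstance(data, pd.Series):
        return data
    # np.asarray accepts ndarrays and plain sequences alike.
    return pd.Series(np.asarray(data))


from_array = to_series(np.array([1.0, 2.0, np.nan]))
original = pd.Series([1, 2, 3])
from_series = to_series(original)
```

After this step, downstream logic such as rounding and NaN checks only needs the pandas code path.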

Comment on lines 258 to 263
'Int8': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int8'),
'Int16': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int16'),
'Int32': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int32'),
'Int64': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int64'),
'Float32': pd.Series([1.1, 2.2, 3.3, np.nan, None, np.nan], dtype='Float32'),
'Float64': pd.Series([1.1, 2.2, 3.3, np.nan, None, np.nan], dtype='Float64'),
Contributor

  • Use pd.NA rather than np.nan, since the nullable dtypes replace all NA-like values with pd.NA.
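A small sketch of the behaviour this comment refers to, using the same values as the test data above: with a nullable dtype, both `np.nan` and `None` end up stored as `pd.NA`, so writing `pd.NA` directly in the fixture states the stored value explicitly.

```python
import numpy as np
import pandas as pd

# Mixed NA-like inputs, as in the test data under review.
mixed = pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int8')

# The same data written with pd.NA directly.
explicit = pd.Series([1, 2, -3, pd.NA, pd.NA, pd.NA], dtype='Int8')

# Every missing entry in the nullable series is the pd.NA singleton.
missing_are_na = all(value is pd.NA for value in mixed[mixed.isna()])

# Both constructions mark the same positions as missing and hold the same values.
same_mask = mixed.isna().tolist() == explicit.isna().tolist()
same_values = mixed.dropna().tolist() == explicit.dropna().tolist()
```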


@R-Palazzo R-Palazzo force-pushed the issue-855-int-floatformatter branch from d704002 to 43a948f Compare August 2, 2024 10:29
@R-Palazzo R-Palazzo changed the base branch from main to issue-858-learn-rounding-digits August 2, 2024 10:30
Base automatically changed from issue-858-learn-rounding-digits to main August 5, 2024 07:12
@R-Palazzo R-Palazzo force-pushed the issue-855-int-floatformatter branch from 9cc4f96 to b370435 Compare August 5, 2024 07:15
@R-Palazzo R-Palazzo requested a review from amontanez24 August 6, 2024 12:46
@R-Palazzo R-Palazzo merged commit 491e946 into main Aug 6, 2024
47 checks passed
@R-Palazzo R-Palazzo deleted the issue-855-int-floatformatter branch August 6, 2024 15:58
Development

Successfully merging this pull request may close these issues.

[dtypes] FloatFormatter reverse transform does not support new pandas dtypes
5 participants