-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dtypes] FloatFormatter
reverse transform does not support new pandas dtypes
#857
Conversation
rdt/transformers/numerical.py
Outdated
if self.learn_rounding_scheme and self._rounding_digits is not None: | ||
data = data.round(self._rounding_digits) | ||
elif is_integer: | ||
data = data.round(0) | ||
|
||
if pd.isna(data).any() and is_integer: | ||
if pd.isna(data).any() and is_integer and not is_pandas_instance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new pandas dtype support NaN integers so we could remove this if statement. However I had to add the check for pandas instance to make this test work:
RDT/tests/unit/transformers/test_numerical.py
Line 461 in dd654c4
def test__reverse_transform_rounding_none_with_nulls_dtype_int(self): |
Let me know if this is fine. All our transformers accept
pandas
and np.ndarray
object right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also pandas.array but I doubt these would come up in RDT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will have to revisit RDT
on a performance audit against large inputs. The ideal scenario is that we have a pd.DataFrame
as input (on the public method), and here we work with pd.Series
or np.array
, what it would be ideal is to work only with one input type and avoid the overhead of supporting two types of inputs.
'Int8': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int8'), | ||
'Int16': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int16'), | ||
'Int32': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int32'), | ||
'Int64': pd.Series([1, 2, -3, np.nan, None, np.nan], dtype='Int64'), | ||
'Float32': pd.Series([1.1, 2.2, 3.3, np.nan, None, np.nan], dtype='Float32'), | ||
'Float64': pd.Series([1.1, 2.2, 3.3, np.nan, None, np.nan], dtype='Float64'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Use
pd.NA
rather thannp.nan
as all NA-like values are replaced with pd.NA with the nullable dtypes.
rdt/transformers/numerical.py
Outdated
if self.learn_rounding_scheme and self._rounding_digits is not None: | ||
data = data.round(self._rounding_digits) | ||
elif is_integer: | ||
data = data.round(0) | ||
|
||
if pd.isna(data).any() and is_integer: | ||
if pd.isna(data).any() and is_integer and not is_pandas_instance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also pandas.array but I doubt these would come up in RDT
d704002
to
43a948f
Compare
9cc4f96
to
b370435
Compare
CU-86b1gcgpc
Resolve #855