BUG: Improve error message when casting ExtensionDtype to datetime #37553

Closed
arw2019 opened this issue Nov 1, 2020 · 3 comments · Fixed by #43270
Labels: ExtensionArray (extending pandas with custom dtypes or arrays), good first issue, NA - MaskedArrays (related to pd.NA and nullable extension arrays), Needs Tests (unit test(s) needed to prevent regressions), Timezones (timezone data dtype)
Milestone: 1.4

Comments

@arw2019
Member

arw2019 commented Nov 1, 2020

We support casting int and float dtypes to DatetimeTZDtype:

In [11]: s = pd.Series(np.random.randint(10, size=3), dtype="int64") 
    ...: s.astype(pd.DatetimeTZDtype(tz="US/Pacific"))                                                  
Out[11]: 
0   1969-12-31 16:00:00.000000004-08:00
1   1969-12-31 16:00:00.000000007-08:00
2   1969-12-31 16:00:00.000000001-08:00
dtype: datetime64[ns, US/Pacific]
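
The timestamps in this output are consistent with each integer being read as nanoseconds since the Unix epoch in UTC and then displayed in US/Pacific. A minimal standard-library sketch of that arithmetic follows; the helper name `describe_epoch_ns` and the fixed UTC-08:00 offset are assumptions made for illustration and are not part of pandas.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset stands in for US/Pacific standard time,
# so the sketch needs no time zone database (real US/Pacific also observes DST).
PACIFIC_STANDARD = timezone(timedelta(hours=-8))
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_epoch_ns(ns: int) -> str:
    """Format integer nanoseconds since the UTC epoch as a UTC-08:00 wall time."""
    seconds, nanos = divmod(ns, 1_000_000_000)
    local = (EPOCH_UTC + timedelta(seconds=seconds)).astimezone(PACIFIC_STANDARD)
    # datetime only stores microseconds, so the nanosecond part is formatted by hand.
    offset = local.isoformat()[-6:]  # the trailing "+HH:MM" / "-HH:MM" part
    return f"{local:%Y-%m-%d %H:%M:%S}.{nanos:09d}{offset}"


for value in (4, 7, 1):
    print(value, "->", describe_epoch_ns(value))
```

For the value 4 this reproduces the first row shown above, `1969-12-31 16:00:00.000000004-08:00`.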

We should be able to do this with the corresponding extension dtypes (Int/Float). Currently this errors:

In [12]: s = pd.Series(np.random.randint(10, size=5), dtype="Int64") 
    ...: s.astype(pd.DatetimeTZDtype(tz="US/Pacific"))                                                  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-469a4b536444> in <module>
      1 s = pd.Series(np.random.randint(10, size=5), dtype="Int64")
----> 2 s.astype(pd.DatetimeTZDtype(tz="US/Pacific"))

/workspaces/pandas-arw2019/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5797         else:
   5798             # else, only a single dtype is given
-> 5799             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5800             return self._constructor(new_data).__finalize__(self, method="astype")
   5801 

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    620         self, dtype, copy: bool = False, errors: str = "raise"
    621     ) -> "BlockManager":
--> 622         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    623 
    624     def convert(

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    422                     applied = b.apply(f, **kwargs)
    423                 else:
--> 424                     applied = getattr(b, f)(**kwargs)
    425             except (TypeError, NotImplementedError):
    426                 if not ignore_failures:

/workspaces/pandas-arw2019/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    608         if self.is_extension:
    609             try:
--> 610                 values = self.values.astype(dtype)
    611             except (ValueError, TypeError):
    612                 if errors == "ignore":

/workspaces/pandas-arw2019/pandas/core/arrays/integer.py in astype(self, dtype, copy)
    471             na_value = lib.no_default
    472 
--> 473         return self.to_numpy(dtype=dtype, na_value=na_value, copy=False)
    474 
    475     def _values_for_argsort(self) -> np.ndarray:

/workspaces/pandas-arw2019/pandas/core/arrays/masked.py in to_numpy(self, dtype, copy, na_value)
    222             data[self._mask] = na_value
    223         else:
--> 224             data = self._data.astype(dtype, copy=copy)
    225         return data
    226 

TypeError: data type not understood

We should get this to work for both ExtensionArray and the masked versions.

arw2019 added the labels Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member), Dtype Conversions (unexpected or buggy dtype conversions), ExtensionArray, NA - MaskedArrays, and Timezones, and removed the label Dtype Conversions, on Nov 1, 2020.
@jbrockmendel
Member

Does the desirability of this change if #37179 changes how the non-EA case works?

@arw2019
Member Author

arw2019 commented Nov 1, 2020

Does the desirability of this change if #37179 changes how the non-EA case works?

Right. As of #37179 this shouldn't be supported.

Minor, but once #37179 is in, I'd be keen to align the error message between the EA and non-EA cases. Instead of

TypeError: data type not understood    # current EA error message

I'd like to see

TypeError: dtype Int64 cannot be converted to datetime64[ns]  # desired EA error message
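
One way to get from the current message to the desired one is to catch the low-level `TypeError` and re-raise it with both dtypes named. The sketch below only illustrates that pattern: `cast_with_clear_error` and `failing_convert` are hypothetical names, not pandas internals, and the real fix may look quite different.

```python
def cast_with_clear_error(values, source_dtype, target_dtype, convert):
    """Run ``convert(values)``; re-raise a TypeError naming both dtypes.

    All names here are hypothetical; this is not pandas' internal API.
    """
    try:
        return convert(values)
    except TypeError as err:
        # Keep the original exception chained for debugging.
        raise TypeError(
            f"dtype {source_dtype} cannot be converted to {target_dtype}"
        ) from err


def failing_convert(values):
    # Stand-in for the low-level cast that fails in the traceback above.
    raise TypeError("data type not understood")


try:
    cast_with_clear_error([4, 7, 1], "Int64", "datetime64[ns]", failing_convert)
except TypeError as err:
    message = str(err)
    print(f"TypeError: {message}")
```

Chaining with `from err` keeps the original "data type not understood" exception visible in the traceback while the top-level message names the source and target dtypes.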

arw2019 added the label Error Reporting (incorrect or improved errors from pandas) and removed the label Needs Triage on Nov 1, 2020.
arw2019 changed the title from "ENH/BUG: Support casting ExtensionDtype to datetime" to "ENH/BUG: Improvr error message when casting ExtensionDtype to datetime" on Nov 1, 2020.
arw2019 changed the title from "ENH/BUG: Improvr error message when casting ExtensionDtype to datetime" to "ENH/BUG: Improve error message when casting ExtensionDtype to datetime" on Nov 1, 2020.
arw2019 changed the title from "ENH/BUG: Improve error message when casting ExtensionDtype to datetime" to "BUG: Improve error message when casting ExtensionDtype to datetime" on Nov 1, 2020.
@mroeschke
Member

This looks to work on master now (without the error, which makes sense to me). Could use a test.

In [1]: s = pd.Series(np.random.randint(10, size=5), dtype="Int64")
   ...: s.astype(pd.DatetimeTZDtype(tz="US/Pacific"))
Out[1]:
0   1969-12-31 16:00:00.000000006-08:00
1   1969-12-31 16:00:00.000000006-08:00
2   1969-12-31 16:00:00.000000002-08:00
3   1969-12-31 16:00:00.000000009-08:00
4   1969-12-31 16:00:00.000000002-08:00
dtype: datetime64[ns, US/Pacific]
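
A regression test for this behaviour could look roughly like the sketch below. It assumes a pandas version in which both the `int64` and `Int64` casts succeed, as reported above; the test name is hypothetical, and this is not necessarily the test that was eventually merged.

```python
import pandas as pd


def test_astype_nullable_int_to_datetimetz():
    # Hypothetical test name; fixed values replace the random ones used above.
    dtype = pd.DatetimeTZDtype(tz="US/Pacific")
    values = [4, 7, 1]

    # Masked (nullable) integer path, which raised in the original report.
    result = pd.Series(values, dtype="Int64").astype(dtype)
    # NumPy-backed integer path, used here as the reference behaviour.
    expected = pd.Series(values, dtype="int64").astype(dtype)

    pd.testing.assert_series_equal(result, expected)
```

Comparing against the `int64` result, rather than hard-coding timestamps, keeps the test focused on the EA and non-EA paths agreeing with each other.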

mroeschke added the labels good first issue and Needs Tests and removed the labels Enhancement and Error Reporting on Aug 14, 2021.
jreback added this to the 1.4 milestone on Aug 31, 2021.
4 participants