-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interp with long cftime coordinates raises an error #3641
Comments
Another issue with
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.parse_timedelta_string()
TypeError: object of type 'datetime.timedelta' has no len()
During handling of the above exception, another exception occurred:
OverflowError Traceback (most recent call last)
<ipython-input-50-b03d9c4f220d> in <module>
----> 1 xr.core.duck_array_ops.datetime_to_numeric(i, cftime.DatetimeGregorian(2, 1, 1), datetime_unit='D')
~/src/xarray/xarray/core/duck_array_ops.py in datetime_to_numeric(array, offset, datetime_unit, dtype)
395 else:
396 offset = min(array)
--> 397 array = array - offset
398
399 if not hasattr(array, "dtype"): # scalar is converted to 0d-array
~/src/xarray/xarray/coding/cftimeindex.py in __sub__(self, other)
431
432 if isinstance(other, (CFTimeIndex, cftime.datetime)):
--> 433 return pd.TimedeltaIndex(np.array(self) - np.array(other))
434 elif isinstance(other, pd.TimedeltaIndex):
435 return CFTimeIndex(np.array(self) - other.to_pytimedelta())
~/.conda/envs/xclim3/lib/python3.6/site-packages/pandas/core/indexes/timedeltas.py in __new__(cls, data, unit, freq, start, end, periods, closed, dtype, copy, name, verify_integrity)
256
257 tdarr = TimedeltaArray._from_sequence(
--> 258 data, freq=freq, unit=unit, dtype=dtype, copy=copy
259 )
260 return cls._simple_new(tdarr._data, freq=tdarr.freq, name=name)
~/.conda/envs/xclim3/lib/python3.6/site-packages/pandas/core/arrays/timedeltas.py in _from_sequence(cls, data, dtype, copy, freq, unit)
270 freq, freq_infer = dtl.maybe_infer_freq(freq)
271
--> 272 data, inferred_freq = sequence_to_td64ns(data, copy=copy, unit=unit)
273 freq, freq_infer = dtl.validate_inferred_freq(freq, inferred_freq, freq_infer)
274
~/.conda/envs/xclim3/lib/python3.6/site-packages/pandas/core/arrays/timedeltas.py in sequence_to_td64ns(data, copy, unit, errors)
971 if is_object_dtype(data.dtype) or is_string_dtype(data.dtype):
972 # no need to make a copy, need to convert if string-dtyped
--> 973 data = objects_to_td64ns(data, unit=unit, errors=errors)
974 copy = False
975
~/.conda/envs/xclim3/lib/python3.6/site-packages/pandas/core/arrays/timedeltas.py in objects_to_td64ns(data, unit, errors)
1096 values = np.array(data, dtype=np.object_, copy=False)
1097
-> 1098 result = array_to_timedelta64(values, unit=unit, errors=errors)
1099 return result.view("timedelta64[ns]")
1100
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.convert_to_timedelta64()
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()
OverflowError: Python int too large to convert to C long |
Yes, there's a simple workaround for that at least, #3631 (comment), but I agree it would be nice if we didn't need to worry about that. |
Got it, thanks ! |
How about replacing |
That would indeed be a very clean approach (I don't know why that did not occur to me earlier!). In the past that kind of conversion used to have a bug, but it has been fixed as of NumPy 1.15 (see numpy/numpy#11096). |
Note that at the moment, if we pass np.datetime64 objects that exceed the allowed time span, the function yields garbage without failing. Is this something we want to fix as well ? One option is to convert array and offset to microseconds first, then compute the delta, but this may break people's code. |
I wanted to bring attention to a package that we are working on that originally started with remapping time axis. The package is called AxisUtilities and is available at https://github.com/coderepocenter/AxisUtilities. It doesn’t yet have any support for CFTime yet (well it does support it now; but you need to manually convert cftime to a number for now) But we are working on it (so that the cftime to number conversion is more automatic). We have the basis there. We are now working on it to make it easier to use and remove certain steps. It follows the ESMF or SCRIP interpolation pattern, i.e. once you make the remapper object, you could use it for multiple data set as long as the source and destination axis has not changed. |
Thanks @maboualidev; I saw that @andersy005 posted about this too. I haven't had a chance to look deeply into your new package, but I am intrigued by the concept. I think patterns for working with data defined over intervals, be they in time or some other dimension, are something useful and should be explored. #1475 is a good thread in particular if you are interested in ideas for how cell boundaries (and operations that depend on them) might be represented most cleanly within xarray. Discussion there seems somewhat dormant at the moment, but I'd jump in there if you have comments, ideas, or questions. |
MCVE Code Sample
Problem Description
In principle we should be able to get this to work. The issue stems from the following logic in
datetime_to_numeric
:xarray/xarray/core/duck_array_ops.py
Lines 402 to 404 in 45fd0e6
Here we are relying on pandas to convert an array of
datetime.timedelta
objects to an array with dtypetimedelta64[ns]
. If the array ofdatetime.timedelta
objects cannot be safely converted totimedelta64[ns]
(e.g. due to an integer overflow) then this line is silently a no-op which leads to the error downstream at the dtype conversion step. This is my fault originally for suggesting this approach, #2668 (comment).To solve this I think we'll need to write our own logic to convert(as @huard notes we should be able to use NumPy directly here for the conversion). We should not consider ourselves beholden to using nanosecond resolution for a couple of reasons:datetime.timedelta
objects to numeric values instead of relying on pandas/NumPy.datetime.timedelta
objects do not natively support nanosecond resolution; they have microsecond resolution natively, which corresponds with a NumPy timedelta range of +/- 2.9e5 years.DatetimeIndex
. We should do everything we can to support this with aCFTimeIndex
.@huard @dcherian this is an important issue we'll need to solve to be able to use a fixed offset for cftime dates for an application like
polyfit
/polyval
.xref: #3349 and #3631.
Output of
xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: None
xarray: 0.14.1
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.3.1
netCDF4: None
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.25
cfgrib: 0.9.7.1
iris: None
bottleneck: 1.2.1
dask: 2.9.0+2.gd0daa5bc
distributed: 2.9.0
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: installed
setuptools: 42.0.2.post20191201
pip: 19.2.2
conda: None
pytest: 5.0.1
IPython: 7.10.1
sphinx: None
The text was updated successfully, but these errors were encountered: